Training error

koulalimedamine · March 18, 2019, 11:33pm

Hi all,

I keep getting the following error:

Preprocessing ['…/test.csv']
Preprocessing done
[scorer.cpp:76] FATAL: "(access(filename, 4)) == (0)" check failed. Invalid language model path

My command line is the following:

python3 DeepSpeech.py --dev_files …/dev.csv --test_files …/test.csv --train_files …/train.csv --train_batch_size 12 --dev_batch_size 12 --test_batch_size 12 --epoch 150 --display_step 1 --validation_step 1 --dropout_rate 0.30 --default_stddev 0.046875 --learning_rate 0.0001 --log_level 0 --checkpoint_dir . --export_dir .

Here . is the DeepSpeech folder. I ran git lfs install inside this folder (DeepSpeech).
I also cloned the repository using:
git lfs clone https://github.com/mozilla/DeepSpeech.git
Is there a solution?

lissyx · March 19, 2019, 9:54am

Have you checked that the LM files have been properly checked out in data/lm? Are they readable? Can you share more details about your system?
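The check suggested above can be sketched in Python. The failing assertion, access(filename, 4), is a read-permission test, so the sketch below repeats it with os.access and also looks for a Git LFS pointer file, which is the small text stub left behind when the real file was never downloaded. The file names data/lm/lm.binary and data/lm/trie are assumptions about the default layout, and check_lm_file is a hypothetical helper, not part of DeepSpeech.

```python
import os

# A Git LFS pointer file is a small text file that begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_lm_file(path):
    """Return a short status string for one language-model file."""
    if not os.path.isfile(path):
        return "missing"
    # Same test as the failing check: access(filename, 4), where 4 is R_OK.
    if not os.access(path, os.R_OK):
        return "not readable"
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        return "git-lfs pointer (real file not downloaded)"
    return "ok"


if __name__ == "__main__":
    # Assumed default locations, relative to the DeepSpeech checkout.
    for name in ("data/lm/lm.binary", "data/lm/trie"):
        print(name, "->", check_lm_file(name))
```

If a file is reported as a pointer, running git lfs pull inside the checkout is the usual way to fetch the real content.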

sehar_capricon · November 20, 2019, 4:01am

(venv) sehar@sehar-HP-Z220-CMT-Workstation:~/DeepSpeech$ ./run.sh
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/sehar/DeepSpeech/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/sehar/venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It’s easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.

W1119 09:57:11.723611 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It’s easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.

WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_types(iterator).
W1119 09:57:11.831957 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_types(iterator).
WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_shapes(iterator).
W1119 09:57:11.832188 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_shapes(iterator).
WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_classes(iterator).
W1119 09:57:11.832332 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.data.get_output_classes(iterator).
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W1119 09:57:15.496555 140548025214784 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W1119 09:57:15.500564 140548025214784 deprecation.py:506] From /home/sehar/DeepSpeech/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1119 09:57:16.281688 140548025214784 deprecation.py:323] From DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1119 09:57:16.934401 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/sehar/DeepSpeech/check/train-22
I1119 09:57:16.947213 140548025214784 saver.py:1280] Restoring parameters from /home/sehar/DeepSpeech/check/train-22
2019-11-19 09:57:17.189442: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I Restored variables from most recent checkpoint at /home/sehar/DeepSpeech/check/train-22, step 22
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:38:37 | Steps: 2 | Loss: 6997.156494
WARNING:tensorflow:From /home/sehar/DeepSpeech/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W1119 10:36:04.299221 140548025214784 deprecation.py:323] From /home/sehar/DeepSpeech/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 0 | Training | Elapsed Time: 15:45:26 | Steps: 23 | Loss: 9715.056917
2019-11-20 02:19:15.018282: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at constant_op.cc:278 : Resource exhausted: OOM when allocating tensor with shape[121285,1,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
2019-11-20 02:19:15.018283: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[121285,1,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[121285,1,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[{{node tower_0/gradients/zeros_like_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 906, in <module>
    absl.app.run(main)
  File "/home/sehar/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/sehar/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 890, in main
    train()
  File "DeepSpeech.py", line 608, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "DeepSpeech.py", line 576, in run_set
    feed_dict=feed_dict)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/sehar/DeepSpeech/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[121285,1,2048] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[node tower_0/gradients/zeros_like_2 (defined at DeepSpeech.py:308) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node tower_0/gradients/zeros_like_2:
tower_0/cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BlockLSTM (defined at /tmp/tmpqh1i6zit.py:182)

Original stack trace for 'tower_0/gradients/zeros_like_2':
  File "DeepSpeech.py", line 906, in <module>
    absl.app.run(main)
  File "/home/sehar/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/sehar/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 890, in main
    train()
  File "DeepSpeech.py", line 449, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "DeepSpeech.py", line 308, in get_tower_results
    gradients = optimizer.compute_gradients(avg_loss)
  File "/home/sehar/DeepSpeech/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/gradients_util.py", line 722, in _GradientsHelper
    out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/control_flow_ops.py", line 1338, in ZerosLikeOutsideLoop
    return array_ops.zeros_like(val, optimize=False)
  File "/home/sehar/DeepSpeech/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/array_ops.py", line 1916, in zeros_like
    return zeros_like_impl(tensor, dtype, name, optimize)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/array_ops.py", line 1976, in zeros_like_impl
    return gen_array_ops.zeros_like(tensor, name=name)
  File "/home/sehar/DeepSpeech/tensorflow/python/ops/gen_array_ops.py", line 11961, in zeros_like
    "ZerosLike", x=x, name=name)
  File "/home/sehar/DeepSpeech/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/sehar/DeepSpeech/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/sehar/DeepSpeech/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/sehar/DeepSpeech/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

sehar_capricon · November 20, 2019, 4:03am

I have trained successfully before. Now that I have started training again, it fails with the error above, so training did not complete.

lissyx · November 20, 2019, 8:13am

@sehar_capricon Please, share your console output using proper code formatting, especially when it is this long. I can’t read this right now, so I can’t help you.

lissyx · November 20, 2019, 8:14am

I see OOM, please check your batch size. You may want to reduce it.
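For a sense of scale, the tensor named in the OOM message can be sized directly from its shape. The sketch below rests on several assumptions that this thread does not confirm: that the first dimension counts time steps and the second is the batch size, and that one time step covers 20 ms of audio. "type float" in the TensorFlow message means 32-bit floats, so 4 bytes per element. tensor_bytes and FRAME_STEP_SECONDS are illustrative names only.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; float32 takes 4 bytes per element."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape reported in the OOM message.
shape = (121285, 1, 2048)
size = tensor_bytes(shape)
print(f"{size} bytes, about {size / 2**30:.2f} GiB for this one tensor")

# Assumption: each time step covers 20 ms of audio.
FRAME_STEP_SECONDS = 0.02
minutes = shape[0] * FRAME_STEP_SECONDS / 60
print(f"about {minutes:.1f} minutes of audio in a single sample")
```

Under those assumptions this single tensor is close to 1 GiB, and its batch dimension is already 1. That would point at the length of one training sample (roughly 40 minutes) rather than at the batch size, although the thread itself does not establish this.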

sehar_capricon · November 20, 2019, 8:41am

I have not set a batch size.

sehar_capricon · November 20, 2019, 8:42am

#!/bin/sh
python -u "DeepSpeech.py" \
  --train_files "/home/sehar/urdu/cv-valid-train.csv" \
  --dev_files "/home/sehar/urdu/cv-valid-dev.csv" \
  --test_files "/home/sehar/urdu/cv-valid-test.csv" \
  --alphabet_config_path "/home/sehar/urdu-models/alphabet.txt" \
  --lm_binary_path "/home/sehar/urdu-models/lm.binary" \
  --lm_trie_path "/home/sehar/urdu-models/trie" \
  --learning_rate 0.000025 \
  --dropout_rate 0 \
  --log_level 1 \
  --noearly_stop \
  --epochs 100 \
  --max_to_keep 1 \
  --checkpoint_dir "/home/sehar/DeepSpeech/check" \
  --export_dir "/home/sehar/urdu-models"
This was my run file for training.

lissyx · November 20, 2019, 9:08am

I will not be able to help you further if you don’t share console output in a readable way.

sehar_capricon · November 21, 2019, 4:22am

Thank you for your prompt response. I have not set a batch size for training my own model.

lissyx · November 21, 2019, 7:39am

It is nice of you to thank me, but you have not fixed your console output, and I still cannot read it reliably or easily. If you don’t fix it, I can’t help you.

You are running out of memory somewhere. That’s all I can see; you need to investigate on your side.
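One possible way to investigate is to look for unusually large clips in the training set, since a single very long recording can exhaust memory on its own. The sketch below assumes the CSV has the usual DeepSpeech columns wav_filename, wav_filesize and transcript. largest_clips is a hypothetical helper, not part of DeepSpeech.

```python
import csv


def largest_clips(csv_path, top_n=5):
    """Return the top_n (wav_filesize, wav_filename) pairs, largest first."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [
            (int(row["wav_filesize"]), row["wav_filename"])
            for row in csv.DictReader(handle)
        ]
    # Tuples sort by size first, so reverse order puts the largest clips on top.
    return sorted(rows, reverse=True)[:top_n]
```

Calling largest_clips("/home/sehar/urdu/cv-valid-train.csv") with the path from the run file above would list the five biggest files. If one of them is far larger than the rest, splitting or removing it is worth trying before changing anything else.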