I'm trying to fine-tune from the checkpoint in the latest release, using my own dataset. My WAV file is ~22 MB, which I assume is not extraordinarily large.
Here are my machine specs:
1x Intel Core i7-6850K (6 cores, 3.6 GHz), 96 GB RAM, one GTX 1080 Ti (11 GB) and one Titan Xp (12 GB)
I'd have expected training to go smoothly with this configuration, but I keep getting a ResourceExhaustedError and I'm confused as to why.
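In case it's relevant, a back-of-the-envelope check: DeepSpeech expects 16 kHz, 16-bit mono PCM, which is 32,000 bytes per second, so ~22 MB works out to roughly 11-12 minutes of audio. A quick way to confirm the actual duration, assuming SoX is installed (the path is a placeholder for my file):

soxi -d /path/to/my/file.wav   # prints the duration as HH:MM:SS.ss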
Here’s my training script:
python -u /auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py \
--train_files /auto/k1/shahdloo/Projs/stories-nn/data/stories/train.csv \
--dev_files /auto/k1/shahdloo/Projs/stories-nn/data/stories/train.csv \
--test_files /auto/k1/shahdloo/Projs/stories-nn/data/stories/train.csv \
--n_hidden 2048 \
--train_batch_size 1 \
--dev_batch_size 1 \
--test_batch_size 1 \
--epoch 3 \
--limit_train 1 \
--limit_dev 1 \
--log_level 0 \
--limit_test 1 \
--learning_rate 0.0001 \
--dropout_rate 0.2367 \
--default_stddev 0.046875 \
--checkpoint_step 1 \
--validation_step 1 \
--wer_log_pattern "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)" \
--export_dir /auto/data/shahdloo/DeepSpeech/model_export/ \
--checkpoint_dir /auto/data/shahdloo/DeepSpeech/checkpoint/ \
--decoder_library_path /auto/k1/shahdloo/Projs/stories-nn/native_client/libctc_decoder_with_kenlm.so \
--alphabet_config_path /auto/k1/shahdloo/Projs/stories-nn/data/alphabet.txt \
--lm_binary_path /auto/k1/shahdloo/Projs/stories-nn/models/lm.binary \
--lm_trie_path /auto/k1/shahdloo/Projs/stories-nn/models/trie
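(One more data point: the log below shows towers on both GPU:0 and GPU:1, so TensorFlow is spreading the run across both cards. I know I could pin it to a single card along these lines, where the device index is an assumption for my box and should be checked against nvidia-smi:

# restrict TensorFlow to one GPU (index 0 here is an assumption; check nvidia-smi)
CUDA_VISIBLE_DEVICES=0 python -u /auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py \
    ... (same flags as above)

but I'd rather understand why it runs out of memory at batch size 1 in the first place.)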
and here’s the tail of the error I get:
2018-05-24 15:50:07.474741: I tensorflow/core/common_runtime/bfc_allocator.cc:671] Summary of in-use Chunks by size:
2018-05-24 15:50:07.474755: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 47 Chunks of size 256 totalling 11.8KiB
2018-05-24 15:50:07.474764: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 1280 totalling 2.5KiB
2018-05-24 15:50:07.474772: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 516143 Chunks of size 8192 totalling 3.94GiB
2018-05-24 15:50:07.474779: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 12544 totalling 12.2KiB
2018-05-24 15:50:07.474788: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 15872 totalling 15.5KiB
2018-05-24 15:50:07.474795: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 70666 Chunks of size 16384 totalling 1.08GiB
2018-05-24 15:50:07.474803: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 23296 totalling 22.8KiB
2018-05-24 15:50:07.474810: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 63527 Chunks of size 24576 totalling 1.45GiB
2018-05-24 15:50:07.474818: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 25088 totalling 24.5KiB
2018-05-24 15:50:07.474825: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 996 Chunks of size 32768 totalling 31.12MiB
2018-05-24 15:50:07.474833: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 237568 totalling 232.0KiB
2018-05-24 15:50:07.474841: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 16777216 totalling 16.00MiB
2018-05-24 15:50:07.474848: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 33554432 totalling 64.00MiB
2018-05-24 15:50:07.474856: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 69820160 totalling 66.58MiB
2018-05-24 15:50:07.474864: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 72364032 totalling 138.02MiB
2018-05-24 15:50:07.474872: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 144728064 totalling 138.02MiB
2018-05-24 15:50:07.474880: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 3 Chunks of size 201326592 totalling 576.00MiB
2018-05-24 15:50:07.474888: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 289456128 totalling 1.62GiB
2018-05-24 15:50:07.474895: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 578912256 totalling 1.08GiB
2018-05-24 15:50:07.474903: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 10.17GiB
2018-05-24 15:50:07.475010: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 10921944679
InUse: 10921944064
MaxInUse: 10921944320
NumAllocs: 986278
MaxAllocSize: 578912256
2018-05-24 15:50:07.495472: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****************************************************************************************************
2018-05-24 15:50:07.495523: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
E OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
E [[Node: tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/Add/y, tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/BiasAdd)]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E [[Node: tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1/_567 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_2276_tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E
E Caused by op 'tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split', defined at:
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1838, in <module>
E tf.app.run()
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
E _sys.exit(main(argv))
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1795, in main
E train()
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1501, in train
E results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 640, in get_tower_results
E calculate_mean_edit_distance_and_loss(model_feeder, i, no_dropout if optimizer is None else dropout_rates)
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 521, in calculate_mean_edit_distance_and_loss
E logits = BiRNN(batch_x, tf.to_int64(batch_seq_len), dropout)
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 458, in BiRNN
E sequence_length=seq_length)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 416, in bidirectional_dynamic_rnn
E time_major=time_major, scope=fw_scope)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 632, in dynamic_rnn
E dtype=dtype)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 829, in _dynamic_rnn_loop
E swap_memory=swap_memory)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3096, in while_loop
E result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2874, in BuildLoop
E pred, body, original_loop_vars, loop_vars, shape_invariants)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2814, in _BuildLoop
E body_result = body(*packed_vars_for_body)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3075, in <lambda>
E body = lambda i, lv: (i + 1, orig_body(*lv))
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 798, in _time_step
E skip_conditionals=True)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 249, in _rnn_step
E new_output, new_state = call_cell()
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 786, in <lambda>
E call_cell = lambda: cell(input_t, state)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1056, in __call__
E output, new_state = self._cell(inputs, state, scope=scope)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 296, in __call__
E *args, **kwargs)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 696, in __call__
E outputs = self.call(inputs, *args, **kwargs)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 582, in call
E value=gate_inputs, num_or_size_splits=4, axis=one)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1366, in split
E axis=axis, num_split=num_or_size_splits, value=value, name=name)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5069, in _split
E "Split", split_dim=axis, value=value, num_split=num_split, name=name)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
E op_def=op_def)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
E op_def=op_def)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
E self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
E
E ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/Add/y, tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/BiasAdd)]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E [[Node: tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1/_567 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_2276_tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
E
E
Traceback (most recent call last):
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/Add/y, tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/BiasAdd)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1/_567 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_2276_tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1595, in train
step = session.run(global_step, feed_dict=feed_dict)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 546, in run
run_metadata=run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1022, in run
run_metadata=run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
raise six.reraise(*original_exc_info)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
return self._sess.run(*args, **kwargs)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1170, in run
run_metadata=run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 950, in run
return self._sess.run(*args, **kwargs)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/Add/y, tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/BiasAdd)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1/_567 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_2276_tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split', defined at:
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1838, in <module>
tf.app.run()
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1795, in main
train()
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1501, in train
results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 640, in get_tower_results
calculate_mean_edit_distance_and_loss(model_feeder, i, no_dropout if optimizer is None else dropout_rates)
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 521, in calculate_mean_edit_distance_and_loss
logits = BiRNN(batch_x, tf.to_int64(batch_seq_len), dropout)
File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 458, in BiRNN
sequence_length=seq_length)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 416, in bidirectional_dynamic_rnn
time_major=time_major, scope=fw_scope)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 632, in dynamic_rnn
dtype=dtype)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 829, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3096, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2874, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2814, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3075, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 798, in _time_step
skip_conditionals=True)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 249, in _rnn_step
new_output, new_state = call_cell()
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 786, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1056, in __call__
output, new_state = self._cell(inputs, state, scope=scope)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 296, in __call__
*args, **kwargs)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 696, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 582, in call
value=gate_inputs, num_or_size_splits=4, axis=one)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1366, in split
axis=axis, num_split=num_or_size_splits, value=value, name=name)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5069, in _split
"Split", split_dim=axis, value=value, num_split=num_split, name=name)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/gradients/Add/y, tower_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/BiasAdd)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1/_567 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_2276_tower_1/gradients/tower_1/MatMul_1_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
D Closing queues...
2018-05-24 15:52:11.719445: W tensorflow/core/kernels/queue_base.cc:277] _0_padding_fifo_queue_5: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719571: W tensorflow/core/kernels/queue_base.cc:277] _4_padding_fifo_queue_1: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719639: W tensorflow/core/kernels/queue_base.cc:277] _0_padding_fifo_queue_5: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719690: W tensorflow/core/kernels/queue_base.cc:277] _2_padding_fifo_queue_3: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719709: W tensorflow/core/kernels/queue_base.cc:277] _2_padding_fifo_queue_3: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719778: W tensorflow/core/kernels/queue_base.cc:277] _4_padding_fifo_queue_1: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719833: W tensorflow/core/kernels/queue_base.cc:277] _3_padding_fifo_queue_2: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719877: W tensorflow/core/kernels/queue_base.cc:277] _5_padding_fifo_queue_4: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719894: W tensorflow/core/kernels/queue_base.cc:277] _5_padding_fifo_queue_4: Skipping cancelled enqueue attempt with queue not closed
2018-05-24 15:52:11.719933: W tensorflow/core/kernels/queue_base.cc:277] _3_padding_fifo_queue_2: Skipping cancelled enqueue attempt with queue not closed
E You must feed a value for placeholder tensor 'Queue_Selector' with dtype int32
E [[Node: Queue_Selector = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E
E Caused by op 'Queue_Selector', defined at:
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1838, in <module>
E tf.app.run()
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
E _sys.exit(main(argv))
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1795, in main
E train()
E File "/auto/k1/shahdloo/Projs/DeepSpeech/DeepSpeech.py", line 1489, in train
E tower_feeder_count=len(available_devices))
E File "/auto/k1/shahdloo/Projs/DeepSpeech/util/feeding.py", line 43, in __init__
E self.ph_queue_selector = tf.placeholder(tf.int32, name='Queue_Selector')
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1746, in placeholder
E return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3051, in _placeholder
E "Placeholder", dtype=dtype, shape=shape, name=name)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
E op_def=op_def)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
E op_def=op_def)
E File "/auto/k1/shahdloo/Projs/stories-nn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
E self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
E
E InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Queue_Selector' with dtype int32
E [[Node: Queue_Selector = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
E
E The checkpoint in /auto/data/shahdloo/DeepSpeech/checkpoint/ does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /auto/data/shahdloo/DeepSpeech/checkpoint/.
One side note: at the very end, it complains about my n_hidden or alphabet.txt not matching the checkpoint, even though both are, as far as I can tell, the same as what was used to train the model in the latest release…
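For what it's worth, this is how I'd verify which shapes are actually stored in the checkpoint, using the inspect_checkpoint tool that ships with TensorFlow:

# (model.ckpt-XXXX is a placeholder; use the actual prefix in the checkpoint dir)
python -m tensorflow.python.tools.inspect_checkpoint \
    --file_name=/auto/data/shahdloo/DeepSpeech/checkpoint/model.ckpt-XXXX

That should list each variable with its dtype and shape, which I can then compare against n_hidden 2048 and my alphabet size.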
Thanks in advance for your thoughts!