python3 -u ./DeepSpeech.py \
  --train_files /home/sky-ai/xwt/DeepSpeech/data/train/train.csv \
  --dev_files /home/sky-ai/xwt/DeepSpeech/data/cv/cv.csv \
  --test_files /home/sky/sky-ai/xwt/DeepSpeech/data/test/test.csv \
  --train_batch_size 24 \
  --dev_batch_size 15 \
  --test_batch_size 20 \
  --epoch 20 \
  --display_step 1 \
  --validation_step 1 \
  --dropout_rate 0.30 \
  --default_stddev 0.046875 \
  --learning_rate 0.0001 \
  --log_level 0
2019-03-08 16:25:17.113383: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-03-08 16:25:17.213002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:17:00.0
totalMemory: 10.92GiB freeMemory: 10.77GiB
2019-03-08 16:25:17.274068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:65:00.0
totalMemory: 10.92GiB freeMemory: 10.57GiB
2019-03-08 16:25:17.274882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2019-03-08 16:25:17.936644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-08 16:25:17.936675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2019-03-08 16:25:17.936680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y
2019-03-08 16:25:17.936683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N
2019-03-08 16:25:17.937213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 10419 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2019-03-08 16:25:17.937478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 10226 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
D Starting coordinator...
D Coordinator started. Thread id 140459615708928
Preprocessing ['/home/sky-ai/xwt/DeepSpeech/data/train/train.csv']
Preprocessing done
Preprocessing ['/home/sky-ai/xwt/DeepSpeech/data/cv/cv.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
2019-03-08 16:26:26.961329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2019-03-08 16:26:26.961413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-08 16:26:26.961419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2019-03-08 16:26:26.961424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y
2019-03-08 16:26:26.961428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N
2019-03-08 16:26:26.961956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10419 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2019-03-08 16:26:26.962068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10226 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2019-03-08 16:26:28.980208: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980297: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-03-08 16:26:28.980344: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 7730940928 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980353: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 7730940928
2019-03-08 16:26:28.980392: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 6957846528 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980402: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 6957846528
2019-03-08 16:26:28.980433: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 6262061568 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980440: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 6262061568
2019-03-08 16:26:28.980464: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 5635855360 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980471: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 5635855360
2019-03-08 16:26:28.980494: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 5072269824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980501: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 5072269824
2019-03-08 16:26:28.980526: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4565042688 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980532: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 4565042688
2019-03-08 16:26:28.980551: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4108538368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980556: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 4108538368
2019-03-08 16:26:28.980572: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 3697684480 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980577: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 3697684480
2019-03-08 16:26:28.980602: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:28.980607: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-03-08 16:26:38.980783: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:38.980838: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-03-08 16:26:38.980875: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-03-08 16:26:38.980886: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-03-08 16:26:38.980901: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (cuda_host_bfc) ran out of memory trying to allocate 3.33GiB. Current allocation summary follows.
2019-03-08 16:26:38.980918: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 12B client-requested in use in bin.
2019-03-08 16:26:38.980931: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.980942: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.980954: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.980964: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.980979: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192): Total Chunks: 12, Chunks in use: 12. 96.0KiB allocated for chunks. 96.0KiB in use in bin. 96.0KiB client-requested in use in bin.
2019-03-08 16:26:38.980990: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981003: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768): Total Chunks: 3, Chunks in use: 3. 96.0KiB allocated for chunks. 96.0KiB in use in bin. 96.0KiB client-requested in use in bin.
2019-03-08 16:26:38.981014: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981025: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981036: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981049: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288): Total Chunks: 1, Chunks in use: 0. 831.2KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981061: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576): Total Chunks: 3, Chunks in use: 3. 5.00MiB allocated for chunks. 5.00MiB in use in bin. 5.00MiB client-requested in use in bin.
2019-03-08 16:26:38.981075: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152): Total Chunks: 3, Chunks in use: 3. 11.58MiB allocated for chunks. 11.58MiB in use in bin. 11.58MiB client-requested in use in bin.
2019-03-08 16:26:38.981086: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981097: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981110: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216): Total Chunks: 9, Chunks in use: 9. 144.00MiB allocated for chunks. 144.00MiB in use in bin. 144.00MiB client-requested in use in bin.
2019-03-08 16:26:38.981122: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981132: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-03-08 16:26:38.981146: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728): Total Chunks: 4, Chunks in use: 3. 524.24MiB allocated for chunks. 384.00MiB in use in bin. 384.00MiB client-requested in use in bin.
2019-03-08 16:26:38.981158: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456): Total Chunks: 3, Chunks in use: 2. 7.33GiB allocated for chunks. 6.66GiB in use in bin. 6.66GiB client-requested in use in bin.
2019-03-08 16:26:38.981171: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 3.33GiB was 256.00MiB, Chunk State:
2019-03-08 16:26:38.981186: I tensorflow/core/common_runtime/bfc_allocator.cc:619] Size: 684.82MiB | Requested Size: 0B | in_use: 0, prev: Size: 3.33GiB | Requested Size: 3.33GiB | in_use: 1
2019-03-08 16:26:38.981198: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb956000000 of size 3576881152
2019-03-08 16:26:38.981207: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fba2b32e000 of size 718086144
2019-03-08 16:26:38.981215: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fba74000000 of size 3576881152
2019-03-08 16:26:38.981224: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb4932e000 of size 134217728
2019-03-08 16:26:38.981232: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb5132e000 of size 134217728
2019-03-08 16:26:38.981240: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb5932e000 of size 134217728
2019-03-08 16:26:38.981248: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6132e000 of size 1746688
2019-03-08 16:26:38.981256: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb614d8700 of size 1746688
2019-03-08 16:26:38.981264: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61682e00 of size 1746688
2019-03-08 16:26:38.981273: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6182d500 of size 4046848
2019-03-08 16:26:38.981281: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61c09500 of size 4046848
2019-03-08 16:26:38.981289: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61fe5500 of size 4046848
2019-03-08 16:26:38.981297: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb623c1500 of size 16777216
2019-03-08 16:26:38.981305: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb633c1500 of size 16777216
2019-03-08 16:26:38.981313: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb643c1500 of size 16777216
2019-03-08 16:26:38.981321: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb653c1500 of size 16777216
2019-03-08 16:26:38.981329: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb663c1500 of size 16777216
2019-03-08 16:26:38.981337: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb673c1500 of size 16777216
2019-03-08 16:26:38.981345: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb683c1500 of size 16777216
2019-03-08 16:26:38.981353: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb693c1500 of size 16777216
2019-03-08 16:26:38.981361: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6a3c1500 of size 16777216
2019-03-08 16:26:38.981369: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fbb6b3c1500 of size 147057408
2019-03-08 16:26:38.981378: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14800000 of size 8192
2019-03-08 16:26:38.981386: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14802000 of size 256
2019-03-08 16:26:38.981394: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14802100 of size 8192
2019-03-08 16:26:38.981402: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14804100 of size 8192
2019-03-08 16:26:38.981410: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14806100 of size 8192
2019-03-08 16:26:38.981418: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14808100 of size 8192
2019-03-08 16:26:38.981426: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480a100 of size 8192
2019-03-08 16:26:38.981434: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480c100 of size 8192
2019-03-08 16:26:38.981442: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480e100 of size 8192
2019-03-08 16:26:38.981450: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14810100 of size 8192
2019-03-08 16:26:38.981458: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14812100 of size 8192
2019-03-08 16:26:38.981466: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14814100 of size 8192
2019-03-08 16:26:38.981474: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14816100 of size 8192
2019-03-08 16:26:38.981485: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818100 of size 256
2019-03-08 16:26:38.981493: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818200 of size 256
2019-03-08 16:26:38.981501: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818300 of size 32768
2019-03-08 16:26:38.981509: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14820300 of size 32768
2019-03-08 16:26:38.981517: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14828300 of size 32768
2019-03-08 16:26:38.981525: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fbf14830300 of size 851200
2019-03-08 16:26:38.981533: I tensorflow/core/common_runtime/bfc_allocator.cc:638] Summary of in-use Chunks by size:
2019-03-08 16:26:38.981543: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 256 totalling 768B
2019-03-08 16:26:38.981553: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 12 Chunks of size 8192 totalling 96.0KiB
2019-03-08 16:26:38.981562: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 32768 totalling 96.0KiB
2019-03-08 16:26:38.981571: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 1746688 totalling 5.00MiB
2019-03-08 16:26:38.981581: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 4046848 totalling 11.58MiB
2019-03-08 16:26:38.981590: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 9 Chunks of size 16777216 totalling 144.00MiB
2019-03-08 16:26:38.981599: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 134217728 totalling 384.00MiB
2019-03-08 16:26:38.981608: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 3576881152 totalling 6.66GiB
2019-03-08 16:26:38.981617: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 7.19GiB
2019-03-08 16:26:38.981629: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats:
Limit: 68719476736
InUse: 7724988416
MaxInUse: 7724988416
NumAllocs: 38
MaxAllocSize: 3576881152
2019-03-08 16:26:38.981646: W tensorflow/core/common_runtime/bfc_allocator.cc:271] _______*********
2019-03-08 16:26:38.982719: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Resource exhausted: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc
Traceback (most recent call last):
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc
[[{{node save_1/RestoreV2_1}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node save_1/RestoreV2_1/_43}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "./DeepSpeech.py", line 964, in <module>
    tf.app.run(main)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./DeepSpeech.py", line 916, in main
    train()
  File "./DeepSpeech.py", line 549, in train
    config=Config.session_config) as session:
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
    return self._sess_creator.create_session()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 566, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py", line 288, in prepare_session
    config=config)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py", line 218, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc
[[node save_1/RestoreV2_1 (defined at ./DeepSpeech.py:549) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node save_1/RestoreV2_1/_43}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'save_1/RestoreV2_1', defined at:
  File "./DeepSpeech.py", line 964, in <module>
    tf.app.run(main)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./DeepSpeech.py", line 916, in main
    train()
  File "./DeepSpeech.py", line 549, in train
    config=Config.session_config) as session:
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
    return self._sess_creator.create_session()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
    self._scaffold.finalize()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py", line 213, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 886, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
    restore_sequentially, reshape)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc
[[node save_1/RestoreV2_1 (defined at ./DeepSpeech.py:549) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node save_1/RestoreV2_1/_43}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
No matter how much I reduce the amount of training and CV data, I always hit this OOM problem; even with a batch size of 1 the error is identical. The machine has 16 GB of RAM, two GTX 1080 Ti GPUs, and an i7 CPU, and no other programs were running. From the log, the allocation that fails is pinned host memory (allocator cuda_host_bfc on /device:CPU:0) while the saver restores a checkpoint tensor of shape [2048,436631], which would explain why changing the batch size makes no difference.
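
As a rough sanity check (assuming the restored tensor is float32, 4 bytes per element), the size of that single tensor matches both the 3.33GiB request and the 3576881152-byte chunks in the allocator dump:

# Back-of-the-envelope check, assuming float32 (4 bytes per element).
rows, cols = 2048, 436631      # shape from the OOM message
size_bytes = rows * cols * 4   # = 3576881152, the chunk size in the dump
print(size_bytes / 2**30)      # ~3.33 GiB

Following the hint in the log, I also plan to re-run with report_tensor_allocations_upon_oom enabled. A minimal sketch of how that option is passed in TF 1.x (assuming a plain session handle named sess and a fetch named train_op, which are my placeholders; DeepSpeech's MonitoredTrainingSession would need the option threaded through its own run call):

import tensorflow as tf

# Ask TensorFlow to list the allocated tensors if an OOM occurs (TF 1.x).
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
sess.run(train_op, options=run_options)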