After rebuilding and trying to run two parallel processes, we notice that one of the running processes still tries to allocate all of the available GPU memory, which means we still hit the same out-of-memory error:
2019-11-28 10:53:55.315564: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-11-28 10:53:55.550252: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 13.69G (14699583744 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.551025: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 12.32G (13229624320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.551784: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 11.09G (11906661376 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.552518: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 9.98G (10715995136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.553244: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.98G (9644395520 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.553949: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.08G (8679955456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.554668: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 7.28G (7811959808 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.555398: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 6.55G (7030763520 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.556143: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.89G (6327687168 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.556854: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.30G (5694918144 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.557579: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.77G (5125426176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.558281: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.30G (4612883456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.559010: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.87G (4151595008 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.559719: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.48G (3736435456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.560427: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.13G (3362791936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.561154: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.82G (3026512640 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.561890: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.54G (2723861248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.562617: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.28G (2451474944 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.563371: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.05G (2206327296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.564074: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.85G (1985694464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.564774: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.66G (1787124992 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.565476: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.50G (1608412416 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.566201: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.35G (1447571200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.566917: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.21G (1302814208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.567654: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.09G (1172532736 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.568357: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1006.39M (1055279616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.569082: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 905.75M (949751808 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.569801: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 815.18M (854776576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.570519: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 733.66M (769298944 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:57.587464: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-11-28 10:53:57.823504: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-11-28 10:53:57.853637: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.856427: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.858252: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.859887: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.861685: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.863990: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.864661: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.866457: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.868445: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.870251: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.995128: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.995181: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
Error running session: Internal: Blas GEMM launch failed : a.shape=(16, 494), b.shape=(494, 2048), m=16, n=2048, k=494
[[{{node MatMul}}]]
[[{{node logits}}]]
2019-11-28 10:53:57.995617: I tensorflow/stream_executor/stream.cc:2079] [stream=0x12a00170,impl=0x772ad70] did not wait for [stream=0x772ac90,impl=0x772a920]
2019-11-28 10:53:57.995622: I tensorflow/stream_executor/stream.cc:2079] [stream=0x125d9ae0,impl=0x12a2d610] did not wait for [stream=0x772ac90,impl=0x772a920]
2019-11-28 10:53:57.995700: I tensorflow/stream_executor/stream.cc:5027] [stream=0x12a00170,impl=0x772ad70] did not memcpy host-to-device; source: 0x178cbb00
2019-11-28 10:53:57.995713: I tensorflow/stream_executor/stream.cc:5014] [stream=0x125d9ae0,impl=0x12a2d610] did not memcpy device-to-host; source: 0x7fa6de002500
2019-11-28 10:53:57.995741: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
2019-11-28 10:53:57.997924: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.997954: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
2019-11-28 10:53:57.997983: I tensorflow/stream_executor/stream.cc:2079] [stream=0x12153fc0,impl=0x12154060] did not wait for [stream=0x10e21be0,impl=0x68b1600]
2019-11-28 10:53:57.998011: I tensorflow/stream_executor/stream.cc:5014] [stream=0x12153fc0,impl=0x12154060] did not memcpy device-to-host; source: 0x7fd539457400
2019-11-28 10:53:57.998142: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
Sorry for the huge block of error messages, but I thought it would be relevant. Do you have any insight into why this is still happening despite rebuilding with the changes to the TensorFlow config?
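
For reference, the kind of config change I mean is roughly the following (a minimal sketch assuming a TF 1.x `tf.Session`-based setup; the exact memory fraction and call sites in our code differ, and the options shown are the standard `tf.ConfigProto` GPU options):

```python
import tensorflow as tf

# Keep TensorFlow from grabbing the whole card so two processes can share one GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # grow allocations on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.45   # cap each process below half the card (example value)

with tf.Session(config=config) as sess:
    # ... load the graph and run inference as before ...
    pass
```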