After rebuilding and trying to run two parallel processes, we noticed that one of the processes still tries to allocate all of the available GPU memory, so we run into the same out-of-memory error:
![image](//discourse-prod-uploads-81679984178418.s3.dualstack.us-west-2.amazonaws.com/original/3X/1/f/1fac13c95d7dbe1e7dbfa19216f6da84f61bdeab.png)
```
2019-11-28 10:53:55.315564: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-11-28 10:53:55.550252: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 13.69G (14699583744 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.551025: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 12.32G (13229624320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.551784: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 11.09G (11906661376 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.552518: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 9.98G (10715995136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.553244: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.98G (9644395520 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.553949: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.08G (8679955456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.554668: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 7.28G (7811959808 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.555398: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 6.55G (7030763520 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.556143: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.89G (6327687168 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.556854: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 5.30G (5694918144 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.557579: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.77G (5125426176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.558281: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 4.30G (4612883456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.559010: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.87G (4151595008 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.559719: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.48G (3736435456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.560427: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.13G (3362791936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.561154: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.82G (3026512640 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.561890: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.54G (2723861248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.562617: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.28G (2451474944 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.563371: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.05G (2206327296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.564074: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.85G (1985694464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.564774: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.66G (1787124992 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.565476: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.50G (1608412416 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.566201: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.35G (1447571200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.566917: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.21G (1302814208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.567654: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.09G (1172532736 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.568357: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1006.39M (1055279616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.569082: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 905.75M (949751808 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.569801: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 815.18M (854776576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:55.570519: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 733.66M (769298944 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-28 10:53:57.587464: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-11-28 10:53:57.823504: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-11-28 10:53:57.853637: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.856427: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.858252: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.859887: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.861685: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.863990: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.864661: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.866457: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.868445: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.870251: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.995128: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.995181: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
Error running session: Internal: Blas GEMM launch failed : a.shape=(16, 494), b.shape=(494, 2048), m=16, n=2048, k=494
[[{{node MatMul}}]]
[[{{node logits}}]]
2019-11-28 10:53:57.995617: I tensorflow/stream_executor/stream.cc:2079] [stream=0x12a00170,impl=0x772ad70] did not wait for [stream=0x772ac90,impl=0x772a920]
2019-11-28 10:53:57.995622: I tensorflow/stream_executor/stream.cc:2079] [stream=0x125d9ae0,impl=0x12a2d610] did not wait for [stream=0x772ac90,impl=0x772a920]
2019-11-28 10:53:57.995700: I tensorflow/stream_executor/stream.cc:5027] [stream=0x12a00170,impl=0x772ad70] did not memcpy host-to-device; source: 0x178cbb00
2019-11-28 10:53:57.995713: I tensorflow/stream_executor/stream.cc:5014] [stream=0x125d9ae0,impl=0x12a2d610] did not memcpy device-to-host; source: 0x7fa6de002500
2019-11-28 10:53:57.995741: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
2019-11-28 10:53:57.997924: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2019-11-28 10:53:57.997954: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
2019-11-28 10:53:57.997983: I tensorflow/stream_executor/stream.cc:2079] [stream=0x12153fc0,impl=0x12154060] did not wait for [stream=0x10e21be0,impl=0x68b1600]
2019-11-28 10:53:57.998011: I tensorflow/stream_executor/stream.cc:5014] [stream=0x12153fc0,impl=0x12154060] did not memcpy device-to-host; source: 0x7fd539457400
2019-11-28 10:53:57.998142: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
```
Sorry for the huge block of error messages, but I thought it would be relevant. Do you have any insight into why this is still happening despite rebuilding with the changes to the TensorFlow config?
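
For reference, the kind of GPU-memory settings we changed are along these lines (a minimal sketch assuming TF 1.x and `tf.ConfigProto`; the exact values are illustrative, not our literal config):

```python
import tensorflow as tf

# Sketch of per-process GPU memory settings for TF 1.x.
# Values are illustrative assumptions, not the exact config used.
config = tf.ConfigProto()

# Allocate GPU memory on demand instead of grabbing it all up front.
config.gpu_options.allow_growth = True

# Alternatively, cap each process at a fixed share of the GPU,
# e.g. ~45% each so two processes can fit on one card.
config.gpu_options.per_process_gpu_memory_fraction = 0.45

sess = tf.Session(config=config)
```

The log above suggests the second process never picks these options up, since it still attempts a ~13.69 GB initial allocation before backing off.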