I have been running deepspeech-gpu inference inside Docker containers. I am trying to run around 30 containers on one EC2 instance, which has a Tesla K80 GPU with 12 GB of memory. The containers run for a bit, then I start getting CUDA out-of-memory errors: CUDA_ERROR_OUT_OF_MEMORY. My question is: do you think this is a problem where CUDA is not releasing the model from memory after it is loaded, or is it something else?
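In case it helps with diagnosing, this is roughly how GPU memory per process can be watched from the host while the containers run (a minimal sketch polling nvidia-smi; the 5-second interval is arbitrary):

import subprocess
import time

# Print each compute process's PID and GPU memory use every 5 seconds.
while True:
    print(subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv"], text=True))
    time.sleep(5)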
Also, each container has around 360 twenty-second .wav files that I am transcribing. I am using a for loop and calling the CLI via subprocess. This is with deepspeech-gpu==0.4.1 and deepspeech-0.4.1-models.tar.gz:
import subprocess

deepSpeechResults = subprocess.Popen(
    "exec deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt"
    " --lm models/lm.binary --trie models/trie --audio " + audioLocation + " > " + savelocation,
    shell=True)  # "exec" replaces the shell, so .kill() reaches the deepspeech process itself
try:
    deepSpeechResults.wait(timeout=30)
except subprocess.TimeoutExpired:
    deepSpeechResults.kill()  # terminate the hung deepspeech process
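For context, the surrounding loop looks roughly like this (a minimal sketch; the audio/ directory and the .txt output naming are assumptions, and run_deepspeech is a hypothetical wrapper around the Popen/wait/kill snippet above):

import glob
import os

# One deepspeech process per .wav file; run_deepspeech is a hypothetical
# wrapper around the Popen/wait/kill snippet shown above.
for audioLocation in sorted(glob.glob("audio/*.wav")):
    savelocation = os.path.splitext(audioLocation)[0] + ".txt"
    run_deepspeech(audioLocation, savelocation)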