I’m trying to create a Python script that plots a learning curve showing how the model’s accuracy changes as the amount of training data grows, but the script ends with an error.
Model training is started as an external subprocess and that part works fine.
I’ve installed deepspeech-gpu==0.2.1a1 for Python 3 with pip, taken client.py and modified it to load the newly generated model and run inference on the test data, so that the model, language model, etc. are loaded just once per evaluation.
The first model is evaluated fine; the problem starts once the second model has been trained and the script tries to load it for inference:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [2048,2048] and type float
[[{{node h2/Adam/Initializer/zeros}} = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2048,2048] values: [0 0 0...]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
A simplified version of the script looks like this:
from deepspeech import Model

for train_duration in train_durations:
    train_list_path = create_test_list(training_directory, train_duration)
    trained_model = train_model(original_model, checkpoint_dir, export_dir,
                                epoch_number, train_list_path, validation_list_path,
                                test_list_path, learning_rate, deepspeech_directory)
    ds = Model(model, N_FEATURES, N_CONTEXT, alphabet, BEAM_WIDTH)
    ds.enableDecoderWithLM(alphabet, lm, trie, LM_WEIGHT, VALID_WORD_COUNT_WEIGHT)
    train_score = infer_audio_list(train_list_path, ds)
    test_score = infer_audio_list(test_list_path, ds)
    del ds
It’s most probably caused by the previous model not being released, but I haven’t found a way to release it from code — `del ds` apparently isn’t enough.
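One workaround I’m considering (not yet verified) is to run each evaluation in its own child process, since GPU memory held by TensorFlow is returned to the system when the process exits. A minimal sketch, with a placeholder standing in for the real model loading and scoring (in the actual script this would be the `Model(...)` construction plus `infer_audio_list(...)`):

```python
import multiprocessing as mp


def evaluate_model(model_path, result_queue):
    # Placeholder for the real evaluation: in the actual script this would
    # build deepspeech.Model(model_path, ...) and run infer_audio_list(...).
    # Any GPU memory allocated here is freed when this process terminates.
    score = len(model_path)  # dummy "score" so the sketch is runnable
    result_queue.put(score)


def evaluate_in_subprocess(model_path):
    # Spawn a fresh process per trained model so nothing from the previous
    # evaluation survives into the next one.
    queue = mp.Queue()
    proc = mp.Process(target=evaluate_model, args=(model_path, queue))
    proc.start()
    score = queue.get()
    proc.join()
    return score


if __name__ == "__main__":
    for path in ["model_a.pb", "model_b.pb"]:
        print(evaluate_in_subprocess(path))
```

This avoids relying on `del` or garbage collection entirely, at the cost of a process start per model, but I’d prefer a way to release the model in-process if one exists.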
Does anyone have an idea how to fix this?