I'm trying to train a custom model with current master (0.5.0-alpha.7, TensorFlow 1.13.1) on my own mixture of LibriSpeech, Common Voice and some private data. Training appears to proceed normally, but host memory usage keeps creeping up until the entire 32 GB of RAM is exhausted and I have to cancel the run manually (it takes around two hours to use up all available memory; I am not hitting a GPU OOM). Is this the expected behaviour? It looks like a memory leak. Has anyone experienced a similar issue? Another thing I noticed is that the rate of step updates also goes down as training progresses.
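For reference, this is roughly the watcher I run next to the training job to confirm the growth is steady (a rough sketch, assuming the training PID can be found with pgrep; adjust for your setup):

TRAIN_PID=$(pgrep -f DeepSpeech.py)
# log the resident set size of the training process once a minute until it exits
while kill -0 "$TRAIN_PID" 2>/dev/null; do
  date +%T
  grep VmRSS "/proc/$TRAIN_PID/status"
  sleep 60
done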
Below is my training configuration:
python3 -u DeepSpeech.py \
  --train_files train-random.csv \
  --dev_files dev-random.csv \
  --test_files test-random.csv \
  --train_batch_size 32 \
  --dev_batch_size 32 \
  --test_batch_size 32 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.15 \
  --lm_binary_path /srv/ml_datasets/speech_data/language_corpora/lm.binary \
  --lm_trie_path /srv/ml_datasets/speech_data/language_corpora/trie \
  --epochs 50 \
  --checkpoint_dir checkpoint \
  "$@"
I did rebuild lm.binary and the trie with my own dataset.
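For completeness, this is roughly how I rebuilt them, using KenLM plus the generate_trie binary from native_client (file names are placeholders, and the exact generate_trie argument order may differ between releases):

# build the ARPA and binary language model from my text corpus with KenLM
lmplz --order 5 --text corpus.txt --arpa lm.arpa
build_binary lm.arpa lm.binary
# build the trie against my alphabet and the new binary LM
./generate_trie alphabet.txt lm.binary trie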