Some benchmarks: https://github.com/Franck-Dernoncourt/ASR_benchmark#benchmark-results
Are those results on the test set?
The best training loss I have reached so far is 18.341875, with this command:
CUDA_VISIBLE_DEVICES=1,2,3 unbuffer python DeepSpeech.py \
  --train_files data/common-voice-v1/cv-valid-train.csv \
  --dev_files data/common-voice-v1/cv-valid-train.csv \
  --test_files data/common-voice-v1/cv-valid-train.csv \
  --log_level 0 \
  --train_batch_size 20 \
  --train True \
  --decoder_library_path ./libctc_decoder_with_kenlm.so \
  --checkpoint_dir cv007 \
  --export_dir cv007export \
  --summary_dir cv007summaries \
  --summary_secs 600 \
  --wer_log_pattern "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)" \
  --learning_rate 0.0001 \
  |& tee -a cv007.log
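The per-epoch losses below can be pulled straight out of the log written by tee, since every epoch summary line contains the string "Training of Epoch":

# List the per-epoch loss summaries from the training log
grep "Training of Epoch" cv007.log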
Training-set loss for each epoch:
I Training of Epoch 0 - loss: inf
I Training of Epoch 1 - loss: 46.667273
I Training of Epoch 2 - loss: 33.400887
I Training of Epoch 3 - loss: 25.859372
I Training of Epoch 4 - loss: inf
I Training of Epoch 5 - loss: 18.341875
I Training of Epoch 6 - loss: inf
I Training of Epoch 7 - loss: inf
I Training of Epoch 8 - loss: inf
I Training of Epoch 9 - loss: inf
I Training of Epoch 10 - loss: inf
I Training of Epoch 11 - loss: inf
I Training of Epoch 12 - loss: inf
I Training of Epoch 13 - loss: inf
I Training of Epoch 14 - loss: inf
I Training of Epoch 15 - loss: inf
I Training of Epoch 16 - loss: inf
I Training of Epoch 17 - loss: inf
Not great: the loss diverges to inf intermittently and stays there after epoch 5, so I’m also looking for better hyperparameters.
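Since the loss keeps blowing up to inf, the first thing I would try is a smaller learning rate. A minimal sketch of such a retry, reusing the flags above (the cv008* names are hypothetical placeholders, the 10x reduction is a guess rather than a tuned value, and --wer_log_pattern is omitted for brevity):

# Same data and flags as the cv007 run, but with a 10x smaller learning
# rate and fresh checkpoint/summary/log locations (cv008* is hypothetical).
CUDA_VISIBLE_DEVICES=1,2,3 unbuffer python DeepSpeech.py \
  --train_files data/common-voice-v1/cv-valid-train.csv \
  --dev_files data/common-voice-v1/cv-valid-train.csv \
  --test_files data/common-voice-v1/cv-valid-train.csv \
  --log_level 0 \
  --train_batch_size 20 \
  --train True \
  --decoder_library_path ./libctc_decoder_with_kenlm.so \
  --checkpoint_dir cv008 \
  --export_dir cv008export \
  --summary_dir cv008summaries \
  --summary_secs 600 \
  --learning_rate 0.00001 \
  |& tee -a cv008.log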