I am trying to train a model on 30k files for English.
Machine
NVIDIA RTX 2070 x 2
Ubuntu 16.04 LTS
CUDA 10.0 (V10.0.130)
cuDNN 7.4.2
TensorFlow v1.12
DeepSpeech master
Hyperparameters
--train_batch_size 12
--dev_batch_size 24
--test_batch_size 24
--epoch 30
--learning_rate 0.0001
--display_step 0
--validation_step 1
--dropout_rate 0.15
--checkpoint_step 1
--n_hidden 2048
--lm_alpha 0.75
--lm_beta 1.85
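To make those flags concrete, this is roughly the shape of my invocation. The dataset, alphabet, checkpoint, and export paths below are placeholders for my local setup, not the exact values I use:

```
# paths are placeholders for my local setup
python -u DeepSpeech.py \
  --train_files /data/en/train.csv \
  --dev_files /data/en/dev.csv \
  --test_files /data/en/test.csv \
  --alphabet_config_path data/alphabet.txt \
  --checkpoint_dir /data/en/checkpoints \
  --export_dir /data/en/export \
  --train_batch_size 12 \
  --dev_batch_size 24 \
  --test_batch_size 24 \
  --epoch 30 \
  --learning_rate 0.0001 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.15 \
  --checkpoint_step 1 \
  --n_hidden 2048 \
  --lm_alpha 0.75 \
  --lm_beta 1.85
```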
It appears early stopping is triggered and training stops at epoch 10. The exported model produces poor output.
Training Log
After running a second time, the output is:
I Validation of Epoch 12 - loss: 65.845520
100% (131 of 131) |######################| Elapsed Time: 0:00:41 Time: 0:00:41
I Training epoch 13…
I Training of Epoch 13 - loss: 18.388819
100% (919 of 919) |######################| Elapsed Time: 0:11:48 Time: 0:11:48
I Validating epoch 13…
I Validation of Epoch 13 - loss: 66.461621
100% (131 of 131) |######################| Elapsed Time: 0:00:41 Time: 0:00:41
I Training epoch 14…
I Training of Epoch 14 - loss: 16.266551
100% (919 of 919) |######################| Elapsed Time: 0:11:48 Time: 0:11:48
I Validating epoch 14…
I Validation of Epoch 14 - loss: 69.215731
I Early stop triggered as (for last 4 steps) validation loss: 69.215731 with standard deviation: 2.297473 and mean: 64.538777
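If I read that message correctly, the check looks at the last few validation losses and stops when the newest one is no longer improving relative to the earlier ones. The sketch below is only my paraphrase of that behaviour, not the actual DeepSpeech source; the function name and thresholds are made up for illustration, and epoch 11's loss (about 61.31) is back-solved from the reported mean rather than taken from the log:

```python
import numpy as np

def should_early_stop(dev_losses, window=4, mean_thresh=0.5, std_thresh=0.5):
    """Rough paraphrase of the early-stop check suggested by the log line.

    dev_losses: validation losses so far, one per epoch. The name and the
    thresholds are illustrative, not DeepSpeech's exact flags or defaults.
    """
    if len(dev_losses) < window:
        return False
    recent = dev_losses[-window:]
    previous, latest = recent[:-1], recent[-1]
    mean_loss = np.mean(previous)
    std_loss = np.std(previous)
    # Stop when the newest validation loss is worse than all of the previous ones,
    # or when the losses have flattened out (tiny change, tiny spread).
    worse_than_before = latest > max(previous)
    flattened = abs(recent[0] - latest) < mean_thresh and std_loss < std_thresh
    return worse_than_before or flattened

# Losses for epochs 11-14; 61.3092 is back-solved from the reported mean.
losses = [61.3092, 65.845520, 66.461621, 69.215731]
print(np.mean(losses[:-1]), np.std(losses[:-1]))  # ~64.54 and ~2.30, matching the log
print(should_early_stop(losses))                  # True: 69.22 is worse than every previous loss
```

So, by this reading, the stop fired because the epoch 14 validation loss (69.22) is higher than any of the previous three, even though the training loss is still falling.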
Please advise: what mistake am I making?
Should I rerun the same training until it hits epoch 30?
Should I go beyond epoch 30, or will I be overfitting the data?
Thanks