Train a model with 30K files

I am trying to train an English model with 30k files.

Machine

Nvidia RTX 2070 x 2
Ubuntu 16.04 LTS
CUDA 10.0, V10.0.130
cuDNN 7.4.2
TensorFlow v1.12
DeepSpeech master

Hyperparameters

--train_batch_size 12
--dev_batch_size 24
--test_batch_size 24
--epoch 30
--learning_rate 0.0001
--display_step 0
--validation_step 1
--dropout_rate 0.15
--checkpoint_step 1
--n_hidden 2048
--lm_alpha 0.75
--lm_beta 1.85

It appears that early stopping is triggered and training stops at epoch 10. The exported model produces poor output.

Training Log
After running it a second time, the output is:

I Validation of Epoch 12 - loss: 65.845520
100% (131 of 131) |######################| Elapsed Time: 0:00:41 Time: 0:00:41
I Training epoch 13…
I Training of Epoch 13 - loss: 18.388819
100% (919 of 919) |######################| Elapsed Time: 0:11:48 Time: 0:11:48
I Validating epoch 13…
I Validation of Epoch 13 - loss: 66.461621
100% (131 of 131) |######################| Elapsed Time: 0:00:41 Time: 0:00:41
I Training epoch 14…
I Training of Epoch 14 - loss: 16.266551
100% (919 of 919) |######################| Elapsed Time: 0:11:48 Time: 0:11:48
I Validating epoch 14…
I Validation of Epoch 14 - loss: 69.215731
I Early stop triggered as (for last 4 steps) validation loss: 69.215731 with standard deviation: 2.297473 and mean: 64.538777

Please advise: what mistake am I making?
Should I rerun the same training until it hits epoch 30?
If I go beyond epoch 30, will I be overfitting the data?

Thanks

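According to the log, the model is overfitting: training loss keeps falling (18.39 at epoch 13, 16.27 at epoch 14) while validation loss rises (65.85, 66.46, 69.22 over epochs 12 to 14). You likely need to adjust the parameters to the amount of audio data you have. 30k files does not mean the same thing if the clips are 2 seconds each or 10 seconds each.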
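To make the overfitting visible, here is a minimal sketch that pulls the training and validation losses out of the log lines quoted above and prints the gap per epoch. The regular expression and the helper names (`parse_losses`, `gaps`) are my own for illustration and are not part of DeepSpeech; the pattern only assumes the log format shown in this thread.

```python
import re

# Log lines copied from the post above (progress bars omitted).
LOG = """\
I Validation of Epoch 12 - loss: 65.845520
I Training of Epoch 13 - loss: 18.388819
I Validation of Epoch 13 - loss: 66.461621
I Training of Epoch 14 - loss: 16.266551
I Validation of Epoch 14 - loss: 69.215731
"""

# Assumed line format, based only on the lines quoted in this thread.
LINE = re.compile(r"I (Training|Validation) of Epoch (\d+) - loss: ([\d.]+)")


def parse_losses(log_text):
    """Return {epoch: {"Training": loss, "Validation": loss}} from log text."""
    losses = {}
    for kind, epoch, loss in LINE.findall(log_text):
        losses.setdefault(int(epoch), {})[kind] = float(loss)
    return losses


def gaps(losses):
    """Validation minus training loss, for epochs where both were logged."""
    return {
        epoch: round(vals["Validation"] - vals["Training"], 6)
        for epoch, vals in sorted(losses.items())
        if "Training" in vals and "Validation" in vals
    }


if __name__ == "__main__":
    for epoch, gap in gaps(parse_losses(LOG)).items():
        print(f"epoch {epoch}: validation - training = {gap:.6f}")
```

On the quoted lines the gap grows from about 48.07 at epoch 13 to about 52.95 at epoch 14, which is the pattern to watch for: training loss improving while validation loss gets worse.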

Thank you for the prompt reply. I need to read more about adjusting the parameters. The audio length varies from 4 seconds to 12 seconds; if I had to give an average, it would be around 8 seconds.
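For a rough sense of scale, the arithmetic can be sketched as below. It assumes the average clip length holds across all 30k files, which is only an estimate; `total_hours` is a hypothetical helper name.

```python
def total_hours(num_files, avg_seconds):
    """Total audio duration in hours for num_files clips of avg_seconds each."""
    return num_files * avg_seconds / 3600.0


if __name__ == "__main__":
    num_files = 30_000  # "30k files" from the original post
    # 2 s and 10 s are the examples from the reply; 8 s is the estimated average.
    for avg_seconds in (2, 8, 10):
        hours = total_hours(num_files, avg_seconds)
        print(f"{avg_seconds:>2} s average -> {hours:.1f} hours of audio")
```

At an 8-second average this comes to roughly 67 hours of audio, compared with about 17 hours at 2 seconds per clip and about 83 hours at 10 seconds per clip.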

Can you point me to the right resources to help me figure out the right parameters to avoid overfitting?

Thank you once again.

There’s no one-size-fits-all answer; you need to adjust according to your dataset and how training and validation progress.
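One way to watch how validation progresses is a simple windowed check on the validation losses. The sketch below is a generic illustration and not DeepSpeech's own early-stop logic; the function name `should_stop_early` and its rule (stop when the newest loss is worse than every other loss in the window) are assumptions made for this example.

```python
def should_stop_early(val_losses, window=4):
    """Generic early-stop check (illustrative only, not DeepSpeech's code).

    Returns True when the newest validation loss is higher than every
    other loss among the last `window` validation runs.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(val_losses) < window:
        return False  # not enough history yet
    recent = val_losses[-window:]
    return recent[-1] > max(recent[:-1])


if __name__ == "__main__":
    # Validation losses for epochs 12-14 as quoted in the thread.
    quoted = [65.845520, 66.461621, 69.215731]
    # Only three values are quoted, so the demo uses window=3.
    print(should_stop_early(quoted, window=3))
```

With the three quoted losses and `window=3` the check fires, because 69.215731 is higher than both earlier values. The useful knobs are the window size and the rule itself, and both depend on how noisy the validation loss is for a given dataset.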

I have increased the dropout rate from 0.15 to 0.25. I hope this tweak is in the right direction.

I STARTING Optimization
I Training epoch 11…
I Training of Epoch 11 - loss: 28.632439
100% (919 of 919) |######################| Elapsed Time: 0:11:35 Time: 0:11:35
I Validating epoch 11…
I Validation of Epoch 11 - loss: 56.171606
100% (131 of 131) |######################| Elapsed Time: 0:00:38 Time: 0:00:38

You would be better off restarting from scratch …

Thank you, I shall do that.