Ways to decrease validation loss

I have been training a DeepSpeech model for quite a few epochs now and my validation loss seems to have plateaued. After reading several other Discourse posts, the general solution seemed to be that I should reduce the learning rate.

I have done this twice (at the points marked on the TensorBoard graph) and this did make a slight difference initially, but then the validation loss returned to its previously plateaued level.

I also increased the dropout rate in the hope that this would produce a more generalised model and improve the validation loss, but it really only increased the training loss and left the validation loss unchanged.

My next thought is to increase the size of the dataset (currently a combination of Common Voice, LibriSpeech and TED-LIUM at around 1700 hours). Are there any other changes I could make, other than collecting more data?

Hard to tell without a better overview of your current training parameters.

From your plot (which is missing legends, so I don’t know what is on the x axis; I’ll assume epochs), it seems you are not really learning after epoch 30k: the (hard to read) lines seem to keep the same delta until 40k, where you obviously overfit.

Initial Training Parameters:

noearly_stop
train_files librivox + TED-LIUM + Common Voice
dev_files librivox-dev-clean.csv
test_files librivox-test-clean.csv
train_batch_size 32
dev_batch_size 32
test_batch_size 32
n_hidden 2048
learning_rate 0.0001
dropout_rate 0.2
epochs 4
lm_alpha 0.75
lm_beta 1.85
audio_sample_rate 8000
use_allow_growth
use_cudnn_rnn

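For completeness, here is a rough sketch of how those flags would be assembled into a DeepSpeech.py invocation (the training CSV names below are placeholders, not my actual file list):

```python
# Rough sketch of the training invocation built from the flags above.
# The training CSV names are placeholders, not the actual file list.
import shlex

flags = {
    "train_files": "librivox-train.csv,ted-train.csv,cv-train.csv",  # placeholder names
    "dev_files": "librivox-dev-clean.csv",
    "test_files": "librivox-test-clean.csv",
    "train_batch_size": 32,
    "dev_batch_size": 32,
    "test_batch_size": 32,
    "n_hidden": 2048,
    "learning_rate": 0.0001,
    "dropout_rate": 0.2,
    "epochs": 4,
    "lm_alpha": 0.75,
    "lm_beta": 1.85,
    "audio_sample_rate": 8000,
}
boolean_flags = ["noearly_stop", "use_allow_growth", "use_cudnn_rnn"]

cmd = ["python", "DeepSpeech.py"]
cmd += [f"--{name}={value}" for name, value in flags.items()]
cmd += [f"--{name}" for name in boolean_flags]

# Print the command so it can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```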
The first change I made was to the learning rate:

learning_rate 0.00001

Then I changed both learning rate and dropout rate:

learning_rate 0.000001
dropout_rate 0.25
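In effect, what I have been doing by hand is a reduce-on-plateau schedule: cut the learning rate by 10x whenever the validation loss stops improving. A sketch of that rule (the patience and min_delta values here are illustrative, not something I actually configured):

```python
# Sketch of the manual "reduce LR when validation loss plateaus" rule.
# patience and min_delta are illustrative values, not the exact ones used.

def reduce_lr_on_plateau(val_losses, initial_lr=1e-4, factor=0.1, patience=3, min_delta=0.01):
    """Return the learning rate to use after each epoch, cutting it by `factor`
    whenever the validation loss has not improved by at least `min_delta`
    for `patience` consecutive epochs."""
    lr = initial_lr
    best = float("inf")
    epochs_without_improvement = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                lr *= factor  # e.g. 1e-4 -> 1e-5 -> 1e-6
                epochs_without_improvement = 0
        schedule.append(lr)
    return schedule

# Example: a run whose validation loss flattens out around ~30.5,
# so the learning rate gets cut once the plateau lasts `patience` epochs.
print(reduce_lr_on_plateau([45, 38, 33, 31, 30.5, 30.5, 30.5, 30.5, 30.4]))
```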

The graph’s axes are:

Y - Loss
X - Steps (so with my 4 GPUs and a batch size of 32 this is 128 files per step, and with the data I have it is 1432 steps per epoch; see the sketch below)
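Spelling out that arithmetic (the per-epoch clip count is just what those numbers imply, not an exact count from my CSVs):

```python
# Back-of-the-envelope check of the numbers above.
n_gpus = 4
batch_size_per_gpu = 32
steps_per_epoch = 1432

files_per_step = n_gpus * batch_size_per_gpu        # 128 clips per global step
clips_per_epoch = files_per_step * steps_per_epoch  # clips seen per epoch

print(files_per_step, clips_per_epoch)  # 128 183296
```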

I realise that there is a lack of learning after about 30k steps and that the model starts heading towards overfitting after this point. I am just asking if there are any suggested parameter changes that could aid learning (i.e. reduce the loss on the validation set) other than adding more data.

As far as I recall, the loss value is relative to your dataset. What’s your WER when you reach some proper level of learning and no overfitting?

WER is around 17%, which is quite good, but obviously with the overfitting any additional training doesn’t do anything to change this WER.
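For anyone following along, WER here is the word-level edit distance divided by the number of words in the reference; a minimal sketch (not the evaluation code DeepSpeech itself ships with):

```python
# Minimal word error rate: word-level Levenshtein distance divided by the
# number of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # ~0.17
```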

I’d guess you are just hitting the limits of your dataset’s capabilities.

Okay, I thought that might be the case. Thanks very much for the responses and guidance.