“The First Noble Truth of machine learning: tuning hyperparameters is painful.” -Me
That said, from this bit “…after 6 epochs val loss stops going down and it starts going up…” it sounds like the learning rate is too high.
As to your query: yes, the hyperparameters do need to be changed for different data sets, and finding the correct values is more of an art than a science. For example, we initially found the dropout values by brute force, doing a binary search against a test dataset.
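To make the "binary search on dropout" concrete, here is a minimal sketch. `evaluate` is a hypothetical stand-in for "train on the test set and return the loss", not a real DeepSpeech API, and the search is strictly a ternary-style interval search (you need two probe points per step to know which half to discard):

```python
# Hypothetical sketch of searching for a good dropout value.
# `evaluate(dropout)` is an assumed stand-in for "train with this
# dropout, return the resulting loss" -- NOT a real DeepSpeech call.

def tune_dropout(evaluate, lo=0.0, hi=0.5, iters=6):
    """Narrow [lo, hi] toward the dropout with the lowest loss,
    assuming loss is roughly unimodal in the dropout rate."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0          # two interior probe points
        b = hi - (hi - lo) / 3.0
        if evaluate(a) < evaluate(b):     # keep the better-scoring half
            hi = b
        else:
            lo = a
    return (lo + hi) / 2.0

# Toy stand-in objective with its minimum near dropout = 0.2:
best = tune_dropout(lambda d: (d - 0.2) ** 2)
```

Each iteration discards a third of the interval, so six iterations narrow a [0.0, 0.5] range to within a few hundredths of the best value, each at the cost of two training runs.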
For tuning the hyperparameters, what I suggest is creating a subset of your data set; the --limit_train, --limit_dev, and --limit_test command-line parameters are useful for this. The training subset should be around 16k samples, so it’s large enough to give a good statistical representation of the full data set.
Using this subset you can then tune the various hyperparameters through trial and error relatively rapidly, as the data set is not too large. And because the subset is still statistically representative, the values you find will also work for the full data set.
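The trial-and-error loop over the subset can be sketched as a simple grid search. `train_and_eval` is a hypothetical stand-in for "train on the ~16k-sample subset and return the validation loss", and the parameter grids are placeholders:

```python
# Hypothetical sketch of subset-based trial-and-error tuning.
# `train_and_eval(lr, dropout)` is an assumed stand-in for training
# on the small subset -- it is not a real DeepSpeech function.
import itertools

def grid_search(train_and_eval, learning_rates, dropouts):
    """Try every (learning_rate, dropout) pair on the subset and
    return the combination with the lowest validation loss."""
    best_params, best_loss = None, float("inf")
    for lr, dr in itertools.product(learning_rates, dropouts):
        loss = train_and_eval(lr, dr)
        if loss < best_loss:
            best_params, best_loss = (lr, dr), loss
    return best_params, best_loss

# Toy objective whose minimum sits at lr=1e-4, dropout=0.2:
params, loss = grid_search(
    lambda lr, dr: abs(lr - 1e-4) * 1e4 + (dr - 0.2) ** 2,
    learning_rates=[1e-3, 1e-4, 1e-5],
    dropouts=[0.1, 0.2, 0.3],
)
```

Because each trial only trains on the small subset, even an exhaustive grid like this stays affordable; the winning values are then used for the full-size run.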
As to the relation between dropout and data set size: they’re relatively independent. Dropout usually needs to be decreased when the training audio is noisy, as noisy audio essentially self-regularizes, so there’s less need for dropout regularization. On the flip side, dropout can be increased for clean audio.
As for the loss history of the release model, here it is…
Validation of Epoch 0 - loss: 47.713472
Training of Epoch 0 - loss: 76.803943
Training of Epoch 1 - loss: 49.826667
Validation of Epoch 1 - loss: 27.493896
Validation of Epoch 2 - loss: 27.717039
Training of Epoch 2 - loss: 40.688553
Training of Epoch 3 - loss: 35.456641
Validation of Epoch 3 - loss: 25.176263
Validation of Epoch 4 - loss: 22.647187
Checking for early stopping (last 4 steps) validation loss: 22.647187, with standard deviation: 1.148756 and mean: 26.795733
Training of Epoch 4 - loss: 31.873167
Training of Epoch 5 - loss: 29.195960
Checking for early stopping (last 4 steps) validation loss: 22.647187, with standard deviation: 1.148756 and mean: 26.795733
Validation of Epoch 5 - loss: 21.736987
Validation of Epoch 6 - loss: 20.460873
Checking for early stopping (last 4 steps) validation loss: 20.460873, with standard deviation: 1.455003 and mean: 23.186812
Training of Epoch 6 - loss: 27.093540
Training of Epoch 7 - loss: 25.271599
Checking for early stopping (last 4 steps) validation loss: 20.460873, with standard deviation: 1.455003 and mean: 23.186812
Validation of Epoch 7 - loss: 20.202375
Validation of Epoch 8 - loss: 19.448440
Checking for early stopping (last 4 steps) validation loss: 19.448440, with standard deviation: 0.670847 and mean: 20.800078
Training of Epoch 8 - loss: 23.773587
Training of Epoch 9 - loss: 22.447867
Checking for early stopping (last 4 steps) validation loss: 19.448440, with standard deviation: 0.670847 and mean: 20.800078
Validation of Epoch 9 - loss: 18.980904
Validation of Epoch 10 - loss: 18.989760
Checking for early stopping (last 4 steps) validation loss: 18.989760, with standard deviation: 0.503212 and mean: 19.543907
Training of Epoch 10 - loss: 21.306929
Training of Epoch 11 - loss: 20.299765
Checking for early stopping (last 4 steps) validation loss: 18.989760, with standard deviation: 0.503212 and mean: 19.543907
Validation of Epoch 11 - loss: 19.041425
Validation of Epoch 12 - loss: 18.623053
Checking for early stopping (last 4 steps) validation loss: 18.623053, with standard deviation: 0.026689 and mean: 19.004030
Early stop triggered as (for last 4 steps) validation loss: 18.623053 with standard deviation: 0.026689 and mean: 19.004030
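Reading the log, the early-stop check appears to watch the last 4 validation losses and compare the latest one against their mean and standard deviation, stopping once the losses have plateaued. Here is a rough reconstruction of that idea; the window size (4) matches the log, but the threshold value and the exact comparison are my assumptions, not DeepSpeech’s actual code:

```python
# Rough reconstruction of the plateau-style early-stop check suggested
# by the log above. Window size 4 matches the log; the std threshold
# and the comparison rule are guesses, not DeepSpeech's real logic.
from statistics import mean, pstdev

def should_stop(val_losses, window=4, std_threshold=0.05):
    """Stop when the last `window` validation losses have flattened out:
    their spread is tiny and the newest loss is no longer clearly
    better than the recent mean."""
    if len(val_losses) < window:
        return False
    recent = val_losses[-window:]
    std = pstdev(recent)
    return std < std_threshold and recent[-1] > mean(recent) - std

print(should_stop([47.7, 27.5, 25.2, 22.6]))       # falling -> False
print(should_stop([19.00, 19.01, 18.99, 19.00]))   # flat -> True
```

The effect matches what the log shows: checks with a standard deviation around 0.5–1.5 let training continue, while the final check, with a standard deviation of ~0.027, triggers the stop.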