Hello everyone,
I am trying to fine-tune the pretrained 0.6.0 checkpoints on TED-LIUM 3 (actually a specific subset of it, after some cleaning), but my validation loss stopped decreasing at Epoch 7, while my training loss continues to decrease. Is this to be expected, or should I stop the training and modify more hyperparameters? Thank you.
Training Snapshot:
Epoch 2 | Training | Elapsed Time: 3:16:10 | Steps: 113547 | Loss: 30.255176
Epoch 2 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 57.143296 | Dataset: ../datasets/ted-dev.csv
I Saved new best validating model with loss 57.143296 to: ../checkpoints/deepspeech-0.6.0-checkpoint/best_dev-574425
Epoch 3 | Training | Elapsed Time: 3:14:47 | Steps: 113547 | Loss: 28.822134
Epoch 3 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 56.931689 | Dataset: ../datasets/ted-dev.csv
I Saved new best validating model with loss 56.931689 to: ../checkpoints/deepspeech-0.6.0-checkpoint/best_dev-687972
Epoch 4 | Training | Elapsed Time: 3:13:50 | Steps: 113547 | Loss: 27.622577
Epoch 4 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 56.640776 | Dataset: ../datasets/ted-dev.csv
I Saved new best validating model with loss 56.640776 to: ../checkpoints/deepspeech-0.6.0-checkpoint/best_dev-801519
Epoch 5 | Training | Elapsed Time: 3:13:55 | Steps: 113547 | Loss: 26.533800
Epoch 5 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 56.558053 | Dataset: ../datasets/ted-dev.csv
I Saved new best validating model with loss 56.558053 to: ../checkpoints/deepspeech-0.6.0-checkpoint/best_dev-915066
Epoch 6 | Training | Elapsed Time: 3:13:31 | Steps: 113547 | Loss: 25.547271
Epoch 6 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 56.451978 | Dataset: ../datasets/ted-dev.csv
I Saved new best validating model with loss 56.451978 to: ../checkpoints/deepspeech-0.6.0-checkpoint/best_dev-1028613
Epoch 7 | Training | Elapsed Time: 1:20:29 | Steps: 63983 | Loss: 15.020540
Epoch 7 | Training | Elapsed Time: 3:13:36 | Steps: 113547 | Loss: 24.656791
Epoch 7 | Validation | Elapsed Time: 0:00:51 | Steps: 507 | Loss: 56.751594 | Dataset: ../datasets/ted-dev.csv
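To make the trend in the snapshot easier to read, here is a rough Python sketch that takes the per-epoch losses copied from the log above, finds the epoch with the lowest validation loss, and prints the gap between validation and training loss. The helper names and the `patience` value are my own assumptions for illustration; they are not DeepSpeech options.

```python
# Per-epoch losses copied from the training snapshot above (epochs 2 to 7).
train_loss = {2: 30.255176, 3: 28.822134, 4: 27.622577,
              5: 26.533800, 6: 25.547271, 7: 24.656791}
val_loss = {2: 57.143296, 3: 56.931689, 4: 56.640776,
            5: 56.558053, 6: 56.451978, 7: 56.751594}


def best_epoch(val):
    """Return the epoch with the lowest validation loss."""
    return min(val, key=val.get)


def epochs_without_improvement(val):
    """Count the epochs that have passed since the best validation loss."""
    return max(val) - best_epoch(val)


def loss_gap(train, val):
    """Validation loss minus training loss, per epoch."""
    return {epoch: val[epoch] - train[epoch] for epoch in sorted(val)}


def should_stop(val, patience=3):
    """Hypothetical rule: stop once `patience` epochs pass with no new best."""
    return epochs_without_improvement(val) >= patience


if __name__ == "__main__":
    print("best validation epoch:", best_epoch(val_loss))
    print("epochs without improvement:", epochs_without_improvement(val_loss))
    for epoch, gap in loss_gap(train_loss, val_loss).items():
        print(f"epoch {epoch}: val - train = {gap:.3f}")
    print("stop with patience=3:", should_stop(val_loss))
```

On these numbers the best validation loss is at Epoch 6, and the gap between validation and training loss grows every epoch, so a single worse epoch may not be enough on its own to decide whether to stop.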
Command used for training:
DeepSpeech.py --n_hidden 2048 --checkpoint_dir ../checkpoints/deepspeech-0.6.0-checkpoint --train_files ../datasets/ted-train.csv --dev_files ../datasets/ted-dev.csv --test_files ../datasets/ted-test.csv --learning_rate 0.00005 --export_tflite --export_dir export --lm_alpha 0.75 --lm_beta 1.85 --use_cudnn_rnn=True
Hardware:
AWS P3xLarge - Nvidia V100