I am trying to train an acoustic model for German using a ~80-hour German dataset, with the following parameters:
python -u DeepSpeech.py \
--train_files $exp_path/data/train.csv \
--dev_files $exp_path/data/dev.csv \
--test_files $exp_path/data/test.csv \
--train_batch_size 12 \
--dev_batch_size 12 \
--test_batch_size 12 \
--n_hidden 375 \
--epoch 50 \
--display_step 0 \
--validation_step 1 \
--early_stop True \
--earlystop_nsteps 6 \
--estop_mean_thresh 0.1 \
--estop_std_thresh 0.1 \
--dropout_rate 0.22 \
--learning_rate 0.00095 \
--report_count 10 \
--use_seq_length False \
--coord_port 8686 \
--export_dir $exp_path/model_export/ \
--checkpoint_dir $exp_path/checkpoints/ \
--decoder_library_path native_client/libctc_decoder_with_kenlm.so \
--alphabet_config_path $alphabet_path \
--lm_binary_path $exp_path/lm.binary \
--lm_trie_path $exp_path/trie
What do you think is a good value for the n_hidden parameter?
I tried 375, 1024, and 2048 (early stop enabled), but I'm getting very high validation and test losses, even though the training losses are low.
For example:
With n_hidden = 375, WER = 0.582319, CER = 36.162546, loss = 146.159454
With n_hidden = 1024, WER = 0.759299, CER = 27.491103, loss = 101.068916
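For reference, the WER values above are word-level edit distance normalized by the length of the reference transcript. A minimal sketch of that metric (not DeepSpeech's actual reporting code):

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

So a WER of 0.58 means roughly six out of ten reference words need an edit to match the hypothesis.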
The models are not producing anything close to the correct output when tested with held-out test wav files, but they give perfect output on training wav files. It looks like the model has overfitted even though early stopping is enabled. Also, the training loss falls sharply into the ~20s while the validation loss stays high in the ~100s.
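One possible reason early stopping does not fire in this situation: threshold-based early stopping of the kind configured above (earlystop_nsteps, estop_mean_thresh, estop_std_thresh) typically looks only at whether the last N validation losses have plateaued, not at the gap between training and validation loss. A rough illustrative sketch of that idea (an assumption about the mechanism, not DeepSpeech's exact implementation; the function name is hypothetical):

```python
def should_early_stop(val_losses, nsteps=6, mean_thresh=0.1, std_thresh=0.1):
    """Stop when the last `nsteps` validation losses have plateaued:
    both the mean per-step improvement and the spread fall below thresholds."""
    if len(val_losses) < nsteps:
        return False
    recent = val_losses[-nsteps:]
    # Average improvement per validation step (positive = still improving).
    mean_improvement = (recent[0] - recent[-1]) / (nsteps - 1)
    mean = sum(recent) / nsteps
    std = (sum((x - mean) ** 2 for x in recent) / nsteps) ** 0.5
    return mean_improvement < mean_thresh and std < std_thresh
```

Under this logic, a validation loss that keeps drifting slowly (even while the train/validation gap widens) can keep training alive well past the point of overfitting.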
Any suggestions on how to improve the test/validation loss?