DeepSpeech for German Language

I am trying to train an acoustic model for German using a ~80-hour German dataset, with the following parameters:

python -u DeepSpeech.py \
  --train_files $exp_path/data/train.csv \
  --dev_files $exp_path/data/dev.csv \
  --test_files $exp_path/data/test.csv \
  --train_batch_size 12 \
  --dev_batch_size 12 \
  --test_batch_size 12 \
  --n_hidden 375 \
  --epoch 50 \
  --display_step 0 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 10 \
  --use_seq_length False \
  --coord_port 8686 \
  --export_dir $exp_path/model_export/ \
  --checkpoint_dir $exp_path/checkpoints/ \
  --decoder_library_path native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path $alphabet_path \
  --lm_binary_path $exp_path/lm.binary \
  --lm_trie_path $exp_path/trie 
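For reference, the CSVs passed via --train_files/--dev_files/--test_files follow DeepSpeech's importer layout with the columns wav_filename, wav_filesize and transcript. A minimal sketch for building such a file (paths and transcripts below are placeholders):

# Sketch: writing a DeepSpeech-style CSV (wav_filename, wav_filesize, transcript).
# The sample paths and transcripts are placeholders, not real data.
import csv
import os

samples = [
    ("clips/sample_0001.wav", "guten morgen"),
    ("clips/sample_0002.wav", "wie geht es dir"),
]

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for path, transcript in samples:
        writer.writerow([path, os.path.getsize(path), transcript])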

What do you think is a good value for the n_hidden parameter?

I tried 375, 1024 and 2048 (with early stopping enabled), but I'm getting very high validation and test losses, even though the training losses are low.
For example:
with n_hidden = 375: WER = 0.582319, CER = 36.162546, loss = 146.159454
with n_hidden = 1024: WER = 0.759299, CER = 27.491103, loss = 101.068916

The models are not producing anything close to the expected transcripts on test WAV files, but they give near-perfect output on training WAV files. It looks like the model has overfitted even though early stopping is enabled. Also, the training loss drops sharply to the ~20s while the validation loss stays high in the ~100s.

Any suggestions on how to improve the test/validation loss?

Hi, may I ask you a question? Which version of DeepSpeech do you use? I noticed you use the "--display_step" parameter, but there is no such parameter in my version (https://github.com/mozilla/DeepSpeech) when I run "./DeepSpeech.py --helpful".

I’ve got exactly the same issue. Did you find an answer or get better results by changing the hyperparameters?

Hello,

80 hours of training material sounds quite small. Are you trying to build a general speech-to-text model that can understand German in general, or one focused on a specific topic … ?

@lissyx: When training DeepSpeech with German, do we need to change the number of FEATURES in the code? I saw the number of features mentioned as 26 somewhere in the code, which corresponds to the 26 letters of the English alphabet. Do we need to set it to 29 for German?

You can either convert the German Umlaute (ä, ö, ü and ß) to ae, …, or add them to the alphabet file. Either way has advantages and disadvantages. You don’t change the number of features in the code.
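If you go with the first option, folding the Umlaute into ASCII before training is a simple text pass; a minimal sketch (the mapping and function name are just an illustration):

# Sketch: folding German Umlaute into ASCII equivalents before training.
# The alternative is to keep ä, ö, ü, ß and add them to alphabet.txt instead.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def fold_umlauts(text: str) -> str:
    for src, dst in UMLAUT_MAP.items():
        text = text.replace(src, dst)
    return text

print(fold_umlauts("Straße über Köln"))  # -> "Strasse ueber Koeln"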


@othiele: Sorry if this question appears naive, but is the number of features the same (i.e. 26) for English, German, or any other language? How many MFCC features are extracted from the audio signal?

You don’t need to change the number of features.
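The 26 refers to the number of MFCC features extracted per audio frame (DeepSpeech's n_input), not to the alphabet size, so it is language-independent. A rough sketch of what that feature extraction looks like, assuming python_speech_features and scipy are installed and a 16 kHz mono WAV file:

# Sketch: extracting 26 MFCC features per frame, matching DeepSpeech's n_input = 26.
# "sample.wav" is a placeholder path to a 16 kHz, 16-bit mono recording.
from python_speech_features import mfcc
import scipy.io.wavfile as wav

rate, signal = wav.read("sample.wav")
features = mfcc(signal, samplerate=rate, numcep=26)  # shape: (num_frames, 26)
print(features.shape)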

If you are still looking for DeepSpeech results for the German language, check out the paper and repository below. They might be useful.

https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech
