Hi everyone
I have finally succeeded in training my model on mixed Russian/Kazakh data.
However, the WER is very high.
I suspect the main reason is that my audio is sampled at 8 kHz instead of 16 kHz.
What do you think? Any comments?
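In case it helps with diagnosing this, the sample rate can be read from the header of every WAV file listed in a CSV. This is only a minimal sketch using the Python standard library. The helper names are my own (hypothetical), and it assumes the CSV has a `wav_filename` column and that the clips are plain PCM WAV files:

```python
import csv
import wave
from collections import Counter


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def sample_rates_in_csv(csv_path, column="wav_filename"):
    """Count how many files listed in the CSV use each sample rate.

    Assumes `column` holds a path to a WAV file on every row.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            counts[wav_sample_rate(row[column])] += 1
    return counts


# Example call (path taken from my run command):
# print(sample_rates_in_csv("/home/dulan/data/train.csv"))
```

If this reports 8000 for every clip, the data really is 8 kHz throughout and not a mix of rates.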
This is what I got at the end of the training session:
I Training of Epoch 99 - loss: 2.772892
I FINISHED Optimization - training time: 1:20:22
100% (499 of 499) |######################| Elapsed Time: 0:29:22 Time: 0:29:22
Preprocessing ['/home/dulan/data/test.csv']
Preprocessing done
Computing acoustic model predictions...
100% (285 of 285) |######################| Elapsed Time: 0:08:37 Time: 0:08:37
Decoding predictions...
100% (285 of 285) |######################| Elapsed Time: 0:07:52 Time: 0:07:52
Test - WER: 0.993192, CER: 66.797807, loss: 326.212860
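For context on how bad that number is: WER is normally the word-level edit distance between the reference and the decoded text, divided by the number of reference words, so a value close to 1 means almost no word was recognized correctly. A minimal sketch of that standard definition (my own helper names, not DeepSpeech's actual evaluation code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # previous[j] = distance between the ref prefix seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]  # deleting all i reference items to match an empty hyp
        for j, hyp_item in enumerate(hyp, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (ref_item != hyp_item)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With this definition, one wrong word out of four gives 0.25, and an empty transcript gives 1.0.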
Train: 15,000 WAV files
Dev: 4,000 WAV files
Test: 1,500 WAV files
The run command:
nohup python -u DeepSpeech/DeepSpeech.py --train_files /home/dulan/data/train.csv --dev_files /home/dulan/data/dev.csv --test_files /home/dulan/data/test.csv --train_batch_size 8 --dev_batch_size 8 --test_batch_size 8 --alphabet_config_path /home/dulan/models/alphabet.txt --lm_binary_path /home/dulan/models/lm.binary --lm_trie_path /home/dulan/models/trie --epoch 100 --display_step 1 --export_dir /home/dulan/models --learning_rate 0.000025 --dropout_rate 0 --word_count_weight 3.5 --log_level 1
Also, how can we configure DeepSpeech to take 8 kHz input instead of its default 16 kHz?
@lissyx (BTW, thanks for your last comments)
Best