Hello
Version info:
I’m currently using DeepSpeech 0.6.0; before that I was on 0.6.0a15. I checked out the a15 tag to see whether it’s a version issue, but it didn’t work on a15 either.
I’m currently working on a speech recognition system to recognize the Swiss German language.
Since I don’t have a good enough GPU, I’m using Google Colab to run DeepSpeech.
My problem is: for the past few days, all my training runs have ended up returning just a single letter during inference. Before that, inference returned complete words, sometimes correct, sometimes not, but at least something.
I followed this example to install and run DeepSpeech on Google Colab: https://github.com/mayukhnair/deepspeech-colab
To run the training, I needed to add ignore_longer_outputs_than_inputs to the ctc_loss call in DeepSpeech.py. I did the same thing back when it was still working.
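This is roughly what that change looks like (the surrounding call is paraphrased from memory, only the added flag matters):

```python
# In DeepSpeech.py, inside the loss computation (argument names approximate):
total_loss = tfv1.nn.ctc_loss(
    labels=batch_y,
    inputs=logits,
    sequence_length=batch_seq_len,
    ignore_longer_outputs_than_inputs=True,  # added so over-long transcripts don't abort training
)
```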
This is the command I’m using to start the training:
```
!./DeepSpeech.py \
  --train_files $train_dir \
  --dev_files $dev_dir \
  --test_files $test_dir \
  --alphabet_config_path /content/project2/data/alphabet.txt \
  --export_dir $model_dir \
  --summary_dir $summary_dir \
  --checkpoint_dir $checkpoint_dir \
  --n_hidden 2048 \
  --noearly_stop \
  --train_batch_size 50 \
  --dev_batch_size 50 \
  --test_batch_size 40 \
  --learning_rate 0.00095 \
  --dropout_rate 0.22 \
  --epochs 10 \
  --lm_binary_path /content/project2/data/lm.binary \
  --lm_trie_path /content/project2/data/trie
```
What I also noticed is that the test step at the end of training takes a lot longer than it did a few days ago.
I built the vocabulary from all the words in the dataset.
I’ve tried training on sentences containing specific words, on smaller subsets as well as on the whole dataset, but all of them just return “e” during inference.
I have all audio files in one directory, including the ones that are not used by the current train, dev and test CSVs. All three CSVs are in this folder as well.
The language model is built as a 3-gram model.
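For completeness, I build it roughly along these lines with KenLM and the DeepSpeech generate_trie tool (file names here are just examples; the real ones are the paths from the training command above):

```
# build a 3-gram ARPA model from the vocabulary (one sentence per line)
!lmplz -o 3 --text vocabulary.txt --arpa lm.arpa
# convert it to the binary format DeepSpeech loads
!build_binary lm.arpa lm.binary
# build the trie with the tool from the native client
!generate_trie alphabet.txt lm.binary trie
```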
Do you have any recommendations?
Best regards
Lukas
Edit: formatted code