Hello
Version info:
I’m currently using DeepSpeech 0.6.0; before that I was on 0.6.0a15. I checked out the a15 tag to see whether it’s a version issue, but it didn’t work on a15 either.
I’m currently working on a speech recognition system to recognize the Swiss German language.
Since I don’t have a good enough GPU, I’m using Google Colab to run DeepSpeech.
My problem is: for the past few days, all my training runs have ended up returning just a single letter during inference. Before that, inference returned complete words, sometimes correct, sometimes not, but at least something.
I followed this example to install and run DeepSpeech on Google Colab: https://github.com/mayukhnair/deepspeech-colab
To run the training, I needed to add ignore_longer_outputs_than_inputs to the ctc_loss call in DeepSpeech.py. I did the same thing back when it was still working.
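This is roughly what that change looks like (the surrounding call is paraphrased from memory, only the added flag matters):

```python
# In DeepSpeech.py, inside the loss computation (argument names approximate):
total_loss = tfv1.nn.ctc_loss(
    labels=batch_y,
    inputs=logits,
    sequence_length=batch_seq_len,
    ignore_longer_outputs_than_inputs=True,  # added so over-long transcripts don't abort training
)
```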
This is the command I’m using to start the training:
```
!./DeepSpeech.py \
  --train_files $train_dir \
  --dev_files $dev_dir \
  --test_files $test_dir \
  --alphabet_config_path /content/project2/data/alphabet.txt \
  --export_dir $model_dir \
  --summary_dir $summary_dir \
  --checkpoint_dir $checkpoint_dir \
  --n_hidden 2048 \
  --noearly_stop \
  --train_batch_size 50 \
  --dev_batch_size 50 \
  --test_batch_size 40 \
  --learning_rate 0.00095 \
  --dropout_rate 0.22 \
  --epochs 10 \
  --lm_binary_path /content/project2/data/lm.binary \
  --lm_trie_path /content/project2/data/trie
```
What I also noticed is that the test step at the end of training takes a lot longer than it did a few days ago.
I built the vocabulary from all the words in the dataset.
I’ve tried training on sentences containing specific words, on smaller subsets as well as on the whole dataset, but all of them just return “e” during inference.
I have all audio files in one directory, including the ones that are not used by the current train, dev and test CSVs. All three CSVs are in this folder as well.
The language model is built as a 3-gram model.
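For completeness, I build it roughly along these lines with KenLM and the DeepSpeech generate_trie tool (file names here are just examples; the real ones are the paths from the training command above):

```
# build a 3-gram ARPA model from the vocabulary (one sentence per line)
!lmplz -o 3 --text vocabulary.txt --arpa lm.arpa
# convert it to the binary format DeepSpeech loads
!build_binary lm.arpa lm.binary
# build the trie with the tool from the native client
!generate_trie alphabet.txt lm.binary trie
```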
Do you have any recommendations?
Best regards
Lukas
Edit: formatted code