Multiple Unigrams not getting recognised well when using custom LM

I am using DeepSpeech 0.5.1. My use case is to recognise a few voice commands (mainly digits and a few other words). I created a custom LM for my commands. Here is what I did:

vocabulary.txt (containing the commands to be recognised):

one
two
three
four
five
six
seven
eight
nine
yes
no
tell me options
need help

Generate LM

# Generate ARPA file
~/terminal/kenlm/build/bin/lmplz --text vocabulary.txt --arpa words.arpa --order 5 --discount_fallback --temp_prefix /tmp/

# Generate binary LM
~/terminal/kenlm/build/bin/build_binary -T -s words.arpa lm.binary

# Generate trie
~/terminal/repository/DeepSpeech/generate_trie alphabet.txt lm.binary trie

Problem :

It’s working reasonably well: it recognises the single digits and the other sentences in vocabulary.txt. However, it does not work well when I speak multiple digits together, e.g. “four nine seven”… It misses one or more words and mostly gives only a single-word output (though sometimes multiple words do come through).

Am I doing something wrong? How can I improve the results?

Please share more context on how you evaluate performance.

I use the “mic_vad_streaming” script and speak through the mic…

So you add a lot of variability in how the sound is processed, and you hurt reproducibility. Please record clean, low-pace audio into a WAV file to perform reproducible comparisons.
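For a reproducible comparison, the file needs to be in the format the 0.5.x models expect: 16 kHz, 16-bit, mono PCM WAV. A minimal sketch using only the Python standard library, with a synthetic tone standing in for a real recording so the format can be verified end to end (the filename `clean_test.wav` is just illustrative):

```python
import math
import struct
import wave

RATE = 16000  # DeepSpeech 0.5.x English models expect 16 kHz mono 16-bit PCM


def write_test_wav(path, seconds=2, freq=440.0):
    """Write a WAV file in the parameters DeepSpeech expects.

    In practice you would record real speech at these parameters;
    the sine tone here only stands in for demonstration.
    """
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        frames = b"".join(
            struct.pack("<h", int(20000 * math.sin(2 * math.pi * freq * t / RATE)))
            for t in range(int(seconds * RATE))
        )
        w.writeframes(frames)


write_test_wav("clean_test.wav")
```

Running the same file through the CLI (something like `deepspeech --model output_graph.pbmm --alphabet alphabet.txt --lm lm.binary --trie trie --audio clean_test.wav` on 0.5.x) then removes the mic and VAD variability from the comparison.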

OK, will try with that, but even if it works, how do I make it work in the real world…

My use case is to be able to use it on Android and recognise commands spoken on the phone. The end user is expected to use a noise-cancelling headset (so I can get clean audio), but I cannot ask them to speak at a slow pace. How do I account for the pace of speech and make it work at a normal speaking pace?

I may be wrong here, but isn’t it because the language model order parameter is set to 5 (so it is trying to model word sequences of length 5), but the actual dataset mostly has one-word sentences?

This brings me to a question: how would you set the order parameter lower than 3? I remember being unable to set the order parameter to 1 to recognise just single words.

Or does the order parameter not make a difference for a small dataset such as this?
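One way to see why the order-5 setting gets so little to work with here: count how many word sequences of each length the corpus actually contains (ignoring the sentence-boundary tokens lmplz adds). A quick sketch with the vocabulary.txt contents from the question hard-coded:

```python
from collections import Counter

# The vocabulary.txt contents from the question above
sentences = [
    "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "yes", "no", "tell me options", "need help",
]


def ngrams(order):
    """Count all n-grams of the given order across the sentences."""
    counts = Counter()
    for line in sentences:
        words = line.split()
        for i in range(len(words) - order + 1):
            counts[tuple(words[i : i + order])] += 1
    return counts


for n in range(1, 6):
    print(n, sum(ngrams(n).values()))
# 1 16
# 2 3
# 3 1
# 4 0
# 5 0
```

So almost all the evidence is at order 1: the only word-to-word transitions in the whole corpus come from “tell me options” and “need help”, and there are no digit-to-digit bigrams at all, which is consistent with sequences like “four nine seven” getting no support from the LM.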

I’m asking you to perform something reproducible to isolate issues, because we did successfully test the use case you describe on Android devices…

Also, you are testing with a user-contributed example, which can have its own flaws and add issues.

OK… I get it… Just to summarize: with what I have done, multiple unigrams should be recognized, and no extra configuration or customization needs to be done in the LM…

Your LM is very, very small, and KenLM might have a hard time producing something that works properly, at least in this configuration. As @bilal.iqbal mentioned, order 5 with that little data might not yield a very effective LM.

You are also missing the trie argument in the build_binary call.

Is it possible to print the inference logs, to look further into how the LM is influencing the final results…

Yes, @reuben landed some debugging helpers, but you have to rebuild the CTC decoder. Please have a look at the DEBUG define for native_client/ctcdecoder/

Thanks, this helps.

Will correct the trie argument in the build_binary call…

It would be great if you can confirm this is still an issue on 0.6.0. Our testing shows it should work pretty well.

We are not using 0.6.0. We are on 0.5.1.
Also, my problem was solved to an extent by adding permutations of all the digits to my vocabulary.txt. With this, the LM is picking up the combinations and the results are better.
Now I am adding debug statements in the CTC decoder to nail it down further.

Thanks for asking.
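The workaround described above can be scripted rather than typed by hand. A sketch that enumerates all two- and three-digit sequences (with repeats allowed, a slight extension of pure permutations, since users may say e.g. “four four seven”; the output filename `vocabulary_extra.txt` is illustrative and would be concatenated with the original vocabulary.txt before rerunning lmplz):

```python
from itertools import product

digits = ["one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine"]

# All ordered 2- and 3-digit sequences, repeats allowed
lines = [" ".join(seq)
         for n in (2, 3)
         for seq in product(digits, repeat=n)]

print(len(lines))  # 9**2 + 9**3 = 810 sequences

with open("vocabulary_extra.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```

This gives the LM actual digit-to-digit n-grams to estimate from, which is what was missing from the original single-word-per-line corpus.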

That’s why I said it would be great if you could confirm this is still an issue on 0.6.0.

It’d be still valuable if you can evaluate on 0.6.0 :slight_smile:

Sure. Will do that and update here.