Multiple Unigrams not getting recognised well when using custom LM

I am using DeepSpeech 0.5.1. My use case is to recognise a few voice commands (mainly digits and a few other words). I created a custom LM for my commands. Here is what I did:

vocabulary.txt (containing the commands to be recognised):

one
two
three
four
five
six
seven
eight
nine
yes
no
tell me options
need help

Generate LM

# Generate ARPA file
~/terminal/kenlm/build/bin/lmplz --text vocabulary.txt --arpa words.arpa --order 5 --discount_fallback --temp_prefix /tmp/

# Generate binary LM
~/terminal/kenlm/build/bin/build_binary -T -s words.arpa lm.binary

# Generate trie
~/terminal/repository/DeepSpeech/generate_trie alphabet.txt lm.binary trie

Problem :

It’s working reasonably well: it recognises the single digits and the other sentences in vocabulary.txt. However, it does not work well when I speak multiple digits together, e.g. “four nine seven”… It misses one or more words and mostly gives only a single-word output (though sometimes multiple words do come through).

Am I doing something wrong? How can I improve the results?

Please share more context on how you evaluate performance.

I use the “mic_vad_streaming” script and speak through the mic…

So you add a lot of variability in how the sound is processed, and you hurt reproducibility. Please record clean, low-pace audio into a WAV file to perform reproducible comparisons.
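For a reproducible comparison, the file needs to be in the format the 0.5.x models expect: 16 kHz, 16-bit, mono PCM WAV. A minimal sketch using only the Python standard library, with a synthetic tone standing in for a real recording so the format can be verified end to end (the filename `clean_test.wav` is just illustrative):

```python
import math
import struct
import wave

RATE = 16000  # DeepSpeech 0.5.x English models expect 16 kHz mono 16-bit PCM


def write_test_wav(path, seconds=2, freq=440.0):
    """Write a WAV file in the parameters DeepSpeech expects.

    In practice you would record real speech at these parameters;
    the sine tone here only stands in for demonstration.
    """
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        frames = b"".join(
            struct.pack("<h", int(20000 * math.sin(2 * math.pi * freq * t / RATE)))
            for t in range(int(seconds * RATE))
        )
        w.writeframes(frames)


write_test_wav("clean_test.wav")
```

Running the same file through the CLI (something like `deepspeech --model output_graph.pbmm --alphabet alphabet.txt --lm lm.binary --trie trie --audio clean_test.wav` on 0.5.x) then removes the mic and VAD variability from the comparison.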

OK, will try with that, but even if it works, how do I make it work in the real world…

My use case is to be able to use it on Android and recognise commands spoken on the phone. The end user is expected to use a noise-cancelling headset (so I can get clean audio), but I cannot ask them to speak at a slow pace. How do I account for the pace of speech and make it work at a normal speaking pace?

I may be wrong here, but isn’t it because the language model order parameter is set to 5 (so it is trying to model word sequences of length 5), but the actual dataset mostly has one-word sentences?

This brings me to a question: how would you set the order parameter lower than 3? I remember being unable to set the order parameter to 1 to recognise just single words.

Or does the order parameter not make a difference for a small dataset such as this?
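One way to see why the order-5 setting gets so little to work with here: count how many word sequences of each length the corpus actually contains (ignoring the sentence-boundary tokens lmplz adds). A quick sketch with the vocabulary.txt contents from the question hard-coded:

```python
from collections import Counter

# The vocabulary.txt contents from the question above
sentences = [
    "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "yes", "no", "tell me options", "need help",
]


def ngrams(order):
    """Count all n-grams of the given order across the sentences."""
    counts = Counter()
    for line in sentences:
        words = line.split()
        for i in range(len(words) - order + 1):
            counts[tuple(words[i : i + order])] += 1
    return counts


for n in range(1, 6):
    print(n, sum(ngrams(n).values()))
# 1 16
# 2 3
# 3 1
# 4 0
# 5 0
```

So almost all the evidence is at order 1: the only word-to-word transitions in the whole corpus come from “tell me options” and “need help”, and there are no digit-to-digit bigrams at all, which is consistent with sequences like “four nine seven” getting no support from the LM.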

I’m asking you to perform something reproducible to isolate issues, because we did successfully test the use case you describe on Android devices…

Also, you are testing with a user-contributed example, which can have its own flaws and add issues.

OK… I get it… Just to summarize: with what I have done, multiple unigrams should be recognized, and no extra configuration or customization needs to be done in the LM…

Your LM is very, very small, and KenLM might have a hard time producing something that works properly, at least in this configuration. As @bilal.iqbal mentioned, order 5 with that little data might not yield a very effective LM.

You are also missing the trie argument in the build_binary call.

Is it possible to print the inference logs, to look further into how the LM is influencing the final results…

Yes, @reuben landed some debugging helpers, but you have to rebuild the CTC decoder. Please have a look at the DEBUG define for native_client/ctcdecoder/

Thanks, this helps.

Will correct the trie argument in the build_binary call…

It would be great if you can confirm this is still an issue on 0.6.0. Our testing shows it should work pretty well.

We are not using 0.6.0. We are on 0.5.1.
Also, my problem was solved to an extent by adding permutations of all the digits to my vocabulary.txt. With this, the LM is picking up the combinations and the results are better.
Now I am adding debug statements in the CTC decoder to nail it down further.

Thanks for asking.
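The workaround described above can be scripted rather than typed by hand. A sketch that enumerates all two- and three-digit sequences (with repeats allowed, a slight extension of pure permutations, since users may say e.g. “four four seven”; the output filename `vocabulary_extra.txt` is illustrative and would be concatenated with the original vocabulary.txt before rerunning lmplz):

```python
from itertools import product

digits = ["one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine"]

# All ordered 2- and 3-digit sequences, repeats allowed
lines = [" ".join(seq)
         for n in (2, 3)
         for seq in product(digits, repeat=n)]

print(len(lines))  # 9**2 + 9**3 = 810 sequences

with open("vocabulary_extra.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```

This gives the LM actual digit-to-digit n-grams to estimate from, which is what was missing from the original single-word-per-line corpus.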

That’s why I said it would be great if you could confirm this is still an issue on 0.6.0.

It’d be still valuable if you can evaluate on 0.6.0 :slight_smile:

Sure. Will do that and update here.