I understand this may be more specific to KenLM than to DeepSpeech itself, but I would like the community's input.
I am following the README.md in the /DeepSpeech/data/lm/ folder to add some contextual words. It works, but I have the following problem:
Suppose there is a phrase I want to be caught that sounds similar to a more sensible (linguistically speaking) phrase: say, “ec council” (the name of a company I want the STT to detect), which sounds like “easy counsel”. I have recreated the lm.binary and trie for this task with the following steps:
/home/absin/Documents/Softwares/kenlm/build/bin/lmplz --order 5 \
--temp_prefix /home/absin/git/DeepSpeech/data/lm/new/ \
--memory 50% \
--text /home/absin/git/DeepSpeech/data/lower.txt \
--arpa /home/absin/git/DeepSpeech/data/lm/new/lm.arpa \
--prune 0 0 0 1
When I do
cat lm.arpa | grep 'ec council'
I get:
-3.8404584	quebec council	-0.09455126
-2.0626895	ec council	-0.09455126
-3.8596218	quebec councils	-0.09455126
-0.6403628	ec council </s>	0
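(For context when reading these numbers: ARPA files store base-10 log probabilities, so the bigram scores above can be compared directly. A quick sketch using the two scores from the grep output:)

```python
# ARPA n-gram scores are base-10 log probabilities, so the bigram entries
# above can be compared by exponentiating the difference.
p_ec = 10 ** -2.0626895      # score of "ec council" from the grep output
p_quebec = 10 ** -3.8404584  # score of "quebec council"

ratio = p_ec / p_quebec
print(ratio)  # ~60: "ec council" is far likelier than "quebec council" here
```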
So it’s there in the new LM. I then convert it to binary like so:
/home/absin/Documents/Softwares/kenlm/build/bin/build_binary -a 255 \
-q 8 \
trie \
/home/absin/git/DeepSpeech/data/lm/new/lm.arpa \
/home/absin/git/DeepSpeech/data/lm/new/lm.binary
And then generate trie using this command:
/home/absin/git/DeepSpeech/data/dsbin/generate_trie \
/home/absin/git/DeepSpeech/data/alphabet.txt \
/home/absin/git/DeepSpeech/data/lm/new/lm.binary \
/home/absin/git/DeepSpeech/data/lm/new/trie
But when I run a sample audio, the prediction is “easy counsel” and not “ec council”.
I want the STT (or LM) to somehow prioritize the latter over other competing results. Is there any way to do this?
One hack I can think of is a random phrase generator that produces thousands of possible usages of the new phrase, thereby making it more probable in the LM.
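A rough sketch of that phrase-generator hack: embed the target phrase in carrier sentences and append them to the LM training text before rerunning lmplz. The template sentences below are invented for illustration; real carrier sentences should match the domain.

```python
import random

# Hypothetical carrier templates; {p} is replaced with the target phrase.
TEMPLATES = [
    "i spoke to someone at {p} yesterday",
    "{p} offers several security certifications",
    "have you contacted {p} about this",
    "the {p} training starts next week",
]

def expand(phrase, templates, n=1000, seed=0):
    """Return n sentences that embed the phrase, sampled from the templates."""
    rng = random.Random(seed)
    return [rng.choice(templates).format(p=phrase) for _ in range(n)]

lines = expand("ec council", TEMPLATES)
# These would then be appended to the lmplz input text, e.g.:
# with open("lower.txt", "a") as f:
#     f.write("\n".join(lines) + "\n")
```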
Or I could attempt end-to-end training of the output_graph as well, just to see if it helps. I have ~5000 utterances in which these contextual phrases are used in various settings, which I could also feed into the previous approach.
Yet another approach would be to perform post-transcription correction, although I cannot think of a robust way to do it.
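The crudest version of post-transcription correction I can picture is fuzzy matching a sliding window of the transcript against a phrase list with difflib and swapping in the known phrase when it is close enough. The 0.6 cutoff below is a guess and would need tuning; too low and it will overwrite legitimate text.

```python
import difflib

def correct(transcript, phrases, cutoff=0.6):
    """Replace transcript windows that fuzzily match a known phrase."""
    words = transcript.split()
    for phrase in phrases:
        target = phrase.split()
        n = len(target)
        i = 0
        while i <= len(words) - n:
            window = " ".join(words[i:i + n])
            if difflib.SequenceMatcher(None, window, phrase).ratio() >= cutoff:
                words[i:i + n] = target
                i += n  # skip past the replacement to avoid re-matching it
            else:
                i += 1
    return " ".join(words)

print(correct("please call easy counsel today", ["ec council"]))
# -> please call ec council today
```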
I would prefer to solve this cleverly without resorting to the methods above.
Any suggestions? Is there a way to make a few phrases much more probable than their acoustically similar alternatives?