I understand this may be more specific to KenLM than to DeepSpeech itself, but I would like the community's input.
I am following the README.md in the /DeepSpeech/data/lm/ folder to add some contextual words. It works, but I have the following problem:
Suppose there is a phrase I want to be caught that sounds similar to a more sensible (linguistically speaking) phrase: say, “ec council” (the name of a company I want the STT to detect), which sounds like “easy counsel”. I have recreated the lm.binary and trie for this task with the following steps:
/home/absin/Documents/Softwares/kenlm/build/bin/lmplz --order 5 \
--temp_prefix /home/absin/git/DeepSpeech/data/lm/new/ \
--memory 50% \
--text /home/absin/git/DeepSpeech/data/lower.txt \
--arpa /home/absin/git/DeepSpeech/data/lm/new/lm.arpa \
--prune 0 0 0 1
When I do
cat lm.arpa | grep 'ec council'
I get:
-3.8404584	quebec council	-0.09455126
-2.0626895	ec council	-0.09455126
-3.8596218	quebec councils	-0.09455126
-0.6403628	ec council </s>	0
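(For context when reading these numbers: ARPA files store base-10 log probabilities, so the bigram scores above can be compared directly. A quick sketch using the two scores from the grep output:)

```python
# ARPA n-gram scores are base-10 log probabilities, so the bigram entries
# above can be compared by exponentiating the difference.
p_ec = 10 ** -2.0626895      # score of "ec council" from the grep output
p_quebec = 10 ** -3.8404584  # score of "quebec council"

ratio = p_ec / p_quebec
print(ratio)  # ~60: "ec council" is far likelier than "quebec council" here
```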
So it’s there in the new LM. I then convert it to binary like so:
/home/absin/Documents/Softwares/kenlm/build/bin/build_binary -a 255 \
-q 8 \
trie \
/home/absin/git/DeepSpeech/data/lm/new/lm.arpa \
/home/absin/git/DeepSpeech/data/lm/new/lm.binary
And then generate trie using this command:
/home/absin/git/DeepSpeech/data/dsbin/generate_trie \
/home/absin/git/DeepSpeech/data/alphabet.txt \
/home/absin/git/DeepSpeech/data/lm/new/lm.binary \
/home/absin/git/DeepSpeech/data/lm/new/trie
But when I run a sample audio, the prediction is “easy counsel” and not “ec council”.
I want the STT (or LM) to somehow prioritize the latter over other competing results. Is there any way to do this?
One hack I can think of is a random phrase generator that produces thousands of possible usages of the new phrase, thereby making it more probable in the LM.
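A rough sketch of that phrase-generator hack: embed the target phrase in carrier sentences and append them to the LM training text before rerunning lmplz. The template sentences below are invented for illustration; real carrier sentences should match the domain.

```python
import random

# Hypothetical carrier templates; {p} is replaced with the target phrase.
TEMPLATES = [
    "i spoke to someone at {p} yesterday",
    "{p} offers several security certifications",
    "have you contacted {p} about this",
    "the {p} training starts next week",
]

def expand(phrase, templates, n=1000, seed=0):
    """Return n sentences that embed the phrase, sampled from the templates."""
    rng = random.Random(seed)
    return [rng.choice(templates).format(p=phrase) for _ in range(n)]

lines = expand("ec council", TEMPLATES)
# These would then be appended to the lmplz input text, e.g.:
# with open("lower.txt", "a") as f:
#     f.write("\n".join(lines) + "\n")
```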
Or I could attempt end-to-end training of the output_graph as well, just to see if it helps. I have ~5000 utterances in which these contextual phrases are used in various settings, which I could also feed into the previous approach.
Yet another approach would be to perform post-transcription correction, although I cannot think of a robust way to do it.
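The crudest version of post-transcription correction I can picture is fuzzy matching a sliding window of the transcript against a phrase list with difflib and swapping in the known phrase when it is close enough. The 0.6 cutoff below is a guess and would need tuning; too low and it will overwrite legitimate text.

```python
import difflib

def correct(transcript, phrases, cutoff=0.6):
    """Replace transcript windows that fuzzily match a known phrase."""
    words = transcript.split()
    for phrase in phrases:
        target = phrase.split()
        n = len(target)
        i = 0
        while i <= len(words) - n:
            window = " ".join(words[i:i + n])
            if difflib.SequenceMatcher(None, window, phrase).ratio() >= cutoff:
                words[i:i + n] = target
                i += n  # skip past the replacement to avoid re-matching it
            else:
                i += 1
    return " ".join(words)

print(correct("please call easy counsel today", ["ec council"]))
# -> please call ec council today
```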
I would prefer to solve this cleverly without resorting to the methods above.
Any suggestions? Is there a way to make a few phrases much more probable than their acoustically similar alternatives?