My model gives good inference results, and both the model and the language model load, but the trie file fails to load. I downloaded KenLM and followed the steps to create the ARPA file and the binary LM, then generated the trie with the same command given by @elpimous_robot, but during inference I get the following error
Have you tried following the docs under data/lm/README.rst? The tutorial from @elpimous_robot is old and likely describes a different way of generating these files.
Yes, I followed that too, but now I get "Error: Trie file version mismatch (4 instead of expected 3). Update your trie file". I also updated to ds-ctcdecoder==0.6.0a11 for Python 3.6.
And when I try python3 util/taskcluster.py --branch "v0.4.1" --target "." it gives me a 404 error, so I am not able to use generate_trie from native-client.tar.gz.
@sanjay.pandey Please share more context on your setup; there are a lot of things going in every direction right now.
I have built my own language model, but when I run inference from the command line it gives
"Error: Trie file version mismatch (4 instead of expected 3). Update your trie file".
One solution I found was to download native-client with python3 util/taskcluster.py --branch "v0.4.1" --target "." and use the generate_trie from it, but that download fails with an HTTP 404 error.
I am not sure whether this is the solution to my main problem, but I am currently stuck at this first step.
You are still not documenting your context. Versions, etc …
Sorry.
I am using DeepSpeech 0.4.1 and have fine-tuned the pretrained model on more than 3 lakh (300,000) Indian audio files.
The training data covers around 20k words, so I wanted to include the same 20k words in my language model.
So I created my language model by downloading KenLM, creating the ARPA file, then the binary file, and then generating the trie. After that, inference reported a trie error, although the transcription itself was correct.
What I noticed is that I downloaded the master version of generate_trie, because I did not specify a branch when using taskcluster.py, so I think the problem may lie there. But now, when I try to download the native client by specifying branch 0.4.1, it gives me an HTTP 404 (not found) error.
Yes, this is known, please download it from github release page: https://github.com/mozilla/DeepSpeech/releases/tag/v0.4.1
Yes, I am already working from DeepSpeech 0.4.1, but this command gives me a 404 error when I try to download the pre-built binaries and trie file: python3 util/taskcluster.py --branch "v0.4.1" --target "." When I run the same command without specifying a branch, it downloads from master.
Again, download the proper native_client from the GitHub release page.
I downloaded 0.4.1 again from the link you gave me and then ran
python3 util/taskcluster.py --target "." without specifying a branch.
After that I cloned KenLM from GitHub, created the ARPA and binary files, and then created the trie with the generate_trie I got from running taskcluster.py.
When I then run my model, it gives me this error:
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
I do not understand what else to do to get rid of the error.
I don’t understand why you keep trying to download using taskcluster.py if you downloaded the tar from github. Extract it, generate_trie is inside.
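The extraction step described above could look roughly like the sketch below. The helper name extract_native_client and the target directory are made up for illustration, and the sketch assumes that generate_trie sits at the top level of the archive. It relies on tar -xf detecting the compression on its own, which needs xz to be installed for a .tar.xz file.

```shell
# Hypothetical helper: unpack a native_client archive into a directory and
# confirm that generate_trie is there (assumed to be at the archive top level).
extract_native_client() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # tar -xf auto-detects the compression format of the archive.
    tar -xf "$archive" -C "$dest" || return 1
    if [ ! -f "$dest/generate_trie" ]; then
        echo "generate_trie not found in $dest" >&2
        return 1
    fi
    chmod +x "$dest/generate_trie"
    # Print the path of the binary so the caller can use it.
    echo "$dest/generate_trie"
}

# Usage, with the archive downloaded from the v0.4.1 release page:
#   extract_native_client native_client.amd64.cpu.linux.tar.xz ./native_client_0.4.1
```

Using the generate_trie from this directory, rather than one fetched from master, keeps the trie version in line with the 0.4.1 runtime.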
Okay, I downloaded native_client.amd64.cpu.linux.tar.xz and then ran the model. This time it did not give the error, but it shows that it is running inference and then ends with "Segmentation fault (core dumped)" without giving a result.
Have you used generate_trie?
I’m afraid you really need to share more information with us …
Solved, thanks. I was mixing versions. Thank you so much. You are always the first to answer on this forum, regardless of the time. I really admire that. Thanks a lot.
Can you tell me what the trie is used for in the LM? I was getting correct inference even though the trie failed to load.
The "Segmentation fault (core dumped)" problem is appearing again. Yes, I used generate_trie after extracting it from the native client, and I am using the resulting trie in inference, but after running inference the command line ends with "Segmentation fault (core dumped)".
I can’t do divination, so you will have to share more context again … But honestly, I don’t have time to debug a segfault on 0.4.1.
Sorry, but can you please explain the reason to me? It only gives "Segmentation fault (core dumped)" when I include the LM and trie; if I do not include them, I get correct inference.
So is there something wrong with how I build my trie file or my LM?
The commands I am using are as follows.
For the ARPA file, from inside kenlm/build/bin:
./lmplz --text /home/sanjay/DEEPSPEECH\ WORK/words.txt --arpa words.arpa --o 3 --discount_fallback
For the binary:
./build_binary -T -s words.arpa lm.binary
Then I generate the trie, using the generate_trie extracted from native_client.amd64.cpu.linux.tar.xz, with the following command:
./generate_trie /home/sanjay/DEEPSPEECH\ WORK/models/alphabet.txt /home/sanjay/DEEPSPEECH\ WORK/models/lm.binary /home/sanjay/DEEPSPEECH\ WORK/models/trie
Then I use these files during inference on the command line. Please help.
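Before looking deeper into a crash like this, a quick pre-flight check can at least rule out missing or empty files. The sketch below is only that: the function name check_model_files is hypothetical, and passing the check says nothing about whether the LM and trie were built with matching versions or options. Quoting "$f" keeps paths containing spaces, such as the DEEPSPEECH WORK directory, in one piece.

```shell
# Hypothetical pre-flight check: every file handed to inference must exist
# and be non-empty. Returns non-zero if any file fails the check.
check_model_files() {
    status=0
    for f in "$@"; do
        # -s is true only for an existing file with a size greater than zero.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage with the paths from the commands above:
#   check_model_files \
#       "/home/sanjay/DEEPSPEECH WORK/models/alphabet.txt" \
#       "/home/sanjay/DEEPSPEECH WORK/models/lm.binary" \
#       "/home/sanjay/DEEPSPEECH WORK/models/trie"
```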
Reason for what ?
And no segfault with default LM / trie ?
Please check the documentation.
That does not look like what we document in https://github.com/mozilla/DeepSpeech/blob/v0.4.1/data/lm/README.md