Should the KenLM inside the DeepSpeech repo be used for building the lm files or is the official KenLM compatible as well?
Like we have it documented in data/lm/README.md?
I don’t think that recipe specifies where the lmplz and build_binary executables are looked up. I have used ones built from the official KenLM repo, but was wondering whether DeepSpeech depends on the in-tree version.
In the past I’ve used the official repo and had no problems with the language models created that way.
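For reference, what I run looks roughly like this, in the same Jupyter notebook style as the README. The file names and the --order value are just examples from my setup, not the exact README contents, and the README may pass extra options to build_binary, so check it for the exact flags:

# Build an ARPA language model from a text corpus (one sentence per line).
!lmplz --order 5 --text vocabulary.txt --arpa lm.arpa
# Convert the ARPA file into KenLM's binary format.
!build_binary lm.arpa lm.binary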
Yeah, we have no requirement here. Maybe you could file an issue and/or a PR to augment the doc and make that clear? It was clear in our minds, but obviously it’d be better to state it explicitly.
I am stuck creating lm.binary. Where should I run the Python file from data/lm/README.md?
I saved the code as a Python file and tried running it, but I get a syntax error:
File "lmbinary.py", line 21
!lmplz --order 5
^
SyntaxError: invalid syntax
I have tried running it inside the kenlm/build/bin folder, since that is the only place I can see lmplz and build_binary.
Please clarify this for me.
In the README where it says:
following this recipe (Jupyter notebook code):
… that means it’s Jupyter notebook code, so you need to run it in a Jupyter notebook (it doesn’t say to save it as a Python script and run that, which is what you appear to have done).
The reason it’s going wrong is that “!” is a special Jupyter feature that lets you run command-line commands (which lmplz is), and it won’t work in a regular Python script. Alternatively, you could use the subprocess module (or os.system) to make it work from within your script, but I’d advise the Jupyter option. Running commands via subprocess is a general Python thing unrelated to anything here, so I’ll leave you to Google that if you want to go that route.
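If you do go the script route, a minimal sketch would look something like this. The file names are placeholders, and lmplz/build_binary must be on your PATH or given as full paths (e.g. kenlm/build/bin/lmplz):

import subprocess

# Equivalent of the notebook's "!lmplz ..." line: build an ARPA model
# from a text corpus. check=True raises an error if the command fails.
subprocess.run(
    ["lmplz", "--order", "5", "--text", "vocabulary.txt", "--arpa", "lm.arpa"],
    check=True,
)

# Convert the ARPA file into the binary format DeepSpeech loads.
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)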
Hope that helps!