FormatLoadException error while running the build_binary command

Hi Everyone,

I am trying to train a very small model first, to get myself used to all the commands.

The language I am trying to train is Gujarati. Could this be an issue with the Unicode characters of my language?

I am using the following tutorial:

Reference Guide:

Command:
…/…/native_client/kenlm/build/bin/build_binary -T -s words.arpa lm.binary

Error:
/DeepSpeech/native_client/kenlm/lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&)
[with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 20
ERROR

Thanks in advance for the help.

No, you are following an old tutorial instead of the up-to-date documentation under data/lm.
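For anyone following along, the up-to-date flow in data/lm boils down to two KenLM steps: lmplz to estimate an ARPA file from a text corpus, and build_binary to convert that ARPA file into the lm.binary that DeepSpeech loads. Below is a minimal sketch of that pipeline as a Python script; the corpus file name, the order of 3 and the output names are assumptions on my part, and the --discount_fallback and -s flags are simply the ones already mentioned in this thread, so check the data/lm documentation of your DeepSpeech version for the exact flags it uses.

import subprocess

# Assumed location of the KenLM binaries, based on the command above.
KENLM_BIN = "/DeepSpeech/native_client/kenlm/build/bin"

# Step 1: estimate an ARPA language model from a plain-text corpus.
# --discount_fallback is only needed when the corpus is too small or too
# repetitive for Kneser-Ney discount estimation (see the lmplz error quoted below).
subprocess.check_call([
    KENLM_BIN + "/lmplz",
    "--order", "3",            # assumed order; use whatever the docs recommend
    "--text", "corpus.txt",    # hypothetical corpus file, one sentence per line
    "--arpa", "words.arpa",
    "--discount_fallback",
])

# Step 2: convert the ARPA file into KenLM's binary format.
# -s relaxes the check for the <s> and </s> sentence markers.
subprocess.check_call([
    KENLM_BIN + "/build_binary",
    "-s",
    "words.arpa",
    "lm.binary",
])

Run from the DeepSpeech checkout, this should leave you with words.arpa and lm.binary in the working directory.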

As suggested by the error, I added “-s” and it worked fine. Do we have some sort of guide where I can understand the actual reasons behind these errors?

The ARPA file is missing <s>, hence the use of -s

and

Could not calculate Kneser-Ney discounts for 3-grams with adjusted count 4 because we didn’t observe any 3-grams with adjusted count 3; Is this small or artificial data?
Try deduplicating the input. To override this error for e.g. a class-based model, rerun with --discount_fallback

Thanks
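Not an official guide, but both messages become clearer if you look inside the ARPA file itself: the \data\ header says how many n-gram orders were estimated (a unigram-only file is what triggers “assumes at least a bigram model”), and the \1-grams: section shows whether <s>, </s> and <unk> made it into the vocabulary (missing sentence markers are what -s works around). Here is a rough, hypothetical inspection script; it only assumes the standard ARPA text format and the words.arpa name from the command above.

# check_arpa.py -- rough sketch, not part of DeepSpeech or KenLM
import sys

specials = {"<s>": False, "</s>": False, "<unk>": False}
counts = {}            # n-gram order -> number of entries, from the \data\ header
in_unigrams = False

with open(sys.argv[1], encoding="utf-8") as arpa:
    for line in arpa:
        line = line.strip()
        if line.startswith("ngram "):
            # header lines look like "ngram 1=12345"
            order, _, count = line[len("ngram "):].partition("=")
            counts[int(order)] = int(count)
        elif line == "\\1-grams:":
            in_unigrams = True
        elif line.startswith("\\") and line.endswith("-grams:"):
            in_unigrams = False
        elif in_unigrams and line:
            # unigram lines are "logprob  word  [backoff]"
            fields = line.split()
            if len(fields) >= 2 and fields[1] in specials:
                specials[fields[1]] = True

print("n-gram counts per order:", counts)
print("special tokens present:", specials)

Run it as python3 check_arpa.py words.arpa: if the counts dictionary only contains order 1 you will hit the “at least a bigram model” error, and if <s>/</s> come back False you need -s (or a corpus with proper sentence boundaries).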

Now I am getting this error:

Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /DeepSpeech/dvolume/gujarati/results/checkout/train-0
I0104 23:55:28.599471 139891706021696 saver.py:1280] Restoring parameters from /DeepSpeech/dvolume/gujarati/results/checkout/train-0
I Restored variables from most recent checkpoint at /DeepSpeech/dvolume/gujarati/results/checkout/train-0, step 0
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /DeepSpeech/dvolume/gujarati/dev/dev.csv
Traceback (most recent call last):
  File "/DeepSpeech/DeepSpeech.py", line 966, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/DeepSpeech.py", line 939, in main
    train()
  File "/DeepSpeech/DeepSpeech.py", line 646, in train
    dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero

Is it because of my small dataset?

Likely, yes, this is why.
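The division that fails is dev_loss / total_steps, and total_steps stays at 0 when the validation set never produces a single batch (note “Steps: 0” on the validation line above). A quick way to check is sketched below, with the dev.csv path taken from your log; the column names are the usual DeepSpeech ones (wav_filename, wav_filesize, transcript), so adjust if yours differ.

import csv

DEV_CSV = "/DeepSpeech/dvolume/gujarati/dev/dev.csv"   # path taken from the log above

with open(DEV_CSV, encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print("rows in dev.csv:", len(rows))
# Zero (or a handful of) rows means the validation loop never runs a step,
# so total_steps stays 0 and dev_loss / total_steps divides by zero.
empty = sum(1 for r in rows if not (r.get("transcript") or "").strip())
print("rows with an empty transcript:", empty)

If the count comes back zero or tiny, add more validation data (or point --dev_files at a larger CSV) before digging any deeper.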