Error during evaluation epoch: "Alphabet_deserialize"

Hello there

I’m currently running into a problem in the evaluation epoch that runs after optimization has finished.

Here is the error I get:

  File "./deepspeech/DeepSpeech.py", line 966, in <module>
    absl.app.run(main)
  File "/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./deepspeech/DeepSpeech.py", line 943, in main
    test()
  File "./deepspeech/DeepSpeech.py", line 676, in test
    samples = evaluate(FLAGS.test_files.split(','), create_model, try_loading)
  File "/home/duser/deepspeech/evaluate.py", line 48, in evaluate
    Config.alphabet)
  File "/home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder/__init__.py", line 23, in __init__
    err = native_alphabet.deserialize(serialized, len(serialized))
  File "/home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder/swigwrapper.py", line 176, in deserialize
    return _swigwrapper.Alphabet_deserialize(self, buffer, buffer_size)
TypeError: in method 'Alphabet_deserialize', argument 2 of type 'char const *'

The alphabet looks like this:

# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
ö
ä
ü
'
# The last (non-comment) line needs to end with a newline.
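The header comments above describe the file's format. A minimal sketch of those rules follows, assuming only what the header states. It is not DeepSpeech's own loader, and `parse_alphabet` is a hypothetical helper. It keeps blank or space-only lines as labels instead of skipping them, so a stray empty line shows up in the label count.

```python
def parse_alphabet(text):
    """Split alphabet.txt-style text into its list of labels (a sketch).

    Rules are taken from the file's header comments: a line starting
    with '#' is a comment, a line that is exactly '\\#' stands for a
    literal '#', and the last line must end with a newline.
    """
    if not text.endswith("\n"):
        raise ValueError("the last (non-comment) line must end with a newline")
    labels = []
    # Splitting on '\n' leaves an empty string after the final newline; drop it.
    for line in text.split("\n")[:-1]:
        if line == "\\#":
            labels.append("#")    # escaped '#' used as a label
        elif line.startswith("#"):
            continue              # comment line
        else:
            labels.append(line)   # blank or space-only lines are kept as labels here


    return labels


sample = "# a comment\n \na\nb\n\u00e4\n\\#\n"
print(len(parse_alphabet(sample)), parse_alphabet(sample))
# prints: 5 [' ', 'a', 'b', 'ä', '#']
```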

According to the “file” command on Ubuntu, it is encoded in UTF-8. I have trained models with the characters ä, ü and ö before and it worked well. This is a new setup, though.
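`file` reports the overall encoding, but a few byte-level details can still matter for a line-based file like this one. The sketch below checks them, assuming these are the pitfalls worth looking for. Only the trailing-newline rule comes from the file's own header. The BOM, CRLF and NFC checks are common-pitfall guesses rather than documented DeepSpeech requirements, and `alphabet_encoding_warnings` is a hypothetical helper.

```python
import os
import unicodedata


def alphabet_encoding_warnings(data):
    """Return warnings about the raw bytes of an alphabet file (a sketch)."""
    warnings = []
    if data.startswith(b"\xef\xbb\xbf"):
        warnings.append("starts with a UTF-8 byte-order mark")
    if b"\r\n" in data:
        warnings.append("uses Windows (CRLF) line endings")
    if not data.endswith(b"\n"):
        warnings.append("last line does not end with a newline")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        warnings.append("not valid UTF-8: %s" % exc)
    else:
        # 'ä' can be stored as one code point (NFC) or as 'a' + a combining mark (NFD)
        if unicodedata.normalize("NFC", text) != text:
            warnings.append("contains decomposed (non-NFC) characters")
    return warnings


path = "data/alphabet.txt"  # path taken from the training command in this post
if os.path.exists(path):
    with open(path, "rb") as f:
        print(alphabet_encoding_warnings(f.read()) or "no warnings")
```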

This is the command I used to start DeepSpeech:

./deepspeech/DeepSpeech.py \
--train_files data/clips/train.csv \
--dev_files data/clips/dev.csv \
--test_files data/clips/test.csv \
--alphabet_config_path data/alphabet.txt \
--export_dir trainings/trial3/model \
--summary_dir trainings/trial3/summary \
--checkpoint_dir trainings/trial3/checkpoints \
--n_hidden 2048 \
--train_batch_size 128 \
--dev_batch_size 128 \
--test_batch_size 64 \
--learning_rate 0.0001 \
--dropout_rate 0.1 \
--epochs 20 \
--lm_binary_path data/lm-n5-whole_sentence.binary \
--lm_trie_path data/trie

Could I somehow skip the evaluation part and still build a model? I could then test the accuracy on another system.

What I have already tried:

  • Removing letters -> instant error at the beginning of the training
  • Copying a fresh alphabet and adding the letters again -> same error
  • Copying the missing letters (ä, ü & ö) directly from the command line output when using check_characters.py

Thank you for your help.

Best regards
Lukas

What version of DeepSpeech is this? A v0.6.0 checkout? What’s the version of the ds_ctcdecoder package?

DeepSpeech is on 0.6.1-alpha.0, and the ds_ctcdecoder package is on 0.6.1a0.
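One way to read the installed ds_ctcdecoder version from inside the interpreter that actually runs training is sketched below. `installed_version` is a hypothetical helper. The `pkg_resources` fallback assumes setuptools is present, because `importlib.metadata` only exists from Python 3.8 and the traceback shows Python 3.7.

```python
def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        # importlib.metadata is available from Python 3.8 onwards
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for Python 3.7; requires setuptools (pkg_resources)
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("ds_ctcdecoder:", installed_version("ds_ctcdecoder"))
```

A version number alone cannot tell two installs of the same release apart, so checking where the package is imported from is worth doing as well.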

We’ve had strange issues in the past with people using Anaconda. Can you reproduce this under a plain Python virtualenv?

The system is pretty restricted. I’m working on a docker container where I can only install pip / conda packages.

I would need to ask the admin of those containers to create one without Anaconda for testing purposes.

So if possible in any way, doing it with Anaconda would be awesome 🙂

Then I’m unable to help you. Knowing all your exact setup steps would also be useful. The error makes absolutely no sense.

It’s also unclear: the stack trace mentions both a non-virtualenv Python 3.7 and Anaconda …
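To see which runtime is actually in play, you can run a check with the same interpreter that launches DeepSpeech.py. The sketch below reports the interpreter, its prefix, the per-user site directory, and the file a module would be imported from. `describe_runtime` is a hypothetical helper. If `module_origin` points under `~/.local` while `executable` is under `/anaconda3`, the package is coming from the user site directory rather than from Anaconda.

```python
import importlib.util
import site
import sys


def describe_runtime(module_name="ds_ctcdecoder"):
    """Report the running interpreter and where `module_name` would be imported from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,              # interpreter binary in use
        "prefix": sys.prefix,                      # e.g. /anaconda3 for Anaconda
        "user_site_enabled": site.ENABLE_USER_SITE,
        "user_site": site.getusersitepackages(),   # where pip --user installs go
        "module_origin": spec.origin if spec else None,
    }


for key, value in describe_runtime().items():
    print("%s: %s" % (key, value))
```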

As documented, if you don’t pass --test_files, the test phase should be skipped. But something is still wrong in the setup.
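To skip the test phase, you can rerun the original training command without --test_files. The sketch below assembles that command as an argument list. The flag values are copied from the command in the first post, and `build_argv` is a hypothetical helper. All other flags, including --export_dir, are passed through unchanged.

```python
import shlex

# Flags copied from the training command in the first post.
FLAGS = {
    "train_files": "data/clips/train.csv",
    "dev_files": "data/clips/dev.csv",
    "test_files": "data/clips/test.csv",
    "alphabet_config_path": "data/alphabet.txt",
    "export_dir": "trainings/trial3/model",
    "summary_dir": "trainings/trial3/summary",
    "checkpoint_dir": "trainings/trial3/checkpoints",
    "n_hidden": 2048,
    "train_batch_size": 128,
    "dev_batch_size": 128,
    "test_batch_size": 64,
    "learning_rate": 0.0001,
    "dropout_rate": 0.1,
    "epochs": 20,
    "lm_binary_path": "data/lm-n5-whole_sentence.binary",
    "lm_trie_path": "data/trie",
}


def build_argv(flags, drop=("test_files",)):
    """Build the DeepSpeech.py argument list, leaving out the flags named in `drop`."""
    argv = ["./deepspeech/DeepSpeech.py"]
    for name, value in flags.items():
        if name not in drop:
            argv += ["--" + name, str(value)]
    return argv


# shlex.quote works on Python 3.7 (shlex.join needs 3.8+).
# To actually launch it: subprocess.run(build_argv(FLAGS), check=True)
print(" ".join(shlex.quote(arg) for arg in build_argv(FLAGS)))
```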

Just got it working!

That part made me think:

the stack mentions both a python 3.7 non-virtualenv as well as Anaconda

I used pip to remove ds-ctcdecoder and reinstalled it with the command given in the DeepSpeech repo. The test epoch started successfully 🙂

Thank you for your help!

That’s unclear. You should be able to just run python util/check_characters.py [...] --alphabet-format > data/alphabet.txt and use the result directly.
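check_characters.py handles this for you. Purely to illustrate what such a tool has to do, here is a hypothetical sketch. It is not the script's actual code, and `alphabet_from_csvs` and `write_alphabet` are made-up names. It assumes DeepSpeech-style CSVs with a `transcript` column, collects every distinct character, and writes one label per line. An escaped '#' is written as `\#`, and every line ends with a newline.

```python
import csv


def alphabet_from_csvs(paths, column="transcript"):
    """Collect every distinct character in the transcript column of the given CSVs."""
    chars = set()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                chars.update(row[column])
    return sorted(chars)


def write_alphabet(chars, path):
    """Write one label per line, UTF-8, each line (including the last) ending in '\\n'."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for ch in chars:
            f.write(("\\#" if ch == "#" else ch) + "\n")


# Example (paths from the training command in the first post):
# write_alphabet(alphabet_from_csvs(["data/clips/train.csv", "data/clips/dev.csv",
#                                    "data/clips/test.csv"]), "data/alphabet.txt")
```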

Right. Can you explain the difference in a bit more detail? How did you install it before, and how did you install it when it worked? Is everything running under anaconda3?

FTR, I shared a Docker setup at https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train a while back to help people get started.


When I uninstalled the previous ds-ctcdecoder version, it printed the following removal information:

Uninstalling ds-ctcdecoder-0.6.1a0:
  Would remove:
    /home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder-0.6.1a0.dist-info/*
    /home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder/*

After reinstalling, the uninstall output showed this path instead:

Uninstalling ds-ctcdecoder-0.6.1a0:
  Would remove:
    /anaconda3/lib/python3.7/site-packages/ds_ctcdecoder-0.6.1a0.dist-info/*
    /anaconda3/lib/python3.7/site-packages/ds_ctcdecoder/*

I think the problem lies in the --user flag I used to install some of the pip packages. I needed that flag because of errors I got while installing other packages, so I instinctively used it for the ctcdecoder as well.
This time I left out --user, and pip installed the ctcdecoder into the Anaconda environment 🙂
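In the original traceback, ds_ctcdecoder was loaded from /home/duser/.local/lib/python3.7/site-packages, while absl came from /anaconda3. The ~/.local location is where pip's --user flag puts packages. Python adds that user site directory to sys.path ahead of the interpreter's own site-packages, so a --user copy wins whenever both exist. The sketch below lists every copy of a package in import order. `find_package_copies` is a hypothetical helper, and it only detects regular packages that have an `__init__.py`.

```python
import os
import sys


def find_package_copies(package, search_path=None):
    """List every path entry containing `package`, in import-priority order.

    The first hit is the copy `import package` would load; later hits are
    shadowed copies that may belong to a different install.
    """
    hits = []
    for entry in (search_path if search_path is not None else sys.path):
        # An empty sys.path entry means the current working directory
        candidate = os.path.join(entry or os.getcwd(), package, "__init__.py")
        if os.path.isfile(candidate):
            hits.append(os.path.dirname(candidate))
    return hits


copies = find_package_copies("ds_ctcdecoder")
if len(copies) > 1:
    print("imported from:", copies[0])
    for shadowed in copies[1:]:
        print("shadowed copy:", shadowed)
else:
    print("copies found:", copies)
```

Running `python -s`, or setting `PYTHONNOUSERSITE=1`, keeps the user site directory off sys.path. This is a quick way to confirm whether a --user install is being picked up.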

Thanks, that makes more sense: it means a different runtime was loading the package, which would likely explain the crash.

Good to know that it works well under Anaconda, too.
