Hello there
I’m currently facing a problem with the evaluation epoch that runs after the optimization is complete.
Here is the error I get:
File "./deepspeech/DeepSpeech.py", line 966, in <module>
absl.app.run(main)
File "/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "./deepspeech/DeepSpeech.py", line 943, in main
test()
File "./deepspeech/DeepSpeech.py", line 676, in test
samples = evaluate(FLAGS.test_files.split(','), create_model, try_loading)
File "/home/duser/deepspeech/evaluate.py", line 48, in evaluate
Config.alphabet)
File "/home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder/__init__.py", line 23, in __init__
err = native_alphabet.deserialize(serialized, len(serialized))
File "/home/duser/.local/lib/python3.7/site-packages/ds_ctcdecoder/swigwrapper.py", line 176, in deserialize
return _swigwrapper.Alphabet_deserialize(self, buffer, buffer_size)
TypeError: in method 'Alphabet_deserialize', argument 2 of type 'char const *'
The alphabet file looks like this:
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
ö
ä
ü
'
# The last (non-comment) line needs to end with a newline.
According to the “file” command on Ubuntu, it is encoded in UTF-8. I have trained with the characters ä, ü and ö before and it worked fine. This is a new setup, though.
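Since the traceback points at deserializing the alphabet, here is a small sketch I used to rule out byte-level issues that “file” alone doesn’t show — a UTF-8 BOM, Windows line endings, or a missing trailing newline (the path data/alphabet.txt is just the one from my command below; adjust as needed):

```python
# Check an alphabet file for byte-level issues that can break
# deserialization: a UTF-8 BOM, CR line endings, missing final newline.
def check_alphabet(path):
    with open(path, "rb") as f:
        raw = f.read()
    return {
        "has_bom": raw.startswith(b"\xef\xbb\xbf"),
        "has_cr": b"\r" in raw,
        "ends_with_newline": raw.endswith(b"\n"),
    }

# Example usage:
# print(check_alphabet("data/alphabet.txt"))
```

In my case all three checks came back clean, so I don’t think the file itself is malformed.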
This is the command I used to start DeepSpeech:
./deepspeech/DeepSpeech.py \
--train_files data/clips/train.csv \
--dev_files data/clips/dev.csv \
--test_files data/clips/test.csv \
--alphabet_config_path data/alphabet.txt \
--export_dir trainings/trial3/model \
--summary_dir trainings/trial3/summary \
--checkpoint_dir trainings/trial3/checkpoints \
--n_hidden 2048 \
--train_batch_size 128 \
--dev_batch_size 128 \
--test_batch_size 64 \
--learning_rate 0.0001 \
--dropout_rate 0.1 \
--epochs 20 \
--lm_binary_path data/lm-n5-whole_sentence.binary \
--lm_trie_path data/trie
Could I somehow skip the evaluation step and still export a model? I could then test its accuracy on another system.
What I have already tried:
- Removing the letters ä, ü and ö from the alphabet -> immediate error at the start of training
- Copying a fresh alphabet file and adding the letters again -> same error
- Copying the missing letters (ä, ü & ö) directly from the command-line output of check_characters.py
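To double-check the check_characters.py result, I also compared the character set of my transcripts against the alphabet directly. This is just a sketch; it assumes the usual DeepSpeech CSV layout with a 'transcript' column, as in my train.csv above:

```python
# Compare the characters used in the training transcripts with the
# labels listed in the alphabet file, to spot anything missing.
import csv

def transcript_chars(csv_path):
    """Collect every character appearing in the 'transcript' column."""
    chars = set()
    with open(csv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])
    return chars

def alphabet_chars(path):
    """Read the alphabet file, skipping comment lines starting with '#'."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f
                if line.strip() and not line.startswith("#")}

# Example usage:
# missing = transcript_chars("data/clips/train.csv") - alphabet_chars("data/alphabet.txt")
# print(missing)
```

This reported nothing missing for me, so the alphabet does seem to cover the data.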
Thank you for your help.
Best regards
Lukas