Hello,
I'm trying to build a model for demo purposes with a very short vocabulary (8 different French words: a, b, c, d, e, suivant, retour, sauvegarde). To do that, I create my own .arpa file and .binary file (with KenLM), then my trie, using the same alphabet as the English pre-trained model.
My recordings are mono, 16-bit, 16 kHz.
The first time I trained, to see if everything was working, it did work, but due to not having enough data the results were not good.
I recorded more data, added it, and now when I try to train again with a new arpa, binary and trie, I get blank inference at test time, resulting in WER = 100%.
I looked into some topics that said it may be a recording format error (not my case) or a character missing from my alphabet. I used check_characters.py to see if any character was missing, but none was.
I tend to think it's an alphabet problem, because this behaviour started after I added some French Common Voice data and modified the alphabet (and regenerated the arpa, binary and trie).
So once I got this error, I went back to my old alphabet and data (plus binary and trie), but the error is still there, which makes me wonder what causes it…
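For reference, here is a minimal sketch of the coverage check I mean (a hand-rolled version, not the official check_characters.py; the alphabet set and word list below are just the ones from this post):

```python
def load_alphabet(path):
    """Parse a DeepSpeech-style alphabet.txt: one label per line,
    lines starting with '#' are comments ('\\#' escapes a literal '#')."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n").replace("\\#", "#")
                for line in f
                if line.rstrip("\n") and not line.startswith("#")}

def missing_chars(transcripts, alphabet):
    """Return the characters used in the transcripts that are absent
    from the alphabet -- any non-empty result will break training."""
    return set("".join(transcripts)) - alphabet

# Check the vocabulary of this post against a lowercase alphabet:
alphabet = set("abcdefghijklmnopqrstuvwxyz'")
words = ["a", "b", "c", "d", "e", "suivant", "retour", "sauvegarde"]
print(missing_chars(words, alphabet))  # set() -- every character is covered
```

Note that full transcripts contain spaces between words, so the space character also needs to be a label in alphabet.txt (in the pre-trained English alphabet it is the first non-comment line, which is easy to lose when editing the file).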
Here is my command:
python -u DeepSpeech.py --show_progressbar \
--train_files data/train.csv \
--test_files data/test.csv \
--train_batch_size 1 \
--test_batch_size 1 \
--n_hidden 1024 \
--epochs 1 \
--checkpoint_dir .. \
--export_dir .. \
--summary_dir .. \
--lm_binary_path ../lm.binary \
--lm_trie_path ../trie \
--alphabet_config_path data/alphabet.txt \
and here is the kind of output I get:
WER: 1.000000, CER: 1.000000, loss: 4.078539
- src: "a"
- res: ""
My question is: what can be the cause of this behaviour?
I redid all the steps multiple times, to be sure I didn't mess up somewhere along the way, but the issue is still the same… And it's not an environment problem, because a similar project with other data (all the months in French) works well.
I've been working on this for 2 days, my brain is stuck and I can't find the cause, so any help is greatly welcome!
Thank you very much
If you need more information, ask me
EDIT:
My vocabulary.txt file:
a
b
c
d
e
suivant
retour
sauvegarde
My alphabet.txt file:
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
'
# The last (non-comment) line needs to end with a newline.
I checked my recordings' format with Audacity (that's the tool I use to create them); maybe there is a better way to check the format?
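In case it helps, a single malformed clip is enough to derail training, so scripting the check over every file may be safer than eyeballing Audacity. A minimal sketch using only the standard-library wave module (check_wav and probe.wav are just illustrative names, not DeepSpeech utilities):

```python
import wave

def check_wav(path):
    """Print and verify the format of a WAV file.
    DeepSpeech expects mono (1 channel), 16-bit samples, 16000 Hz."""
    with wave.open(path, "rb") as w:
        channels, width, rate = w.getnchannels(), w.getsampwidth(), w.getframerate()
    ok = channels == 1 and width == 2 and rate == 16000
    print(f"{path}: {channels} channel(s), {width * 8}-bit, {rate} Hz -> {'OK' if ok else 'BAD'}")
    return ok

# Write a one-second silent clip in the expected format, then check it.
with wave.open("probe.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit = 2 bytes per sample
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)

check_wav("probe.wav")  # prints "... 1 channel(s), 16-bit, 16000 Hz -> OK"
```

Looping check_wav over the wav_filename column of train.csv and test.csv would flag any clip that slipped through in the wrong format.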
Note that even when I remove the lm and trie parameters, it still returns a blank res.