Error in Testing

user438720 · March 18, 2019, 9:40pm

I’m able to successfully train a model but the testing doesn’t seem to be correct. It seems that it’s only testing the first test sample and only the first word in that sample. Below is what the testing output looks like.

Test - WER: 0.993555, CER: 0.921804, loss: 259.808624

WER: 1.000000, CER: 2.000000, loss: 5.678045

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 6.826427

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.017824

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.017824

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.592014

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.783411

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.974808

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.974808

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 7.974808

src: “ëh”
res: “dah”

WER: 1.000000, CER: 2.000000, loss: 8.166204

src: “ëh”
res: “dah”

I Exporting the model…

This is the first sample from the test set; none of the other samples start with ëh.

ëh pat dof inga dife uil diqo hudk

lissyx · March 18, 2019, 10:13pm

No, it’s just the output that is selected to show the worst cases.
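To illustrate the idea of a report that only shows the worst cases, here is a minimal sketch. It is not DeepSpeech's actual reporting code: the `Sample` type, the `worst_samples` helper, and the ordering rule (highest WER first, ties broken by highest loss) are assumptions made for the example, and two of the three samples are invented.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One decoded test utterance (hypothetical structure)."""
    src: str    # reference transcript
    res: str    # decoded result
    wer: float  # word error rate for this sample
    loss: float


def worst_samples(samples, report_count):
    """Return the report_count samples with the highest WER.

    Ties on WER are broken by the higher loss. This ordering is an
    assumption for illustration, not the rule DeepSpeech uses.
    """
    ranked = sorted(samples, key=lambda s: (s.wer, s.loss), reverse=True)
    return ranked[:report_count]


# Two invented samples plus one taken from the output above.
samples = [
    Sample("one two", "one two", 0.0, 1.2),
    Sample("three four", "three for", 0.5, 3.4),
    Sample("\u00ebh", "dah", 1.0, 5.678045),  # "\u00eb" is the character ë
]

for sample in worst_samples(samples, 2):
    print("WER: %f, loss: %f" % (sample.wer, sample.loss))
    print(' - src: "%s"' % sample.src)
    print(' - res: "%s"' % sample.res)
```

With a report size of 2, only the two highest-WER samples are printed, so a well-recognised sample never appears in the report even though it still counts towards the overall test score.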

reuben · March 18, 2019, 10:20pm

But if there’s really no “ëh” sample in your test dataset then something else is broken. Did you change the code? Are your test CSVs properly formatted?

user438720 · March 19, 2019, 3:53am

There is an “ëh” word in one of the utterances in my test dataset, but it’s only the first word of that sample, not the entire sample. I didn’t change the code, and I believe my CSVs are properly formatted: they pass the preprocessing check. The format is below:

wav_filename,wav_filesize,transcript
/wavFiles/test1.wav,268294,ëh pat dof inga dife uil diqo hudk
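One way to double-check the CSV format independently of the preprocessing check is a small script like the following. It is only a sketch: the `check_csv` helper is hypothetical, and the three rules it applies (exact header, correct number of fields, numeric file size, non-empty transcript) are assumptions based on the format shown above, not an official validator.

```python
import csv
import io

# Column names taken from the header line shown above.
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def check_csv(handle):
    """Return a list of human-readable problems found in a dataset CSV."""
    problems = []
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append("unexpected header: %r" % (reader.fieldnames,))
        return problems
    for row in reader:
        line = reader.line_num
        # DictReader stores surplus fields under the key None and fills
        # missing fields with the value None.
        if None in row or None in row.values():
            problems.append("line %d: wrong number of fields" % line)
            continue
        if not row["wav_filesize"].isdigit():
            problems.append("line %d: wav_filesize is not an integer" % line)
        if not row["transcript"].strip():
            problems.append("line %d: empty transcript" % line)
    return problems


# The sample row from the post; "\u00eb" is the character ë.
example = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "/wavFiles/test1.wav,268294,\u00ebh pat dof inga dife uil diqo hudk\n"
)
print(check_csv(example))  # an empty list means no problems were found
```

For a real file, open it with an explicit encoding, for example `open(path, newline="", encoding="utf-8")`, so that non-ASCII characters such as ë are read consistently.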

If those are just the worst cases, why doesn’t the report show the entire test utterance instead of just “ëh”, which is only the first word of one test utterance? I would expect something like this:

WER: 1.000000, CER: 2.000000, loss: 8.166204

src: “ëh pat dof inga dife uil diqo hudk”
res: “dah”
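As a worked example of the numbers involved: under the common definition of WER (word-level edit distance divided by the number of reference words), a lone “dah” scores 1.0 against both “ëh” and the full eight-word transcript, so the WER value alone cannot tell the two cases apart. The sketch below is a generic implementation of that definition, not DeepSpeech's own code, and it makes no claim about how the reported CER is normalised.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(src, res):
    """Word error rate: word-level edit distance over reference length."""
    src_words = src.split()
    return levenshtein(src_words, res.split()) / len(src_words)


first_word = "\u00ebh"  # "ëh", with ë as a single precomposed code point
full_utterance = "\u00ebh pat dof inga dife uil diqo hudk"

print(wer(first_word, "dah"))          # 1.0
print(wer(full_utterance, "dah"))      # 1.0
print(levenshtein(first_word, "dah"))  # 2 character edits
```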

Lastly, is there any more reporting on the test data than the summary line below? Is there anywhere that shows the decoding results for all of the test data?

Test - WER: 0.993555, CER: 0.921804, loss: 259.808624

lissyx · March 19, 2019, 9:55am

@user438720 Can you share more details on your training setup? We can’t really help without context.

user438720 · March 19, 2019, 5:42pm

Thanks for the help. Below is the command that I used:

$ python3.5 -u DeepSpeech.py --train_files /shared/trainList.csv --test_files /shared/newTestList.csv --dev_files /shared/devList.csv --lm_binary_path /shared/lm.binary --lm_trie_path /shared/trie --checkpoint_dir ~/checkPoints/ --export_dir ~/exportDir/ --alphabet_config_path /shared/alphabet.txt --display_step 1 --report_count 10 --fulltrace true

lissyx · March 19, 2019, 5:45pm

That’s not enough; we need more information about your dataset …
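A quick way to gather basic facts about a dataset for a report like this is to summarise the CSV itself. The sketch below assumes the three-column format shown earlier in the thread; the `summarize` helper and the choice of statistics are illustrative, not something the maintainers asked for specifically.

```python
import csv
import io
from collections import Counter


def summarize(handle):
    """Collect simple statistics from a dataset CSV (hypothetical helper)."""
    rows = list(csv.DictReader(handle))
    word_counts = [len(row["transcript"].split()) for row in rows]
    characters = Counter(ch for row in rows for ch in row["transcript"])
    return {
        "samples": len(rows),
        "min_words": min(word_counts) if word_counts else 0,
        "max_words": max(word_counts) if word_counts else 0,
        "total_wav_bytes": sum(int(row["wav_filesize"]) for row in rows),
        # Every distinct character used in the transcripts, sorted.
        "characters": "".join(sorted(characters)),
    }


# The sample row from the thread; "\u00eb" is the character ë.
example = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "/wavFiles/test1.wav,268294,\u00ebh pat dof inga dife uil diqo hudk\n"
)
for key, value in summarize(example).items():
    print("%s: %r" % (key, value))
```

The distinct-character list can then be compared by hand against the alphabet file passed to training, and the word counts show at a glance whether any transcripts are unexpectedly short.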