Hello,
I have been working with DeepSpeech, using the pre-trained LibriSpeech acoustic model and language model from here, which is reported to achieve 5.6% WER on the LibriSpeech clean test set. I tested this model on 20% of the Switchboard dataset without training on it. The data was cut and preprocessed by import_swb.py under the bin directory, and I also converted all the Switchboard audio to 16 kHz, 16-bit, to make sure the sampling rate matches what the model expects.
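To double-check that the conversion actually produced the expected format, here is a small sketch (using only Python's standard-library wave module; the 16 kHz / 16-bit / mono expectation is taken from the model description above) that verifies a WAV file's header:

```python
import wave

def check_format(path):
    # Confirm a WAV file matches the expected DeepSpeech input:
    # 16 kHz sample rate, 16-bit samples (2 bytes), single channel.
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2
                and w.getnchannels() == 1)
```

Running this over the converted Switchboard clips would rule out a silent resampling failure before blaming the model.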
However, I found that the WER is very high, around 64%. Many of the incorrect predictions look like the following:
"
ground truth: that’s true
prediction: that’strue
ground truth: it doesn’t matter
prediction: it doesnt matter
"
I have also seen many people report similar behavior with DeepSpeech. What could be causing this problem, and what is the proper way to fix it? Does it come from a mismatched language model, or from an incorrect input data format?
Thank you!