Hello,
I have been working with DeepSpeech, using the pre-trained LibriSpeech acoustic model and language model from here, which is reported to achieve 5.6% WER on the LibriSpeech clean test set. I tested this model on 20% of the Switchboard dataset without training on it. The data was cut and preprocessed by import_swb.py under the bin directory, and I also converted all the Switchboard audio to 16 kHz, 16-bit, to make sure the sampling rate matches what the model expects.
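To double-check that the conversion actually produced the expected format, here is a small sketch (using only Python's standard-library wave module; the 16 kHz / 16-bit / mono expectation is taken from the model description above) that verifies a WAV file's header:

```python
import wave

def check_format(path):
    # Confirm a WAV file matches the expected DeepSpeech input:
    # 16 kHz sample rate, 16-bit samples (2 bytes), single channel.
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2
                and w.getnchannels() == 1)
```

Running this over the converted Switchboard clips would rule out a silent resampling failure before blaming the model.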
However, I found that the WER is very high, around 64%. Many of the incorrect predictions look like the following:
"
ground truth: that’s true
prediction: that’strue
ground truth: it doesn’t matter
prediction: it doesnt matter
"
I have also seen many people report similar behavior with DeepSpeech. What could be causing this problem, and what is the proper way to fix it? Does it come from a mismatched language model, or from an incorrect input data format?
Thank you!