Improving accuracy with 8 kHz audio?

cnelson · March 21, 2018, 9:30pm

I’m testing out DeepSpeech with 8 kHz audio and seeing very poor accuracy.

Anyone have general pointers on how to improve this?

Should I train my own model? If so, what’s the recommended dataset size?

Are there other approaches to working with DeepSpeech and 8 kHz audio?

lissyx · March 25, 2018, 3:02pm

This is expected, because the model is trained on 16 kHz audio. You can try upsampling, but in our tests we could not get anything satisfactory (this is why we have a warning in place now). The best option would be to re-train (which would require quite a large dataset, thousands of hours of audio), or to record at 16 kHz.

cnelson · March 25, 2018, 5:44pm

Thanks for confirming my suspicions.

Unfortunately, my audio source’s native sample rate is 8 kHz, so I’m stuck with it.

lissyx · March 25, 2018, 7:26pm

Maybe you can try to upsample it and filter it properly?
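For anyone wondering what "upsample and filter properly" could look like, here is a minimal pure-Python sketch, not anything taken from DeepSpeech itself. It assumes the audio is already loaded as a list of floats in [-1, 1]; the function names `lowpass_taps` and `upsample_2x` and the 63-tap filter length are illustrative choices. It inserts a zero between every pair of samples and then low-pass filters at 4 kHz to remove the spectral images. Note that this only produces a clean 16 kHz signal: the 4-8 kHz band that the model saw during training is still empty, which may be why upsampling alone did not give satisfactory results.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the sample rate (0 to 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at DC


def upsample_2x(samples, num_taps=63):
    """Double the sample rate: zero-stuff, then low-pass at the old Nyquist."""
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.append(0.0)  # new sample slot, filled in by the filter

    # 0.25 of the new rate is 4 kHz when going from 8 kHz to 16 kHz.
    taps = lowpass_taps(num_taps, 0.25)
    delay = (num_taps - 1) // 2  # compensate the filter delay
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(stuffed):
                acc += t * stuffed[j]
        out.append(2.0 * acc)  # restore the level lost to zero-stuffing
    return out
```

This is slow on long recordings; a dedicated resampler (SoX, or a polyphase resampler from a DSP library) would be the practical choice, and the sketch is only meant to show the steps involved.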

singpolyma · March 28, 2018, 7:59pm

Is there a chance that downsampling the training data and re-training would have better results than up-sampling?
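To make the idea concrete, downsampling 16 kHz training audio to 8 kHz could look roughly like the sketch below. It is only an illustration under assumptions: samples are a list of floats in [-1, 1], and `halfband_taps` and `downsample_2x` are made-up names, not part of DeepSpeech. The signal is low-pass filtered around 4 kHz first so that content above the new Nyquist frequency does not alias, and then every second sample is kept.

```python
import math


def halfband_taps(num_taps=63):
    """Hamming-windowed sinc low-pass with cutoff at a quarter of the sample rate."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 0.5
        else:
            ideal = math.sin(math.pi * x / 2) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at DC


def downsample_2x(samples, num_taps=63):
    """Halve the sample rate: anti-alias filter, keep every second sample."""
    taps = halfband_taps(num_taps)
    delay = (num_taps - 1) // 2  # compensate the filter delay
    out = []
    # Only the filter outputs that survive decimation are computed.
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out
```

With this filter, content just above 4 kHz is only partly attenuated, so a sharper filter or a slightly lower cutoff may be preferable for real training data.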

lissyx · March 28, 2018, 9:48pm

That’s one idea, but it means re-training everything, which is roughly one full week on our current infra, and it’s busy with other things right now (we need to evaluate some parameters for streaming and some other tuning).

elpimous_robot · April 3, 2018, 8:21pm

@cnelson

You could try two things: convert to 16 kHz mono, AND change the WAV’s amplitude!

Check some of your WAVs in Audacity: the peak amplitude should ideally reach about ±0.5.

I helped a friend whose training was going badly: his WAVs had a very small amplitude, around ±0.1, almost a flat line!

To the ear it sounded fine, but not to the PC!
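The amplitude check can also be scripted instead of eyeballed in Audacity. The sketch below is one way to do it with the Python standard library, assuming 16-bit PCM WAV files; the function names and the default `target=0.5` (taken from the ±0.5 suggestion above) are illustrative, not a DeepSpeech requirement. It reports the peak as a fraction of full scale and writes a copy scaled so that the peak reaches the target.

```python
import array
import sys
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def read_pcm16(path):
    """Return (params, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        params = wav.getparams()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def peak_level(samples):
    """Peak amplitude as a fraction of full scale (0.0 to 1.0)."""
    return max((abs(s) for s in samples), default=0) / FULL_SCALE


def normalize_peak(in_path, out_path, target=0.5):
    """Write a copy of in_path scaled so its peak reaches `target`."""
    params, samples = read_pcm16(in_path)
    peak = peak_level(samples)
    if peak == 0:
        raise ValueError("file is silent, nothing to scale")
    gain = target / peak
    scaled = array.array(
        "h",
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )
    if sys.byteorder == "big":
        scaled.byteswap()
    with wave.open(out_path, "wb") as wav:
        wav.setparams(params)
        wav.writeframes(scaled.tobytes())
    return peak, gain
```

A file peaking around ±0.1 would get a gain of about 5. Keep in mind that peak scaling amplifies background noise by the same factor, so it is worth listening to a few results before processing a whole dataset.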