I downloaded and tried the pretrained model 0.5.1 and the results were not so good with audio files at 16 kHz. What needs to be done to get state-of-the-art results? Thanks
The pre-trained model is a work in progress, trained with a fraction of the data used in a production model; you will either need to wait for the model to improve or augment it with your own data.
Please give more context on what you are doing.
I recorded a file using Amazon Polly in Audacity at a 16 kHz sample rate.
```
$ soxi joana.wav

Input File     : 'joana.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:08.31 = 132912 samples ~ 623.025 CDDA sectors
File Size      : 266k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
```
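The soxi output above already confirms the format, but the same check can be done from Python with only the standard library. This is a minimal sketch (the `check_wav` helper is my own, not part of DeepSpeech) that verifies a WAV file matches what the 0.5.1 model expects: 16 kHz, mono, 16-bit signed PCM.

```python
# Sanity-check a WAV file against the format the pretrained
# DeepSpeech 0.5.1 model expects: 16 kHz, mono, 16-bit PCM.
import wave

def check_wav(path):
    """Return a list of problems; an empty list means the file looks right."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append("sample rate is %d, expected 16000" % w.getframerate())
        if w.getnchannels() != 1:
            problems.append("%d channels, expected mono" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("%d-byte samples, expected 16-bit" % w.getsampwidth())
    return problems
```

Running `check_wav("joana.wav")` on the file above should return an empty list, since it is already 16 kHz mono 16-bit.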
```
$ deepspeech --model deepspeech-0.5.1-models/output_graph.pbmm --alphabet deepspeech-0.5.1-models/alphabet.txt --lm deepspeech-0.5.1-models/lm.binary --trie deepspeech-0.5.1-models/trie --audio joana.wav
```
Expected: "hi my name is joanna welcome to same space how to install the data in my server and download it later on"
Got: "hi my name is joanna welcome to same space how to install the date my serverino mood it later on"
"serverino" is not even a word. This is very good quality audio; the transcription should have been flawless.
joana.wav.zip (217.8 KB)
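To put a number on the mismatch, the standard metric is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the reference length. A small self-contained sketch (my own scoring code, not the evaluation code DeepSpeech ships with):

```python
# Word error rate via Levenshtein distance over word sequences.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

expected = ("hi my name is joanna welcome to same space how to install "
            "the data in my server and download it later on")
got = ("hi my name is joanna welcome to same space how to install "
       "the date my serverino mood it later on")
print(round(wer(expected, got), 3))
```

A handful of substitutions and deletions in a 22-word reference still gives a noticeable WER, which matches the subjective impression that the result is far from flawless.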
So you have a feminine TTS voice. I heard "sand space" instead of "same space", and with the voice they use, I have a hard time hearing "the server and download it later", to be honest.
Our training dataset still lacks a good amount of feminine voices, and you are relying on TTS, so it's not surprising the results are not perfect. I'm not really sure what can be quickly improved in your case …
Which datasets were used to train this pretrained release model? I will exclude them, add other examples, and fine-tune. The release page mentions train_files.
Fisher, LibriSpeech, and Switchboard training corpora.
Have you used the full datasets mentioned above, or only parts of them?
This is in the release notes.
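For the fine-tuning route mentioned above: DeepSpeech's training scripts consume CSV manifests with `wav_filename`, `wav_filesize` and `transcript` columns (see the training docs). A minimal sketch for building such a manifest from your own 16 kHz clips; the `write_manifest` helper and the paths are hypothetical, only the three-column format comes from the docs.

```python
# Build a DeepSpeech-style training manifest: one row per clip with
# wav_filename, wav_filesize and transcript columns.
import csv
import os

def write_manifest(clips, out_path):
    """clips: list of (wav_path, transcript) pairs."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in clips:
            # Transcripts are lowercased to match the released alphabet.txt,
            # which contains only lowercase characters.
            writer.writerow([wav_path, os.path.getsize(wav_path),
                             transcript.lower()])
```

You would then pass the resulting CSV via `--train_files` when running the training script.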