Tried the pre-trained model 0.5.1 and the results were not so good. Am I missing anything?

I downloaded and tried the pretrained model 0.5.1, and the results were not so good with audio files at 16 kHz. What needs to be done to get state-of-the-art results? Thanks

The pre-trained model is a work in progress, trained with a fraction of the data used in a production model. You will either need to wait for the model to improve or augment it with your own data.

Please give more context on what you are doing.

I recorded a file using Amazon Polly in Audacity at a sample rate of 16 kHz.
soxi joana.wav
Input File : ‘joana.wav’
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:08.31 = 132912 samples ~ 623.025 CDDA sectors
File Size : 266k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
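
For anyone who wants to run the same check from a script instead of soxi, here is a minimal sketch using only Python's standard wave module. The function names are made up for illustration, and the 16 kHz, mono, 16-bit target is an assumption taken from the soxi output above, not something read from the model itself.

```python
import wave


def describe_wav(path):
    """Return the basic format properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # bytes -> bits
            "frames": frames,
            "duration_s": frames / rate,
        }


def matches_expected_format(info, sample_rate=16000, channels=1, bits=16):
    """Compare the properties with the assumed 16 kHz, mono, 16-bit target."""
    return (
        info["sample_rate"] == sample_rate
        and info["channels"] == channels
        and info["bits_per_sample"] == bits
    )
```

Calling describe_wav("joana.wav") and passing the result to matches_expected_format should report the same channel count, sample rate and precision that soxi prints.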

deepspeech --model deepspeech-0.5.1-models/output_graph.pbmm --alphabet deepspeech-0.5.1-models/alphabet.txt --lm deepspeech-0.5.1-models/lm.binary --trie deepspeech-0.5.1-models/trie --audio joana.wav

Expected - hi my name is joanna welcome to same space how to install the data in my server and download it later on.
Got - hi my name is joanna welcome to same space how to install the date my serverino mood it later on
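
To put a number on the gap between the two lines above, here is a rough sketch of a word error rate (WER) calculation using a plain word-level edit distance. It is only an illustration; the helper names are hypothetical and this is not DeepSpeech's own evaluation code. The trailing period from the expected sentence is dropped so that punctuation does not count as an error.

```python
EXPECTED = ("hi my name is joanna welcome to same space how to install "
            "the data in my server and download it later on")
GOT = ("hi my name is joanna welcome to same space how to install "
       "the date my serverino mood it later on")


def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / ref_len


if __name__ == "__main__":
    errors = word_edit_distance(EXPECTED, GOT)
    print(errors, "errors,", f"WER = {word_error_rate(EXPECTED, GOT):.3f}")
```

For this pair the distance is 5 edits (3 substitutions and 2 deletions) over 22 reference words, a WER of about 0.227.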

“serverino” is not even a word.

The audio quality is very good, so the transcription should have been flawless.
joana.wav.zip (217.8 KB)

So you have a feminine TTS voice.

I heard “sand space” instead of “same space”, and with the voice they use, I have a hard time hearing “server and download it later”, to be honest.

Our training dataset still lacks a good amount of feminine voices, and you are relying on TTS. It’s not surprising that the results are not perfect.

I’m not really sure what can be quickly improved in your case …

Which datasets were used to train this pretrained release model? I will exclude them, add other examples, and fine-tune. The release page says that train_files were the Fisher, LibriSpeech, and Switchboard training corpora.
Did you use the full datasets mentioned above, or only parts of them?

This is in the release notes.