Details on https://github.com/mozilla/DeepSpeech/issues/2612
Thank you, I will check it in a while and post the results of a .wav file.
About the lm_alpha parameter: I see that in this project it is read from the JSON file included inside the .zip file with the models (anyway, I can hardcode it to 2.0 instead of 0.75). BUT in my project it just doesn't get used anywhere!! I will have to check the original code and find out where I have omitted this.
Don't test the LM hack, just make a new export from current master with the released v0.6.0 checkpoints.
You need it when you initialize the language model. Not enabling the language model will yield worse results, and might increase inference time.
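For reference, a minimal sketch of where lm_alpha and lm_beta come into play with the 0.6.x Java bindings (assuming the two-argument constructor of that release; the file paths are placeholders and the lm_beta value is only an illustrative default, so take the actual values from the JSON file shipped with the model package):

import org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel;

// Load the acoustic model; the beam width trades accuracy for speed.
final int BEAM_WIDTH = 500;
DeepSpeechModel model = new DeepSpeechModel("/sdcard/deepspeech/output_graph.tflite", BEAM_WIDTH);

// lm_alpha / lm_beta only matter once the decoder is enabled with the
// language model; without this call they are never used anywhere.
model.enableDecoderWithLM("/sdcard/deepspeech/lm.binary",
                          "/sdcard/deepspeech/trie",
                          0.75f,   // lm_alpha (the release default)
                          1.85f);  // lm_beta (illustrative; check the shipped JSON)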
Yes, I can confirm that the transcription is correct now with the last "fix" from here.
Now I see "why should one halt on the way" correctly!!
Thank you again…
Tomorrow I will look into the lm_alpha parameter on Android and Colab!
Again, you don't need to change from the default value.
Ok… just experimenting a bit.
I am focusing on good-quality audio with noise reduction. Until now I had to turn up the volume programmatically to get better results while creating the recording:
// Boost microphone volume in software: read PCM samples, apply a fixed gain,
// and clamp to the 16-bit range to avoid overflow/wrap-around.
readBytes = mRecorder.read(buffer, 0, BUFFER_SIZE);
if (readBytes > 0) {
    for (int i = 0; i < readBytes; ++i) {
        int amplified = (int) (buffer[i] * 5.7);
        // Clamp both positive and negative peaks before casting back to short.
        amplified = Math.max(Short.MIN_VALUE, Math.min(amplified, Short.MAX_VALUE));
        buffer[i] = (short) amplified;
    }
}
Speak louder?
You should have a look at the DeepSpeech implementation in mozillaspeechlibrary from androidspeech that you linked earlier; there's a parameter to set the input source that can have an impact.
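To make that concrete, here is a minimal sketch of what that input-source parameter boils down to with a plain AudioRecord (whether mozillaspeechlibrary wires it exactly like this is an assumption; VOICE_RECOGNITION is usually a better choice than the generic MIC source for speech):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

// 16 kHz mono 16-bit PCM, which is what the acoustic model expects.
final int SAMPLE_RATE = 16000;
final int BUFFER_SIZE = AudioRecord.getMinBufferSize(
        SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

// The first argument selects the input source: VOICE_RECOGNITION is tuned
// for speech on many devices, while MIC is the generic default.
AudioRecord mRecorder = new AudioRecord(
        MediaRecorder.AudioSource.VOICE_RECOGNITION,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        BUFFER_SIZE);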
Good morning @lissyx ,
After running some examples on Colab, here are the results of my findings.
A 26-second .wav file takes about 13 seconds to transcribe on CPU and 7 seconds on GPU using the .pb file. On the other hand, when I use the same audio file with the .tflite model, the transcription takes 34 seconds (and the transcription is worse)!
I am lost here! I thought .tflite was for faster results.
Check also our previous messages… where you gave your example of the overall processing time with the .pb and .tflite model files!
Am I wrong somewhere?
Could you please give feedback on what I requested? A slightly worse transcription is expected, but I need to know if the issue we fixed improved things for you.
A transcription time of 34 s seems high, but that also depends on the CPU itself. The TensorFlow runtime leverages several cores, while TFLite only uses one core at a time.
And I insist: given your report, and given what I have been able to reproduce and verify after the fix, it is very likely that the major source of discrepancy is fixed.
The transcriptions of the .wav files you provide are fine and accurate.
Now I am trying to get better results with my recordings. With your help and your comments there is a major reduction of the background noise, and I now get better transcriptions.
I only made the comment about the processing time because I find it odd that the .tflite model is slower than the .pb file… and that is what makes me wonder whether I will be able to build a project using the .pb file on Android…
I just cannot sleep thinking about that…
Well, what's the hardware? You still have not replied to that.
Here are the specs of the Colab CPU:
The thing is that with the same beam_width (500), transcription takes the same amount of time on my phone (8 cores, 4 GB RAM). And it also uses the same CPU for the .pb file.
Not sure what you mean here: does it mean you get the same, good execution time of TFLite on your own CPU as well as on your phone, but it's slower on Colab?
I don't know the details of Colab; there's hardly anything we can do from here: who knows how the resources are shared amongst VMs.
Also, that's not very descriptive. As documented, we only tested and verified faster-than-realtime performance on Snapdragon 820 (Sony Xperia Z) and 835 (Google Pixel 2); you might not get the same behavior on other hardware.
Anyway, I am happy we have solved the issue with the 0.6.0 .tflite file.
About 2 minutes ago I updated the dependency of the DeepSpeech module (Android) to 0.6.1.alpha0, and everything still transcribes fine!
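For anyone following along, the update is just the usual Gradle dependency bump; the coordinates and exact version string below are my assumption of how the artifact is published, so double-check them against the release page:

dependencies {
    // Assumed Maven coordinates; the exact published version name may differ.
    implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.1-alpha.0'
}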
A 2-second file takes 1.5 seconds to transcribe… so if I loop before the creation of the new .wav file, I already have the transcription of the current one!
Sweet!
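Roughly how I measure that 1.5-second figure: a minimal sketch reusing the model object from the earlier snippet, where readAudioFile() is a hypothetical helper (not part of the DeepSpeech API) that loads a 16 kHz mono 16-bit PCM .wav into a short[]:

import android.util.Log;

short[] samples = readAudioFile("/sdcard/deepspeech/recording.wav");

// Wall-clock time around the blocking stt() call.
long start = System.currentTimeMillis();
String decoded = model.stt(samples, samples.length);
long elapsedMs = System.currentTimeMillis() - start;

Log.i("DeepSpeech", "Transcription took " + elapsedMs + " ms: " + decoded);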
You mean to 0.6.0 final?
So does it mean your issues are mostly solved and you now have an experience closer to what we expect?
That does sound good. TFLite runtime, I hope?
Yes, sorry… I meant 0.6.1.alpha0 (I edited my previous comment).
I also found inside native_client how I can build the .so file without runtime=tflite… BUT I will not do that.
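For reference, the difference is in the bazel invocation inside native_client; the flags below are from memory of the 0.6 build docs and should be double-checked there before use:

# Full TensorFlow runtime (the larger libdeepspeech.so):
bazel build --config=monolithic -c opt //native_client:libdeepspeech.so

# TFLite runtime instead:
bazel build --config=monolithic -c opt --define=runtime=tflite //native_client:libdeepspeech.so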
Yes, the .tflite runtime… I will keep experimenting with good audio recordings.
I will keep monitoring your next releases.
Thanks
Thanks for taking the time to investigate and help us find that issue on the model.