Details on https://github.com/mozilla/DeepSpeech/issues/2612
Thank you, I will check it in a while and post the results of a .wav file.
About the lm_alpha parameter: I see that in this project it is read from the JSON file included inside the .zip file with the models (anyway, I can hardcode it to 2.0 instead of 0.75). BUT in my project it just doesn't get used anywhere!! I will have to check the original code and find out where I have omitted this.
Don't test the LM hack, just make a new export from current master with the released v0.6.0 checkpoints.
You need it when you initialize the language model. Not enabling the language model will yield worse results, and might increase inference time.
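For reference, a minimal sketch of where lm_alpha and lm_beta come into play with the 0.6.x Java bindings (assuming the two-argument constructor of that release; the file paths are placeholders and the lm_beta value is only an illustrative default, so take the actual values from the JSON file shipped with the model package):

import org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel;

// Load the acoustic model; the beam width trades accuracy for speed.
final int BEAM_WIDTH = 500;
DeepSpeechModel model = new DeepSpeechModel("/sdcard/deepspeech/output_graph.tflite", BEAM_WIDTH);

// lm_alpha / lm_beta only matter once the decoder is enabled with the
// language model; without this call they are never used anywhere.
model.enableDecoderWithLM("/sdcard/deepspeech/lm.binary",
                          "/sdcard/deepspeech/trie",
                          0.75f,   // lm_alpha (the release default)
                          1.85f);  // lm_beta (illustrative; check the shipped JSON)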
Yes, I can confirm that the transcription is correct now with the last "fix" from here.
Now I see "why should one halt on the way" correctly!!
Thank you again…
Tomorrow I will look into the lm_alpha parameter on Android and Colab!
Again, you don't need to change from the default value.
Ok… just experimenting a bit.
I am focusing on good-quality audio with noise reduction. Until now I had to turn up the volume programmatically to get better results while creating the recording:
// Boost microphone volume in software: read PCM samples, apply a fixed gain,
// and clamp to the 16-bit range to avoid overflow/wrap-around.
readBytes = mRecorder.read(buffer, 0, BUFFER_SIZE);
if (readBytes > 0) {
    for (int i = 0; i < readBytes; ++i) {
        int amplified = (int) (buffer[i] * 5.7);
        // Clamp both positive and negative peaks before casting back to short.
        amplified = Math.max(Short.MIN_VALUE, Math.min(amplified, Short.MAX_VALUE));
        buffer[i] = (short) amplified;
    }
}
Speak louder?
You should have a look at the DeepSpeech implementation in mozillaspeechlibrary from androidspeech that you linked earlier; there's a parameter to set the input source that can have an impact.
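To make that concrete, here is a minimal sketch of what that input-source parameter boils down to with a plain AudioRecord (whether mozillaspeechlibrary wires it exactly like this is an assumption; VOICE_RECOGNITION is usually a better choice than the generic MIC source for speech):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

// 16 kHz mono 16-bit PCM, which is what the acoustic model expects.
final int SAMPLE_RATE = 16000;
final int BUFFER_SIZE = AudioRecord.getMinBufferSize(
        SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

// The first argument selects the input source: VOICE_RECOGNITION is tuned
// for speech on many devices, while MIC is the generic default.
AudioRecord mRecorder = new AudioRecord(
        MediaRecorder.AudioSource.VOICE_RECOGNITION,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        BUFFER_SIZE);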
Good morning @lissyx ,
After running some examples on Colab, here are the results of my findings.
A 26-second .wav file takes about 13 seconds to transcribe on CPU and 7 seconds on GPU using the .pb file. On the other hand, when I use the same audio file with the .tflite model, the transcription takes 34 seconds (and the transcription is worse)!
I am lost here! I thought .tflite was for faster results.
Check also our previous messages… where you gave your example of the overall processing time with the .pb and .tflite model files!
Am I wrong somewhere?
Could you please give feedback on what I requested? A slightly worse transcription is expected, but I need to know if the issue we fixed improved things for you.
A transcription time of 34 s seems high, but that also depends on the CPU itself. The TensorFlow runtime leverages several cores, while TFLite only uses one core at a time.
And I insist: given your report, and given what I have been able to reproduce and verify after the fix, it is very likely that the major source of discrepancy is fixed.
The transcriptions of the .wav files you provide are fine and accurate.
Now I am trying to get better results with my recordings. With your help and your comments there is a major reduction of the background noise, and I now get better transcriptions.
I only made the comment about the processing time because I find it odd that the .tflite model is slower than the .pb file… and that is what makes me wonder whether I will be able to build a project using the .pb file on Android…
I just cannot sleep thinking about that…
Well, what's the hardware? You still have not replied to that.
Here are the specs of the Colab CPU:
The thing is that with the same beam_width (500), transcription takes the same amount of time on my phone (8 cores, 4 GB RAM). And it also uses the same CPU for the .pb file.
Not sure what you mean here: does it mean you get the same, good execution time of TFLite on your own CPU as well as on your phone, but it's slower on Colab?
I don't know the details of Colab; there's hardly anything we can do from here: who knows how the resources are shared amongst VMs.
Also, that's not very descriptive. As documented, we only tested and verified faster-than-realtime performance on Snapdragon 820 (Sony Xperia Z) and 835 (Google Pixel 2); you might not get the same behavior on other hardware.
Anyway, I am happy we have solved the issue with the 0.6.0 .tflite file.
About 2 minutes ago I updated the dependency of the DeepSpeech module (Android) to 0.6.1.alpha0, and everything still transcribes fine!
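For anyone following along, the update is just the usual Gradle dependency bump; the coordinates and exact version string below are my assumption of how the artifact is published, so double-check them against the release page:

dependencies {
    // Assumed Maven coordinates; the exact published version name may differ.
    implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.1-alpha.0'
}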
A 2-second file takes 1.5 seconds to transcribe… so if I loop before the creation of the new .wav file, I already have the transcription of the current one!
Sweet!
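Roughly how I measure that 1.5-second figure: a minimal sketch reusing the model object from the earlier snippet, where readAudioFile() is a hypothetical helper (not part of the DeepSpeech API) that loads a 16 kHz mono 16-bit PCM .wav into a short[]:

import android.util.Log;

short[] samples = readAudioFile("/sdcard/deepspeech/recording.wav");

// Wall-clock time around the blocking stt() call.
long start = System.currentTimeMillis();
String decoded = model.stt(samples, samples.length);
long elapsedMs = System.currentTimeMillis() - start;

Log.i("DeepSpeech", "Transcription took " + elapsedMs + " ms: " + decoded);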
You mean to 0.6.0 final?
So does it mean your issues are mostly solved and you now have an experience closer to what we expect?
That does sound good. TFLite runtime, I hope?
Yes, sorry… I meant 0.6.1.alpha0 (I edited my previous comment).
I also found inside native_client how I can build the .so file without runtime=tflite… BUT I will not do that.
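For reference, the difference is in the bazel invocation inside native_client; the flags below are from memory of the 0.6 build docs and should be double-checked there before use:

# Full TensorFlow runtime (the larger libdeepspeech.so):
bazel build --config=monolithic -c opt //native_client:libdeepspeech.so

# TFLite runtime instead:
bazel build --config=monolithic -c opt --define=runtime=tflite //native_client:libdeepspeech.so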
Yes, the .tflite runtime… I will keep experimenting with good audio recordings.
I will keep monitoring your next releases.
Thanks
Thanks for taking the time to investigate and help us find that issue on the model.