Android project with .pb files instead of .tflite

Hello forum and DeepSpeechers,

I have managed to build the Android project and was working on some voice samples. I have noticed that it gives low-quality results on the phone (using .tflite), despite the fact that the same voice .wav files give “good” results in Colab with the loaded wheel file. In Colab, .pb or .pbmm files were used.
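For reference, here is roughly how I run it in Colab (just a sketch: the wheel can also be installed from the downloaded file, and the model and audio names are placeholders for my own files):

$ pip install deepspeech==0.6.0
$ deepspeech --model output_graph.pbmm --lm lm.binary --trie trie --audio my_phone_recording.wav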

So my question is: is there an implementation where we can use the .pb file directly on the Android phone, without worrying about the transcription time?

Thanks in advance

No. That’s not possible.

Our tests revealed a slight but not that bad degradation with quantized TFLite, around 2 absolute WER points (from 8.22% to 10.1%).

Do you have more insight into the .wav files you use?

Thank you for the fast response.

I have checked the files with online metadata and spectrum-analysis tools, and they seem OK.
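In case it helps, the same check can be done locally (a sketch assuming SoX is installed; as far as I know, the English models expect 16 kHz, mono, 16-bit PCM input):

$ soxi my_phone_recording.wav   # prints channels, sample rate, bit depth and duration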

The strange thing is that with the .pb files (in Colab) the results are extremely good on the .wav files generated by my phone, while with the .tflite file and the .tflite wheel in Colab I get weird results.

I also want to mention that the .tflite file from version 0.6.0 is extremely bad, so I worked with the file from the previous version, 0.6.0-alpha.15!
I got these files from the releases page.

I have also built the androidspeech project, and I saw that the results for online transcription are very good. I noticed that an .ogg/Opus encoding is used. Which file do you use for online transcription, .pb or .tflite?

Thank you

Please be more descriptive than “extremely good” and “weird”.
Also, how do you run the comparison? We need many more details.

Can you explain a little bit more what you mean here? What exactly gives you good results? And bad ones?

Please read the code, online transcription is not done through DeepSpeech.

I’m not sure what exactly you mean. There are a lot of variables, from a clean sound, to no noise, no transformation artifacts, your accent and your way of speaking, etc.

@lissyx sorry for the late response.

I can give you the transcriptions of my phone-generated .wav files, from Colab and from the phone, with both the .pb and .tflite files, to clarify what is ‘weird’ and what is ‘very good’, if you want…

But first, my main question: why is transcription with the .pb file not possible?

Because:

  • it would not allow running in real time
  • as far as I can tell, it is not supported by TensorFlow on Android; only the TFLite runtime works there

No, sorry! This is not accurate: ‘it is not supported by TensorFlow on Android; only the TFLite runtime works there’.
Check this link to see my project with TensorFlow’s DeepLabv3+ inside Android, where a .pb file is used. My GitHub account is also full of Android projects with .pb files. So this is possible.

‘it would not allow running in real time’… If a whole picture can be converted and processed in less than 1 second on today’s phones, then I believe a spectrogram will be easy as well.

So do you think I have to alter some C++ files to load the .pb file inside Android?

As far as I can tell, this was not working properly: it was far too slow, and much more painful and fragile to cross-compile. The TFLite runtime was much more efficient.

Are we comparing models of the same complexity?

You would need to rebuild quite a lot of things, and in unsupported ways … Android builds define runtime=tflite: https://github.com/mozilla/DeepSpeech/blob/master/native_client/BUILD#L14-L19
So if you follow the docs and don’t pass --define=runtime=tflite, then it should try to build the TensorFlow runtime instead. But I won’t have time to help you get that built … It would be much more productive for us to identify what goes bad in your case, because that does not reflect our experience …
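For illustration only, here is roughly where the two runtimes diverge on the Bazel command line. This is an unsupported sketch from memory: the --config name is an assumption, and the full set of required flags is in the native_client build docs.

$ # supported Android builds select the TFLite runtime:
$ bazel build //native_client:libdeepspeech.so --config=android_arm64 --define=runtime=tflite
$ # dropping the define should select the full TensorFlow runtime instead (unsupported):
$ bazel build //native_client:libdeepspeech.so --config=android_arm64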

Now, can you please answer all my previous questions? I still don’t get what you meant with your references to 0.6.0-alpha.15. Was it the model? The lib? Both?

@lissyx Good morning from Greece.

So let’s get started.
From this page I downloaded the 0.6.0 and 0.6.0-alpha.15 zip files.

I also downloaded the audio files that you provide here.

For example, I used the audio file in which a lady says “Why should one halt on the way”.

Below are the transcriptions (screenshots for 0.6.0 and 0.6.0-alpha.15).

If you want any other info, I am glad to provide it!

And I will come back later to clarify what you told me about the web transcription and DeepSpeech.

And a screenshot from here, where I used the 0.6.0.tar.gz file with the models.

So, those are not really intended for general use; I’ve put them online so that people hacking on androidspeech / mozillaspeechlibrary and a few other projects can use them.

That’s just one example … I’m not sure it is really enough to draw any conclusion.


Thank you for your replies @lissyx .

So please, why did you tell me DeepSpeech is not used for online transcription? From the code I see that a ByteArrayOutputStream with some tags is passed to the endpoint “https://speaktome-2.services.mozilla.com/”, and the response contains the transcription.

So what do you use for online transcription? Some other model?

Because that’s the truth?

The Speak To Me infrastructure has several implementations, and the default one is not DeepSpeech. I’m not in charge of the whole infra, so I can’t give more details, but I think it’s Google STT by default.


You are great! Thank you again!

Hope to talk back in the future.

Have a nice day!

I’m not the one who did the release, so I don’t know exactly how the export was done. Maybe @reuben could give more details, but v0.6.0-alpha.15 was just a re-export of the v0.5.1 checkpoint, and we made changes to the graph that should have made v0.6.0 much better than v0.6.0-alpha.15.

Though, @reuben is on PTO until next year, so don’t expect news soon.

Please note that we are looking into switching to TFLite for all runtimes, so getting it to work well is important.


There’s definitely something going on here …

$ ~/tmp/deepspeech/0.6.0/tfpb/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.pbmm --lm limited_lm.binary --trie limited_lm.trie --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
2019-12-19 16:08:41.433573: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
turn the bedroom lamp light on
cpu_time_overall=7.25584
$ ~/tmp/deepspeech/0.6.0/tflite/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.tflite --lm limited_lm.binary --trie limited_lm.trie --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
INFO: Initialized TensorFlow Lite runtime.
on the bedroom light on
cpu_time_overall=12.07772

And without LM:

$ ~/tmp/deepspeech/0.6.0/tflite/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.tflite --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
INFO: Initialized TensorFlow Lite runtime.
no the bederorond lighte pon
cpu_time_overall=12.68055
$ ~/tmp/deepspeech/0.6.0/tfpb/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.pbmm --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
2019-12-19 16:11:19.134152: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
 ter de bedroom along light on
cpu_time_overall=8.49712

Yes, I saw that in my results…

I ended up using the 0.6.0-alpha.15 .tflite file.

I hope I helped somehow with my questions.

Well, at least it helps to know that others are experiencing similar behavior.

Just as a hack, what does setting the LM alpha parameter to 2.0 yield on your side?
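Something along these lines with the native client, reusing my command from above (assuming your build exposes the --lm_alpha flag, as I believe the 0.6 client does):

$ ~/tmp/deepspeech/0.6.0/tflite/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.tflite --lm limited_lm.binary --trie limited_lm.trie --lm_alpha 2.0 --audio deepspeech_dump_all.wav -t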

@George_Soloupis Looks like the issue has been identified and a fix is ready.
