Can DeepSpeech let me implement local, offline speech recognition on mobile?
Right now, you could do it on a high-end phone, but it would be slow. We haven’t yet created models optimized for inference on mobile devices, but it’s on the roadmap.
I was just wondering how to use Mozilla DeepSpeech on Android instead of the Google Voice service. I guess it’s not possible yet? What’s the roadmap, roughly? How can someone with little coding experience help?
Cheers
Any update on getting DeepSpeech onto an iOS device?
There’s been progress, in that the model is actually convertible to CoreML now: https://github.com/mozilla/DeepSpeech/issues/642 and https://github.com/tf-coreml/tf-coreml/issues/309
Next steps would be:
- Adding a class that implements the ModelState API using CoreML, similar to how we currently have TFModelState and TFLiteModelState implementations (see the CoreML sketch after this list).
- Figuring out how to compute features, as I’ve had to remove the feature computation sub-graph to get the CoreML conversion to finish. I don’t think the AudioSpectrogram/MFCC ops are supported in CoreML. I’d start by simply vendoring TensorFlow’s kernels and building those into libdeepspeech.so. We could even use this work in all model types, to reduce overhead.
- Figuring out packaging for iOS. Nobody on our team has iOS experience, so I don’t have any suggestions for this. Basically, make it possible to build a DeepSpeech package in the format used for iOS dependency management (a packaging sketch also follows below).
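To make the first step a bit more concrete, here’s a minimal Swift sketch of what the CoreML-backed inference would do: load the converted model and run one step of the acoustic model. The file name deepspeech.mlmodelc and the feature names input_node and logits are assumptions on my part; they depend on how the tf-coreml conversion was set up. The actual ModelState implementation would sit in the C++ core (likely via Objective-C++), but the CoreML calls are the same.

```swift
import CoreML

// Sketch only: run one acoustic-model step through a converted CoreML model.
// Model file name, input shape, and feature names ("input_node", "logits")
// are hypothetical; they depend on the tf-coreml conversion settings.
func runAcousticModelStep(features: MLMultiArray) throws -> MLMultiArray? {
    guard let url = Bundle.main.url(forResource: "deepspeech", withExtension: "mlmodelc") else {
        return nil
    }
    let model = try MLModel(contentsOf: url)

    // Wrap the precomputed MFCC features (step 2 above) as the model input.
    let input = try MLDictionaryFeatureProvider(dictionary: ["input_node": features])
    let output = try model.prediction(from: input)

    // The output would be per-timestep character logits for the CTC decoder.
    return output.featureValue(for: "logits")?.multiArrayValue
}
```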
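For the packaging step, one plausible route is Swift Package Manager with a prebuilt binary target; CocoaPods (a .podspec) would be the other common option. This manifest is purely a sketch under that assumption: there is no existing libdeepspeech.xcframework artifact, and all names here are illustrative.

```swift
// swift-tools-version:5.3
// Hypothetical SwiftPM manifest wrapping a prebuilt DeepSpeech binary.
// Target names and the xcframework path are illustrative only.
import PackageDescription

let package = Package(
    name: "DeepSpeech",
    platforms: [.iOS(.v12)],
    products: [
        .library(name: "DeepSpeech", targets: ["DeepSpeech"])
    ],
    targets: [
        // Prebuilt C/C++ core (the iOS equivalent of libdeepspeech.so).
        .binaryTarget(name: "libdeepspeech", path: "libdeepspeech.xcframework"),
        // Thin Swift layer over the C API.
        .target(name: "DeepSpeech", dependencies: ["libdeepspeech"])
    ]
)
```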
Step 3 would possibly also involve adding Swift bindings to the C API.
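As a sketch of what those bindings might look like: a thin Swift wrapper over the C API, assuming deepspeech.h is exposed to Swift through a module map. The function names (DS_CreateModel, DS_SpeechToText, DS_FreeString, DS_FreeModel) are from the released C API, but their exact signatures have changed between releases, so check the header you actually build against.

```swift
import Foundation

// Sketch of Swift bindings over the DeepSpeech C API. Assumes deepspeech.h
// is imported via a module map; signatures follow the simplified later
// C API and may differ in older releases.
final class SpeechRecognizer {
    private var model: OpaquePointer?

    init?(modelPath: String) {
        // DS_CreateModel returns zero on success and fills in the handle.
        guard DS_CreateModel(modelPath, &model) == 0 else { return nil }
    }

    // Transcribe a buffer of 16 kHz, 16-bit mono PCM samples.
    func transcribe(_ samples: [Int16]) -> String? {
        guard let model = model,
              let cText = DS_SpeechToText(model, samples, UInt32(samples.count))
        else { return nil }
        // The C API hands back a heap string the caller must release.
        defer { DS_FreeString(cText) }
        return String(cString: cText)
    }

    deinit {
        if let model = model { DS_FreeModel(model) }
    }
}
```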