Implementation of arXiv:1603.03185, “Personalized Speech Recognition on Mobile Devices”, as part of the pipsqueak repo.
Is there any work going on in this area, or has Mozilla completely moved away from this to Baidu’s DeepSpeech?
We are still working on that; check the issues on GitHub. I’ve got a pending PR with TensorFlow Lite working on-device (Pixel 2).
The README in that repo now says pipsqueak is part of the DeepSpeech repo. Is it using the same codebase and model now as ‘server’ DeepSpeech, or is it part of the native client?
I’m not sure what you mean; “pipsqueak” is basically the model exported as TFLite, plus libdeepspeech.so, including the TFLite runtime on (some) platforms.
Thanks. I wondered whether pipsqueak was a model different from DeepSpeech, specifically tuned for low-resource systems, with possible disadvantages, but it seems that’s not the case.
We aren’t using the pipsqueak repo. All such small-platform work is currently happening in the DeepSpeech repo. (I just deleted the pipsqueak repo.)
At https://research.mozilla.org/machine-learning/
Pipsqueak Engine: Online STT technologies can have security and privacy vulnerabilities. Mozilla researchers aim to create a competitive offline STT engine called Pipsqueak that promotes security and privacy. This implementation of a deep learning STT engine can be run on a machine as small as a Raspberry Pi 3. Our goal is to disrupt the existing trend in STT that favors a few commercial companies, and to stay true to our mission of making safe, open, affordable technologies available to anyone who wants to use them.
Referred back to here at https://github.com/mozilla/DeepSpeech/issues/2401.
Am interested in the current status of that stated intent, specifically the ability to use the Pipsqueak Engine as the service for the Web Speech API, to avoid the issue of sending audio to a remote web service (w3c/speech-api#56).
How far away (in estimated time) are we from being able to use Pipsqueak or DeepSpeech as the local service for STT?
I happened to search for STT and found the same Mozilla link that mentions Pipsqueak (and it’s exciting!). But I cannot find further information, and noticed that nothing about Pipsqueak can be found in the DeepSpeech repo. The information so far is confusing. Does anyone know the status of Pipsqueak?
If you read the documentation and the release notes, you can see we run on Android devices and the Raspberry Pi 4 in real time now.
Have read and re-read the documentation, as well as user posts http://www.michaelvenz.com/2018/10/06/mozilla-deepspeech-on-ubuntu-18-04/ and https://medium.com/@aadrkirandevraj/installing-and-running-pre-trained-deepspeech-model-a431f94f52d3. Downloaded DeepSpeech and have tried several times to run the application. pip3 install deepspeech
alone outputs an error about a missing version. Tried specifying a version, and a series of errors were printed to stdout.
Meaning, how to get this running on a 32-bit desktop with an Intel processor is not immediately clear.
Will next try
Does not have to be a “real-time” implementation. Can convert an input MediaStreamTrack to a .wav file to provide input.
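A minimal sketch of that conversion step, assuming MediaRecorder and the Web Audio API (the helper name trackToPCM is illustrative, and DeepSpeech would still need the result resampled and encoded as 16 kHz 16-bit mono WAV):

```javascript
// Sketch only: record a MediaStreamTrack with MediaRecorder, then decode the
// recording to raw PCM via the Web Audio API. MediaRecorder emits webm/ogg,
// not .wav, so a real implementation still has to resample and encode the
// Float32Array below before handing it to the STT engine.
async function trackToPCM(track, durationMs = 5000) {
  const recorder = new MediaRecorder(new MediaStream([track]));
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  const buffer = await new Blob(chunks).arrayBuffer();
  const decoded = await new AudioContext().decodeAudioData(buffer);
  return decoded.getChannelData(0); // Float32Array, first channel only
}
```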
We can’t help with a vague “it’s not working”.
No work has been done on that; this is not ours.
No idea what that is.
What are you referring to? MediaStreamTrack looks like DOM-level WebAudio.
Re-reading the release notes of DeepSpeech 0.5.1, they do in fact state the supported platforms. *nix 32-bit is apparently not supported. Unfortunate. STT at the desktop locally (potentially on legacy systems) is a requirement (https://webwewant.fyi/wants/55/) that am trying to meet on my own.
Followed the instructions; deepspeech is not being compiled. Not sure what else to state. Can step-by-step instructions be posted for *nix 32-bit architecture, or can it be clearly stated that such a platform and architecture are not supported whatsoever?
MediaStreamTrack is described in the Media Capture and Streams specification (W3C).
That is not an issue. Can communicate with the native file system using a variety of means, including Native Messaging and WebSocket.
Consider https://github.com/w3c/speech-api/issues/66. Am in the process of creating a proof-of-concept demonstrating passing a MediaStreamTrack (audio), data URI, ArrayBuffer, or Float32Array to a local function which executes an STT binary or series of binaries on the client, without any external resources involved.
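As an illustration of the hand-off, a sketch assuming a local WebSocket bridge at ws://localhost:8000 sitting in front of the STT binary (both the endpoint and the reply shape are assumptions, not part of DeepSpeech itself):

```javascript
// Sketch only: pass a Float32Array of audio samples to a hypothetical local
// bridge over a WebSocket and resolve with whatever transcript it returns.
function transcribeLocally(samples) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket("ws://localhost:8000"); // assumed local bridge
    ws.binaryType = "arraybuffer";
    ws.onopen = () => ws.send(samples.buffer);
    ws.onmessage = (event) => {
      resolve(event.data); // transcript produced by the local STT binary
      ws.close();
    };
    ws.onerror = reject;
  });
}
```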
Already created several proof-of-concepts for TTS using espeak, espeak-ng, Native Messaging and WebSocket, and Native File System.
The goal is to create JavaScript functions which execute local binaries to achieve TTS and STT commenced by browser code, completely locally.
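For example, a sketch of the Native Messaging path from the extension side, assuming a hypothetical host named espeak_host registered via a host manifest (the message shape is likewise an assumption):

```javascript
// Sketch only: from a WebExtension with the "nativeMessaging" permission,
// send text to a native-messaging host that shells out to espeak locally.
async function speakLocally(text) {
  const reply = await browser.runtime.sendNativeMessage("espeak_host", {
    command: "speak", // hypothetical protocol understood by the host
    text,
  });
  console.log("native host replied:", reply);
}
```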
No, sorry. Most distributions are dropping 32-bit support, and upstream TensorFlow does not support it, so we cannot do it.
Please have a look at Bugzilla https://bugzilla.mozilla.org/show_bug.cgi?id=1248897 and friends.
We state that we support Linux x86 64-bit. I don’t see the point of stating we don’t support arch X, Y, and Z; there are too many we don’t support.
It looks like you are already in contact with colleagues working to expose WebSpeech in Firefox. You could see https://bugzilla.mozilla.org/show_bug.cgi?id=1474084 and https://github.com/lissyx/mozilla-central/commits/libdeepspeech_thread
Please be aware this is just a WIP hack. I urge you not to spread that.
Ubuntu still releases 32-bit distributions. The “Chromium team” PPA supports 32-bit builds of the Chromium dev version. Am running 32-bit Nightly right now.
From the perspective here, 32-bit programs should be supported until completely obsolete. (Do not burn books because the Web exists.)
The concept is to implement the code by “any means” at this point. As far as “spread” goes, a working “hack” is “better” than no code to use at all. AFAICT, do not have any “followers” in that sense; work alone and publish own workarounds.
Again, it is unfortunate this project is not being concurrently implemented as code shipped with the browser as the local service for the Web Speech API.
Will take some time to read the links that you have posted.
TensorFlow does not support 32-bit, and since we rely on it, and we don’t have time to fix the world, we don’t either.
This is a WIP hack. It is not meant for anything more than exercising the WebSpeech API and improving it and the DeepSpeech API. The current Firefox media code that it interacts with is expected to undergo a huge refactoring.
Nothing is “alone” here; it’s just not ready to be more broadly communicated, and that readiness is not under my responsibility.
Have you read my code and the bug? This is explicitly a WIP of a local WebSpeech API implementation backed by DeepSpeech. It’s still a hack because a lot of other refactoring needs to be done prior to this work, and that refactoring still has not landed.
Not entirely, yet, no. Which specific link are you referring to?
Note: am very willing to help move this along where able, e.g., testing at the front end.
This demo, https://mdn.github.io/web-speech-api/speech-color-changer/, throws an error in Nightly 71.0a1 (2019-10-05) (32-bit) at line 29 (recognition.start();):
InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable
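For reference, a minimal reproduction sketch matching the demo’s line 29; the comment about prefs is an assumption about why start() throws in this build:

```javascript
// Sketch only: minimal Web Speech API recognition usage, as in the demo.
// In Nightly, recognition is gated behind prefs (e.g.
// media.webspeech.recognition.enable), which may be why start() throws.
const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SR();
recognition.onresult = (event) => {
  console.log(event.results[0][0].transcript);
};
recognition.start(); // throws InvalidStateError in this build
```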
I don’t think there’s anything actionable here where you can be of help, sadly. You need to use a build from the branch I gave earlier. Work to integrate DeepSpeech as a WebSpeech API backend is still a long way off. We are working on it; nothing is done behind closed doors, as you can see in the Bugzilla links I shared.