I’ve been casually playing around with DeepSpeech and have gotten some simple examples working. What I would like is ‘real time’ speech-to-text conversion, preferably in an input-agnostic way (via a wav stream, say) so that I can pump whatever audio I want through DeepSpeech and convert it on the fly.
I saw @reuben’s response on an HN thread about continuous streaming and output without any voice activity detection, and I was hoping to get more information on how to go about doing this.
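To make the goal concrete, the loop I’m imagining looks something like the sketch below, going off the streaming calls in the 0.6.0 Python API docs (the model paths and chunk size are placeholders, and I may well be misusing the API):

import wave
import numpy as np
import deepspeech

# Placeholder paths; 500 is the documented default beam width.
ds = deepspeech.Model('models/output_graph.pbmm', 500)
ds.enableDecoderWithLM('models/lm.binary', 'models/trie', 0.75, 1.85)

wav = wave.open('audio/2830-3980-0043.wav', 'rb')  # 16 kHz, 16-bit mono
stream = ds.createStream()

while True:
    chunk = wav.readframes(1024)
    if not chunk:
        break
    # Feed raw 16-bit samples into the stream and peek at the partial result.
    ds.feedAudioContent(stream, np.frombuffer(chunk, np.int16))
    print(ds.intermediateDecode(stream))

print('Final:', ds.finishStream(stream))

(Calling intermediateDecode() on every chunk is presumably wasteful and would be throttled in practice, but it illustrates the ‘on the fly’ output I’m after.)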
I have successfully run the vad_transcriber by doing the following (I’ll be verbose in case anyone else comes along and wants to get the example running):
pip3 install -r requirements.txt
python3 audioTranscript_cmd.py --model ./models/ --audio audio/2830-3980-0043.wav
Where the models directory is a symbolic link to the deepspeech-0.6.0-models directory.
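As far as I can tell from skimming it, audioTranscript_cmd.py segments the wav with webrtcvad and then runs each segment through stt(). A minimal one-shot version without the VAD step would, I think, look something like this (same placeholder paths as above, based on the 0.6.0 Python API, untested):

import wave
import numpy as np
import deepspeech

ds = deepspeech.Model('models/output_graph.pbmm', 500)
ds.enableDecoderWithLM('models/lm.binary', 'models/trie', 0.75, 1.85)

with wave.open('audio/2830-3980-0043.wav', 'rb') as wav:
    assert wav.getframerate() == 16000  # the released model expects 16 kHz
    audio = np.frombuffer(wav.readframes(wav.getnframes()), np.int16)

# One-shot, non-streaming transcription of the whole file.
print(ds.stt(audio))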
I have not been able to get the ffmpeg_vad_streaming or mic_vad_streaming examples working.
The ffmpeg_vad_streaming example gives me the following error:
$ ffmpeg -version | head -n1
ffmpeg version 4.1.3-0york1~16.04 Copyright (c) 2000-2019 the FFmpeg developers
$ npm install
...
$ node ./index.js --model ./models/output_graph.pbmm --audio audio/2830-3980-0043.wav --lm models/lm.binary --trie ./models/trie
node: symbol lookup error: /home/abe/git/github/mozilla/DeepSpeech/examples/ffmpeg_vad_streaming/node_modules/deepspeech/lib/binding/v0.6.0/linux-x64/node-v64/deepspeech.node: undefined symbol: _ZNK2v86String10Utf8LengthEPNS_7IsolateE
Where, as above, the models directory is a symlink to the deepspeech-0.6.0-models directory. From some searching, that undefined v8 symbol looks like a mismatch between the prebuilt deepspeech.node binding and the Node version I’m running it with, but I haven’t been able to confirm that.
I also have a hard time figuring out how to use mic_vad_streaming properly.
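In case it clarifies what I’m after, below is roughly what I hoped to get out of mic_vad_streaming, minus the VAD: microphone audio fed straight into a DeepSpeech stream. It’s only a sketch on top of PyAudio, and the device settings and chunk size are guesses on my part:

import numpy as np
import pyaudio
import deepspeech

ds = deepspeech.Model('models/output_graph.pbmm', 500)
sctx = ds.createStream()

pa = pyaudio.PyAudio()
# Mono 16-bit capture at 16 kHz to match what the model expects.
mic = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
              input=True, frames_per_buffer=1024)

try:
    while True:
        chunk = mic.read(1024)
        ds.feedAudioContent(sctx, np.frombuffer(chunk, np.int16))
        print(ds.intermediateDecode(sctx), end='\r')
except KeyboardInterrupt:
    # Ctrl-C ends the session and prints the final transcript.
    print('\nFinal:', ds.finishStream(sctx))
finally:
    mic.stop_stream()
    mic.close()
    pa.terminate()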
Sorry for the long message; I’m unfamiliar with DeepSpeech and TensorFlow, and I don’t have deep knowledge of Python. Any help pointing me in the right direction would be appreciated.