Hello,
I am not sure how to properly contribute this to the GitHub repo. I know the FAQ mentions that people would like to see whether DeepSpeech can be used without having to save audio as a .wav file first.
Well, in a nutshell (and according to client.py), the Model just needs the audio to be a flattened NumPy array of 16-bit samples. Another Python package, SpeechRecognition, has built-in support for capturing an in-memory AudioData object from an audio source (microphone, .wav file, etc.).
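For example, turning a captured AudioData into the buffer DeepSpeech expects only takes a couple of lines. This is just a minimal sketch (the helper name is mine, for illustration): get_raw_data() with convert_rate/convert_width is SpeechRecognition's API, and the 16 kHz / 16-bit values are what the released model expects.

import numpy as np
import speech_recognition as sr

def audiodata_to_buffer(audio: sr.AudioData) -> np.ndarray:
    # Ask SpeechRecognition for raw PCM resampled to 16 kHz / 16-bit,
    # matching what the released DeepSpeech model was trained on.
    raw = audio.get_raw_data(convert_rate=16000, convert_width=2)
    # Flatten the samples into the 1-D int16 array that Model.stt() accepts.
    return np.frombuffer(raw, dtype=np.int16)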
Anyway, long story short, here is the code I run that lets me use DeepSpeech without having to create a .wav file. It assumes you already have a built and trained model; for this snippet I just used the pre-built model files that are included with the release.
I hope something like this can be officially incorporated into the project.
from deepspeech import Model
import numpy as np
import speech_recognition as sr
# Model and decoder parameters (same values as client.py)
sample_rate = 16000
beam_width = 500
lm_alpha = 0.75
lm_beta = 1.85
n_features = 26
n_context = 9

# Paths to the pre-built model files
model_name = "output_graph.pbmm"
alphabet = "alphabet.txt"
language_model = "lm.binary"
trie = "trie"
audio_file = "demo.wav"

if __name__ == '__main__':
    # Load the acoustic model and enable the language model decoder
    ds = Model(model_name, n_features, n_context, alphabet, beam_width)
    ds.enableDecoderWithLM(alphabet, language_model, trie, lm_alpha, lm_beta)

    # Capture audio from the microphone into an in-memory AudioData object
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=sample_rate) as source:
        print("Say Something")
        audio = r.listen(source)

    # Convert the raw frames into the flattened int16 buffer the model expects
    fs = audio.sample_rate
    audio = np.frombuffer(audio.frame_data, np.int16)

    # Equivalent file-based path (requires `import wave`), kept for reference:
    #fin = wave.open(audio_file, 'rb')
    #fs = fin.getframerate()
    #print("Framerate: ", fs)
    #audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
    #audio_length = fin.getnframes() * (1/sample_rate)
    #fin.close()

    print("Running inference on the captured audio")
    print(ds.stt(audio, fs))
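
As a side note, the commented-out wave block isn't really needed either: SpeechRecognition can read an existing .wav file into the same kind of AudioData object, so the microphone and file cases share one code path. A quick sketch, assuming demo.wav is a 16-bit PCM .wav and reusing the ds model loaded above:

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)  # read the whole file into an AudioData object
fs = audio.sample_rate
audio_buffer = np.frombuffer(audio.frame_data, np.int16)
print(ds.stt(audio_buffer, fs))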