sehar@sehar-HP-Z220-CMT-Workstation:~/DeepSpeech/examples/mic_vad_streaming$ python mic_vad_streaming.py --model $HOME/sehar/urdu-models/output_graph.pb --alphabet $HOME/sehar/urdu-models/alphabet.txt --lm $HOME/sehar/urdu-models/lm.binary --trie $HOME/sehar/urdu-models/trie --file $HOME/sehar/urdumodels/sent6urd.wav
Initializing model…
INFO:root:ARGS.model: /home/sehar/sehar/urdu-models/output_graph.pb
INFO:root:ARGS.alphabet: /home/sehar/sehar/urdu-models/alphabet.txt
Traceback (most recent call last):
  File "mic_vad_streaming.py", line 242, in <module>
    main(ARGS)
  File "mic_vad_streaming.py", line 167, in main
    model = deepspeech.Model(ARGS.model, ARGS.alphabet, ARGS.beam_width)
  File "/home/sehar/.local/lib/python2.7/site-packages/deepspeech/__init__.py", line 40, in __init__
    status, impl = deepspeech.impl.CreateModel(*args, **kwargs)
TypeError: CreateModel() takes at most 2 arguments (3 given)
Kindly help me in resolving this error.
I am not getting you.
You are running your code against an incompatible version. Check the code and the API. If you need more help, share context on what you followed to set up.
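One quick way to diagnose this kind of mismatch is to compare the installed package version (e.g. via `pip show deepspeech`) against the version the script was written for. A minimal sketch (the `"0.6"` target here is an assumption; use whatever release your client code targets):

```python
# Sketch: check whether the installed deepspeech major.minor version
# matches what the calling code was written against.
def compatible(installed_version, expected="0.6"):
    """True if the installed major.minor matches what the code targets."""
    return ".".join(installed_version.split(".")[:2]) == expected

# The mic_vad_streaming.py above calls Model(model, alphabet, beam_width),
# an older three-argument form; if the installed binding rejects it with
# "CreateModel() takes at most 2 arguments (3 given)", the script and the
# installed package are from different releases.
```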
import wave
import deepspeech as ds
import numpy as np
import IPython.display

data_loc = "./maybenexttime.wav"
model_loc = "../models/output_graph.pb"
trie_loc = "../models/trie"
lm_loc = "../models/lm.binary"
alphabet_loc = "../models/alphabet.txt"
BEAM_WIDTH = 100
LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.25
N_FEATURES = 16
N_CONTEXT = 9

def InitializeModel():
    print('initializing model...')
    model = ds.Model(model_loc, N_FEATURES, N_CONTEXT, alphabet_loc, BEAM_WIDTH)
    print('initializing LM model')
    model.enableDecoderWithLM(alphabet_loc, lm_loc, trie_loc, LM_WEIGHT, VALID_WORD_COUNT_WEIGHT)
    return model

model = InitializeModel()

def ReadAudioData(data_loc):
    audio_data = wave.open(data_loc, "rb")
    rate = audio_data.getframerate()
    audio = np.frombuffer(audio_data.readframes(audio_data.getnframes()), np.int16)
    audio_data.close()
    print('the audio rate is %d' % (rate))
    return audio, rate

audio, speech_rate = ReadAudioData(data_loc)
model = ds.Model(model_loc, N_FEATURES, N_CONTEXT, alphabet_loc, BEAM_WIDTH)
  File "/home/aniksaha/.anacinda3/lib/python3.7/site-packages/deepspeech/__init__.py", line 40, in __init__
    status, impl = deepspeech.impl.CreateModel(*args, **kwargs)
TypeError: CreateModel() takes at most 2 arguments (5 given)
How can I fix it? Please help me.
Use the matching version or update your code: https://deepspeech.readthedocs.io/en/v0.6.0/Python-API.html
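For reference, the `Model` constructor's positional arguments changed between releases, which is why the five-argument call above fails against the 0.6.0 bindings. A minimal sketch of the change (argument counts per the 0.5.x and 0.6.0 docs; treat this as a guide, not an official compatibility table):

```python
# Positional argument counts for deepspeech.Model() across releases.
EXPECTED_MODEL_ARGS = {
    "0.5": 5,  # model_path, n_features, n_context, alphabet_path, beam_width
    "0.6": 2,  # model_path, beam_width
}

def expected_arg_count(version):
    """How many positional args Model() takes for a given major.minor version."""
    return EXPECTED_MODEL_ARGS[".".join(version.split(".")[:2])]

# With 0.6.0 installed, the old five-argument call fails with
# "CreateModel() takes at most 2 arguments (5 given)". The 0.6.0 form is:
#   model = Model(model_loc, BEAM_WIDTH)
#   model.enableDecoderWithLM(lm_loc, trie_loc, LM_ALPHA, LM_BETA)
# (the alphabet path argument was dropped in 0.6).
```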
import deepspeech
from deepspeech import Model
import os
deepspeech = Model('/home/junaid/new_mic/deepspeech-0.6.0-models/output_graph.pb', 500)
import librosa
audio_data, audio_rate = librosa.load('/home/junaid/new_mic/DeepSpeech/data/testdata/1.wav', sr=None)
processed_data = deepspeech.stt(audio_data,audio_rate)
print(processed_data)
  File "/home/junaid/tmp/newmic-venv/lib/python3.7/site-packages/deepspeech/__init__.py", line 93, in stt
    return deepspeech.impl.SpeechToText(self._impl, *args, **kwargs)
TypeError: SpeechToText() takes at most 2 arguments (3 given)
I am facing this issue even when using the updated version; please help me out.
stt no longer takes a sample rate parameter. https://deepspeech.readthedocs.io/en/v0.6.0/Python-API.html#native_client.python.Model.stt
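A sketch of reading a 16 kHz mono WAV into the int16 array that `model.stt()` expects in 0.6.0, using only the standard-library `wave` module (the path and the `model.stt` usage shown in the comments are placeholders for this thread's setup):

```python
import wave
import numpy as np

def read_wav_int16(path):
    """Return (audio, rate) where audio is an int16 numpy array."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    return np.frombuffer(frames, dtype=np.int16), rate

# Usage (hypothetical path):
#   audio, rate = read_wav_int16("1.wav")
#   text = model.stt(audio)  # note: a single argument in 0.6.0
```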
Thanks Reuben, so you mean we need to pass only one argument and that is audio data?
Correct. And the audio data should match the sample rate of the model you’re using. (16kHz for our release models).
Yes, all my audio files are 16 kHz mono only,
but now while doing deepspeech.stt(audio_data) I got the following error:
TypeError: Cannot cast array data from dtype('float32') to dtype('int16') according to the rule 'safe'
We expect int16 data, not float32. I don't know if you can coax librosa into doing the right thing. You can either convert manually or use scipy.io.wavfile.read, which returns the dtype matching the input WAV file.
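For the manual route, librosa returns float32 samples scaled to [-1.0, 1.0], so one way to convert (a sketch; the scaling assumes librosa's default normalization) is:

```python
import numpy as np

def float_to_int16(audio):
    """Convert librosa-style float32 audio in [-1.0, 1.0] to int16 PCM,
    as an alternative to re-reading the file with scipy.io.wavfile.read."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)
```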
Thanks Reuben, that was the issue; scipy.io.wavfile.read resolved it.