Hi everyone,
I ran some tests with the DeepSpeech Python package. I used the downloadable pre-trained model and tested its accuracy on a small part of the VoxForge dataset. Here is the code I used:
import os
from jiwer import wer
from deepspeech import Model
import scipy.io.wavfile as wav
# Count the number of lines in a file
def file_lengthy(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
directory = 'Voxforge_dataset/test'
ds = Model('models/output_graph.pb', 26, 9, 'models/alphabet.txt', 500)
score = 0
num1 = 0
for foldername in os.listdir(directory):
    direc = directory + '/' + foldername
    # Only process speaker folders that contain a 'wav' subfolder
    if 'wav' in os.listdir(direc):
        direc1 = directory + '/' + foldername + '/wav'
        direc2 = directory + '/' + foldername + '/etc/prompts-original'
        # Each prompt line is "<file id> <transcript>"
        with open(direc2, "r") as f:
            text = f.readlines()
        lengthy = file_lengthy(direc2)
        for k in range(lengthy):
            line = text[k]
            num2 = line.find(' ')
            name = direc1 + '/' + line[:num2] + '.wav'
            num1 = num1 + 1
            fs, audio = wav.read(name)
            processed_data = ds.stt(audio, fs)
            # Compare the reference transcript with the model output
            score = score + wer(line[num2+1:].strip(), processed_data)
# Average of the per-utterance WERs
score = score / num1
print("Average WER is", score)
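For reference, here is a minimal sketch of the metric that the score above averages: the word error rate, taken here as the word-level edit distance divided by the number of reference words. It only illustrates the usual definition and is not jiwer's actual implementation; the function name `word_error_rate` and the two sentence pairs are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical (reference, hypothesis) pairs, for illustration only
pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),  # one deleted word
    ("hello world", "hello word"),                     # one substituted word
]
per_utterance = [word_error_rate(r, h) for r, h in pairs]
# Same aggregation as in the script above: mean of the per-utterance scores
average_wer = sum(per_utterance) / len(per_utterance)
print("Average WER is", average_wer)
```

Note that averaging per-utterance scores gives every file the same weight regardless of its length, so the result can differ from a corpus-level WER computed as total errors over total reference words.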
I obtained an average WER of approximately 35%. My questions are the following:
- Is this WER normal, considering I used the pre-trained model?
- I didn’t specify any language model binary file or trie file. Which ones did my code use by default?
- How can I specify which LM binary file and trie file I want to use?
Thanks in advance.