What prediction information is available from deepspeech inference?

r-wei · February 4, 2018, 9:18pm

Hi, I’m using the deepspeech Python package to run speech-to-text inference on a wav file. I see output like this:

Loading model from file models/output_graph.pb
Loaded model in 0.249s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 1.428s.
Running inference.
yes 
Inference took 9.582s for 5.000s audio file.

Does this mean that the model predicted that my wav file contained the word “yes”? Is there an estimated confidence/accuracy score on this prediction? Were any other files with prediction information created?
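For reference, the console output quoted above shows only two pieces of information after "Running inference.": the transcript line and the timing line. Below is a minimal sketch of pulling those two values out of the captured text using only the Python standard library. The helper name `parse_client_output` is hypothetical (it is not part of the deepspeech package), and the sketch assumes the output format is exactly as pasted above, which may differ between versions.

```python
import re

# Sample output copied from the post above. The layout is assumed to match
# what the client printed in this thread; other versions may differ.
SAMPLE_OUTPUT = """\
Loading model from file models/output_graph.pb
Loaded model in 0.249s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 1.428s.
Running inference.
yes 
Inference took 9.582s for 5.000s audio file.
"""

# Matches the final timing line and captures both durations in seconds.
TIMING_RE = re.compile(r"Inference took ([\d.]+)s for ([\d.]+)s audio file\.")


def parse_client_output(text):
    """Hypothetical helper: return (transcript, (inference_s, audio_s)).

    Everything between the "Running inference." line and the timing line
    is treated as the transcript. The timing tuple is None if no timing
    line is found.
    """
    lines = text.splitlines()
    start = lines.index("Running inference.") + 1
    transcript_parts = []
    timing = None
    for line in lines[start:]:
        match = TIMING_RE.match(line)
        if match:
            timing = (float(match.group(1)), float(match.group(2)))
            break
        if line.strip():
            transcript_parts.append(line.strip())
    return " ".join(transcript_parts), timing


transcript, timing = parse_client_output(SAMPLE_OUTPUT)
print(transcript)
if timing is not None:
    inference_s, audio_s = timing
    # Ratio of processing time to audio length (above 1 means slower than real time).
    print(round(inference_s / audio_s, 2))
```

On the sample above, this yields the transcript text and the two durations from the last line; nothing in that output carries a confidence score or word timing.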

Is it possible to get timestamps for the predicted text? For example, the model predicted that the wav file contained someone saying the word “yes”, starting at 1.000 sec and ending at 1.010 sec.

r-wei · February 15, 2018, 7:26pm

Re: timestamps – Appears the conversation is being held here: https://github.com/mozilla/DeepSpeech/issues/1125

Re: confidence scores – https://github.com/mozilla/DeepSpeech/issues/900

kdavis · February 16, 2018, 6:19am

Currently there is no way to produce timestamps. Adding them would require a bit of work; if it happens, it will likely be part of the 0.3.0 release or later.

yv001 · February 16, 2018, 8:37am

"if it happens" - do you mean if release 0.3.0 happens, or if the feature is included in the release?

kdavis · February 16, 2018, 8:53am

Sorry. 0.3.0 will happen. What I meant was: if the feature is included in the 0.3.0 release.

r-wei · February 16, 2018, 3:40pm

Cool, thanks for pointing me to the release projects, @kdavis. I’m interested in this feature and will have some extra time over the next month. My background is more academic (understanding research) than building production code, but I have contributed to open source projects before. Is this GitHub issue a good place to share learnings and collaborate?

lissyx · February 16, 2018, 3:59pm

I guess that collaborating on fixing the issue should happen on that issue, yes. Opinionated discussions about the issue itself also make sense there.