Native client not returning output

william.van.woensel · December 14, 2018, 7:15pm

I trained a very basic model on a single WAV file of ca. 2 min, using the same file for training, validation and testing (with 1 epoch):

python3 ./DeepSpeech.py --train_files ../data/Voice_180207/train.csv --dev_files ../data/Voice_180207/train.csv --test_files ../data/Voice_180207/train.csv --epoch 1 --export_dir ../models/Voice_180207

After building the native client, I tried running it with the following command:

ARGS="--model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav" make run

This returns the following:

LD_LIBRARY_PATH=/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client: ./deepspeech --model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav
TensorFlow: b'v1.12.0-rc0-1797-g059c37c22c'
DeepSpeech: v0.4.0-alpha.1-12-gf69db72
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-12-14 14:59:51.008355: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

No transcription is printed, although the script seems to return exit code 0:

echo $?
0

Does the problem perhaps lie with the simplicity of the trained model? … I just wanted to try a baseline where a model is trained and tested on the same WAV file (mostly to test the training part of DeepSpeech); clearly, the performance will not be realistic.
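Since the client can exit with code 0 and still print nothing, checking $? alone doesn’t tell you whether a transcription came back. A minimal sketch of a check that captures stdout instead; check_transcript is a hypothetical helper, and it assumes the transcript is the only thing the client writes to stdout (version banners and TensorFlow log lines going to stderr).

```shell
# Hypothetical helper: run a transcription command and flag an empty result.
# Assumes the transcript is the only thing written to stdout; stderr (version
# banners, TensorFlow warnings) is not captured and still reaches the terminal.
check_transcript() {
  local out status
  out="$("$@")"
  status=$?
  # Treat whitespace-only output as empty.
  if [ -z "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
    echo "empty transcription (exit code $status)"
    return 1
  fi
  echo "transcription: $out"
  return "$status"
}
```

For example, with LD_LIBRARY_PATH set as in the make run output: check_transcript ./deepspeech --model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav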

lissyx · December 14, 2018, 8:19pm

Yeah, with one WAV sample, the default width of 2048 and only one epoch, you are right: it’s trained and running, but it’s just unable to learn anything, so the output is an empty string.

You should try bin/run-ldc93s1.sh for that kind of testing; it’s designed to verify overfitting on a single file.
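A minimal sketch of invoking that script, assuming it is meant to be run from the top level of the DeepSpeech checkout (the directory containing DeepSpeech.py); run_overfit_check is a hypothetical wrapper that only adds that check.

```shell
# Hypothetical wrapper: run the bundled single-file overfitting check from the
# top level of a DeepSpeech checkout (the directory containing DeepSpeech.py).
# Argument: [path_to_checkout], defaulting to the current directory.
run_overfit_check() {
  local repo="${1:-.}"
  if [ ! -f "$repo/DeepSpeech.py" ]; then
    echo "not a DeepSpeech checkout: $repo" >&2
    return 1
  fi
  # Subshell so the caller's working directory is left unchanged.
  (cd "$repo" && ./bin/run-ldc93s1.sh)
}
```

Usage: run_overfit_check path/to/DeepSpeech (placeholder path). Unlike the one-epoch run above, the point of this check is that the model should eventually reproduce its single training transcript.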

william.van.woensel · December 14, 2018, 9:43pm

Thanks. How would you propose dealing with a total audio set of ca. 20 min? I would think using fewer than the standard 70 epochs?

lissyx · December 15, 2018, 9:42am

You might want to play with the hyperparameters, and you definitely want a much smaller --n_hidden value than 2048.
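As a sketch, the training command from the first post could be re-run with a smaller network. train_small is a hypothetical helper; 494 is the --n_hidden value bin/run-ldc93s1.sh uses, and the default of 50 epochs is an arbitrary placeholder to tune, not a recommendation.

```shell
# Hypothetical helper: the training command from the first post with a smaller
# network. Arguments: [n_hidden] [epochs]. The 494 default mirrors
# bin/run-ldc93s1.sh; the epoch default is only a placeholder.
# The same CSV is still used for train/dev/test, as in the original run; with
# a larger set, point --dev_files/--test_files at held-out CSVs instead.
train_small() {
  local data=../data/Voice_180207
  local n_hidden="${1:-494}" epochs="${2:-50}"
  python3 ./DeepSpeech.py \
    --train_files "$data/train.csv" \
    --dev_files "$data/train.csv" \
    --test_files "$data/train.csv" \
    --n_hidden "$n_hidden" \
    --epoch "$epochs" \
    --export_dir ../models/Voice_180207
}
```

For example, train_small 494 75 trains a 494-wide network for 75 epochs and exports it to the same model directory as before.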

lissyx · December 15, 2018, 10:54am

@william.van.woensel Also, with such a small amount of data, you might be a good fit for transfer learning. I’ll defer to @josh_meyer for the specifics.

william.van.woensel · December 15, 2018, 8:32pm

Thanks for your feedback.

It would be ideal if I could use a pre-trained model, but our audio contains quite a few medical terms that the available pre-trained model does not transcribe properly.

Training our own model on an existing medical corpus would also be an option, but our own set of audio files is far too small and I haven’t had much luck finding existing corpora.

Is the current pre-trained model based on Common Voice? Do there happen to be domain-specific subsets of the CV corpus?

jahir · December 16, 2018, 11:08am

You can continue training from the released checkpoints with your own data (I think that’s what lissyx meant).

william.van.woensel · December 16, 2018, 7:30pm

Thanks for pointing me towards bin/run-ldc93s1.sh. Just out of curiosity, how was the n_hidden value of 494 obtained? It’s quite specific.

lissyx · December 17, 2018, 8:09am

We use some Common Voice data. As for domain-specific datasets, I don’t think there are any. What you can do, however, if you have a few hours of domain-specific data, is train from the English model and then build a more specialized language model: that should help a lot in your case.

We’re working on API changes to allow using multiple language models, so in your case you could build one from medical terms.
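A sketch of what training from the English model can look like, assuming the release checkpoint has been downloaded and unpacked locally. finetune_from_release and the my-*.csv names are hypothetical. --n_hidden 2048 assumes the release was trained at the default width; the negative --epoch value ("this many epochs beyond the checkpoint") and the learning rate are assumptions based on how fine-tuning was described for releases of this period, so verify them against the README of your checkout.

```shell
# Hypothetical helper: continue training from an unpacked release checkpoint.
# Arguments: <checkpoint_dir> [extra_epochs]. Assumptions (verify against the
# README of your DeepSpeech version): the release uses --n_hidden 2048, and a
# negative --epoch means "this many epochs beyond the checkpoint".
finetune_from_release() {
  local ckpt_dir="$1" extra_epochs="${2:-3}"
  python3 ./DeepSpeech.py \
    --n_hidden 2048 \
    --checkpoint_dir "$ckpt_dir" \
    --epoch "-$extra_epochs" \
    --train_files my-train.csv \
    --dev_files my-dev.csv \
    --test_files my-test.csv \
    --learning_rate 0.0001
}
```

Training writes new checkpoints into --checkpoint_dir, so copy the downloaded checkpoint first if you want to keep the original intact.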

william.van.woensel · December 17, 2018, 1:46pm

After the talk about language models, I found this Discourse post and this blog post.

Am I right in assuming that, by creating a language model, you could improve accuracy without having to train your own network? If so, I’m quite excited. I will try the steps outlined in the Discourse post.

lissyx · December 17, 2018, 2:15pm

You should follow the steps documented in the repo, under data/lm/README.md.
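Roughly, that recipe builds an ARPA model from a text corpus with KenLM’s lmplz, converts it to a binary with build_binary, and then runs generate_trie. A minimal sketch of the first two steps, assuming KenLM is built and on PATH; build_lm, the file names and the order of 5 are illustrative, and the README may pass extra build_binary options.

```shell
# Hypothetical helper: build a KenLM language model from a plain-text corpus
# (one normalized sentence per line, e.g. domain-specific medical text).
# Arguments: <corpus.txt> [output_dir]. Requires KenLM's lmplz and
# build_binary on PATH.
build_lm() {
  local corpus="$1" out_dir="${2:-.}"
  mkdir -p "$out_dir" &&
    lmplz --order 5 --text "$corpus" --arpa "$out_dir/words.arpa" &&
    build_binary "$out_dir/words.arpa" "$out_dir/lm.binary"
}
```

On a small corpus, lmplz can fail while estimating discounts; its --discount_fallback option exists for that case. The trie is then produced with generate_trie, whose arguments may differ between releases, so follow the usage it prints for your version. The client is assumed to load both files through its --lm and --trie options; check the usage it prints as well.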

william.van.woensel · December 17, 2018, 5:38pm

Thanks for all your help. I’ve generated a language model, but I cannot build generate_trie: I successfully built the native client using make deepspeech, but this doesn’t create a generate_trie executable.

Running make generate_trie gives the following:

c++     -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client  -ldeepspeech   generate_trie.cpp   -o generate_trie
In file included from generate_trie.cpp:5:0:
ctcdecode/scorer.h:9:10: fatal error: lm/enumerate_vocab.hh: No such file or directory
 #include "lm/enumerate_vocab.hh"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
<builtin>: recipe for target 'generate_trie' failed
make: *** [generate_trie] Error 1

Did I miss something? …

lissyx · December 17, 2018, 5:41pm

Like reading the documentation? Fetching generate_trie from our prebuilt native_client.tar.xz?

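There’s no generate_trie target in the Makefile: the <builtin> recipe in your output is make’s implicit rule trying to compile generate_trie.cpp on its own. That part depends on TensorFlow and thus Bazel.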

william.van.woensel · December 17, 2018, 5:51pm

I already said that I built the native client successfully, which is the only thing discussed in the documentation. Nothing else is discussed there, certainly nothing about fetching generate_trie from native_client.tar.xz.

lissyx · December 17, 2018, 5:58pm

There are references to building generate_trie in native_client/README.md. Sorry if it’s not clear enough; PRs to improve the docs are welcome.

We also had, I’m pretty sure, docs covering the fact that generate_trie is bundled in native_client.tar.xz, but I cannot find them anymore. Not sure what happened. So just grab it from there and that should be fine: https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz or use the documented util/taskcluster.py.
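A minimal sketch of fetching that prebuilt bundle by hand and unpacking it, as an alternative to util/taskcluster.py. fetch_native_client and the default destination directory are hypothetical, and it assumes curl and an xz-capable tar are installed. Since the URL points at a master build, its binaries may not match an older checkout.

```shell
# Hypothetical helper: download the prebuilt CPU native_client bundle linked
# above and unpack it; generate_trie is among the files it contains.
# Argument: [destination_dir].
fetch_native_client() {
  local url="https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz"
  local dest="${1:-native_client_prebuilt}"
  # -f: fail on HTTP errors, -L: follow redirects; tar reads the archive from stdin.
  mkdir -p "$dest" &&
    curl -fsSL "$url" | tar -xJf - -C "$dest"
}
```

After fetch_native_client, the generate_trie binary should be in native_client_prebuilt/.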

william.van.woensel · December 17, 2018, 6:04pm

Yes, thanks for pointing it out. I just downloaded the prebuilt binaries and, sure enough, it’s there. I had built the native_client binaries locally using the bazel command from the docs, but perhaps something went wrong there … I dunno.

lissyx · December 17, 2018, 6:07pm

Right: make sure your bazel command line included //native_client:generate_trie, and you should then find the binary in TensorFlow’s bazel-bin/native_client/.
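A minimal sketch of that build step and where the result lands. It assumes the TensorFlow checkout is already set up for DeepSpeech as described in native_client/README.md, and it leaves out the extra build flags the README lists; build_native_client is a hypothetical helper.

```shell
# Hypothetical helper: build libdeepspeech.so and generate_trie with Bazel from
# the TensorFlow checkout, then show where generate_trie ended up. Bazel puts
# outputs under that checkout's bazel-bin/, not under DeepSpeech/native_client.
# Argument: <tensorflow_dir>.
build_native_client() {
  local tf_dir="$1"
  (cd "$tf_dir" &&
    bazel build -c opt //native_client:libdeepspeech.so //native_client:generate_trie &&
    ls bazel-bin/native_client/generate_trie)
}
```

For example, build_native_client /home/william/speech/deepspeech/tensorflow, using the TensorFlow path from the make run output above.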

william.van.woensel · December 17, 2018, 6:34pm

Thanks, the binaries were indeed under bazel-bin/native_client/. I had assumed they would appear under DeepSpeech/native_client (maybe not a bad thing to add to the docs for those of us who are unfamiliar with Bazel).

lissyx · December 17, 2018, 6:53pm

PRs are welcome: if it’s not documented, it’s likely we have not seen it as a pain point, and we might need help to explain it properly.

william.van.woensel · January 3, 2019, 3:03pm

OK, I just submitted a minor PR for this issue.