I trained a very basic model on a single WAV file of about 2 minutes, using that same file for training, validation and testing (with 1 epoch):
python3 ./DeepSpeech.py --train_files ../data/Voice_180207/train.csv --dev_files ../data/Voice_180207/train.csv --test_files ../data/Voice_180207/train.csv --epoch 1 --export_dir ../models/Voice_180207
After building the native client, I tried running it with the following command:
ARGS="--model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav" make run
This returns the following:
LD_LIBRARY_PATH=/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client: ./deepspeech --model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav
TensorFlow: b'v1.12.0-rc0-1797-g059c37c22c'
DeepSpeech: v0.4.0-alpha.1-12-gf69db72
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-12-14 14:59:51.008355: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
No transcription is printed after these log lines, yet the command seems to exit with code 0:
echo $?
0
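To make this check repeatable, here is a minimal sketch that runs a command and reports both its exit code and whether anything was printed as a transcript. It assumes the client writes the transcript to standard output and its log lines to standard error, which I have not verified. The helper name `run_and_check` is hypothetical, and the command in the example is only a stand-in for the real `./deepspeech` invocation.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return (exit_code, transcript).

    Assumption: the transcript goes to stdout and log lines go to stderr.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # Stand-in command that exits with 0 and prints only a blank line;
    # replace it with the real client invocation and its arguments.
    code, transcript = run_and_check([sys.executable, "-c", "print()"])
    print("exit code:", code)
    print("empty transcript:", transcript == "")
```

If the exit code is 0 and the transcript is empty, the client ran to completion but decoded nothing, which matches what I am seeing.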
Does the problem perhaps lie with the simplicity of the trained model? I just wanted to try a baseline in which a model is trained and tested on the same WAV file (mostly to test the training part of DeepSpeech); clearly the performance will not be realistic.