Unable to install deepspeech on centos 6.9

Hello Lissyx,

I just followed the steps you mentioned (now using the .pbmm file with the new deepspeech binary obtained from native_client.tar.xz), but I am getting a “File format ‘# Ea’… not understood” error, as shown below:

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm …/models/alphabet.txt …/hiroshima-1.wav
Loading model from file output_graph.pbmm
2018-03-05 11:37:53.392392: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Data loss: Can’t parse output_graph.pbmm as binary proto
Loaded model in 0.005s.
Traceback (most recent call last):
File “/home/centerstage/tmp/deepspeech-venv/bin/deepspeech”, line 11, in <module>
sys.exit(main())
File “/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/deepspeech/client.py”, line 66, in main
fs, audio = wav.read(args.audio)
File “/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py”, line 236, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File “/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py”, line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format ‘# Ea’… not understood.

One more point: the ‘-t’ argument is not supported here:

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm …/models/alphabet.txt …/hiroshima-1.wav -t
usage: deepspeech [-h] model audio alphabet [lm] [trie]
deepspeech: error: unrecognized arguments: -t
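
For reference, the usage line above lists the Python client’s positional arguments as model, audio, alphabet, so in the first command the alphabet file and the WAV appear to be swapped, which could explain scipy reading ‘# Ea’ (plausibly the first bytes of alphabet.txt). A minimal check, as a sketch only; the ../ paths are assumptions based on the commands above:

# Inspect the first bytes of the file scipy tried to read as a WAV:
head -c 4 ../models/alphabet.txt    # printing '# Ea' here would match the error above
file ../hiroshima-1.wav             # a valid file should report RIFF (little-endian) data, WAVE audio
# Per the usage line, the order is: model, then audio, then alphabet
deepspeech output_graph.pbmm ../hiroshima-1.wav ../models/alphabet.txt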

Hello Lissyx,
Thanks for your work!

I am able to install deepspeech on centos 7.3.1611 and I am able to perform the Speech to Text conversion of a .wav audio file.

What I am concerned about right now is the high inference time (the speech-to-text conversion time), which I need to reduce.

Is the fix you are providing [the Python 2.7 unicode deepspeech build] in any way meant to decrease the inference time? Should I install this new deepspeech version for that purpose?

No, it’s only aimed at making it work. The -t flag is only available on the C++ client. Regarding the high memory usage, I still need you to test the C++ client with -t to get more information.
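
In the meantime, a rough wall-clock and peak-memory figure for the Python client can be obtained with GNU time (a sketch only; -t itself remains C++-only, and the model/audio paths are assumptions):

# Rough timing and memory for the Python client, since -t is not available there:
/usr/bin/time -v deepspeech output_graph.pb ../hiroshima-1.wav ../models/alphabet.txt
# "Elapsed (wall clock) time" approximates overall latency;
# "Maximum resident set size" shows peak memory usage.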

Hello Lissyx,

My deepspeech binary aborts when I invoke the deepspeech binary from native_client:

(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm hiroshima-1.wav models/alphabet.txt models/trie -t
2018-03-05 16:26:40.885164: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Alphabet size does not match loaded model: alphabet has size 497, but model has 28 classes in its output. Make sure you’re passing an alphabet file with the same size as the one used for training.
Loading the LM will be faster if you build a binary file.
Reading models/alphabet.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was “a” not \data. Byte: 218
Aborted (core dumped)

Yes, because you have not read carefully: we changed the order of arguments. The WAV file or directory should now be the last one, just before -t.
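
Concretely, with the new ordering the earlier command would look like this (a sketch; the paths mirror the ones used elsewhere in this thread):

# C++ client argument order: model, alphabet, lm, trie, then the WAV file, with -t last
./new_native_client/deepspeech new_native_client/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie hiroshima-1.wav -t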

Yeah…that was a mistake.

Please find the output below:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:47:52.381157: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
(deepspeech-venv) [centerstage@localhost DeepSpeech]$

But then you are not passing -t, and there is extra output that is not from our codebase. Please stick to our code.

Please find the output with the -t option:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav -t
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:54:48.750644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=51.56000 cpu_time_mfcc=0.01000 cpu_time_infer=51.55000
(deepspeech-venv) [centerstage@localhost DeepSpeech]$

I didn’t get it; what extra output are you referring to?

Hello Lissyx,

I am posting one more output, which uses mmap and the native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu, as you suggested.

(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie hiroshima-1.wav -t
2018-03-05 17:01:41.754827: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=45.09000 cpu_time_mfcc=0.01000 cpu_time_infer=45.08000
(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$

And yet there is still a ton of output that is not from our codebase. I cannot trust those results :frowning:
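
One way to narrow down where the extra “In sox_init()” / “Searching for …” lines come from (a sketch; the binary path is an assumption):

# Check whether the debug strings are baked into the client binary itself:
grep -a "In sox_init" ./new_native_client/deepspeech && echo "debug strings are in the client binary"
# Check which libsox (if any) the binary links against at runtime:
ldd ./new_native_client/deepspeech | grep -i sox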

Hello Lissyx,

I followed https://github.com/mozilla/DeepSpeech for installation.

I downloaded DeepSpeech using:
git clone https://github.com/mozilla/DeepSpeech

Downloaded the model from the link below:
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

Then I created a virtual environment deepspeech-venv and installed deepspeech using
'pip install --upgrade deepspeech'
and then installed the requirements listed in the requirements.txt file of the DeepSpeech directory.

Performed:
python util/taskcluster.py --target native_client/

And then, as you suggested, I used the native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu

and for mmap I performed the following operation:
convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm

Please let me know if I have made a mistake anywhere.
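
A quick way to confirm that the conversion step produced a usable memory-mapped graph (a sketch; the file names simply follow the convert command above):

# Both files should exist after the conversion and be of comparable size:
ls -lh output_graph.pb output_graph.pbmm
# When the C++ client loads the .pbmm, it should no longer print the
# "Warning: reading entire model file into memory" message seen with the plain .pb.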

Where is this coming from?

It might just be slow because your CPU / system is slow.

Hello Lissyx,

I am already using a 12-CPU machine with CPU MHz: 1200.042.
Can you please suggest what CPU/system configuration is required to work with deepspeech?

Please understand that I have no idea why it is that slow on your system; it might come from a lot of things. Is it bare-metal? Are you the only user? How much memory is available? What are the exact storage specs? Have you checked whether multiple threads are running on the deepspeech binary? htop -p <pid-of-deepspeech> should help with that.
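
If htop is not available, the thread count can also be read directly (a sketch; <pid-of-deepspeech> is a placeholder, as above):

# Find the PID of the running deepspeech binary:
pgrep -f deepspeech
# Number of threads (lightweight processes) the process is using:
ps -o nlwp= -p <pid-of-deepspeech>
grep Threads /proc/<pid-of-deepspeech>/status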

You have not answered why there is this extra debug information from SoX that I quoted earlier. I’m unsure of the exact code you are running right now and the environment you are running it in. This might play a role as well.

https://www.cpubenchmark.net/compare.php?cmp[]=2275&cmp[]=2427

This gives a rating comparison between your CPU (Xeon E5-2609 v3) and mine (i7-4790K):

| | i7-4790K | Xeon E5-2609 v3 |
|---|---|---|
| Single Thread Rating | 2530 | 1115 |
| CPU Mark | 11189 | 5940 |

So in both cases that’s less than half the performance. For an audio sample of 2.9 secs, my CPU on a similar codebase would decode it in ~5 secs. Your sample is 4.8 secs, so a quick back-of-the-envelope computation gives something around 20 secs as a baseline. Without further information on your system specifications, and on why there is this debug output, it’s hard to assess whether anything is wrong here.
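
The scaling behind that estimate, using the single-thread ratio from the table above (a sketch of the arithmetic, not an exact performance model):

# ~5 s for a 2.9 s clip on the i7, scaled to a 4.8 s clip on a CPU with ~1115/2530 the single-thread rating:
echo "5 * (4.8 / 2.9) * (2530 / 1115)" | bc -l    # ≈ 18.8, i.e. roughly 20 seconds baseline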