Custom LM binary for digits and a small set of commands

dhanesh · January 6, 2020, 6:59am

Hi, I am experimenting with a small custom LM that mostly contains digit combinations (all digit combinations should be recognized) plus a few non-digit sentences (e.g. “all is good”). The two types never occur together in one sentence. The LM binary and trie generated from this vocabulary work fine for the non-digit sentences with the default TFLite model provided for v0.5.1. For digit combinations, however, I observed that sequences occurring in the vocabulary are recognized with much higher probability than digit sentences that are not in the vocabulary (e.g. “five seven five nine”). Am I missing something here?
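One way to act on “all digit combinations should be recognized” is to make sure every digit sequence actually appears in the LM training text, so that no digit n-gram has to be reached through backoff. The bash sketch below assumes fixed-length sequences built from the ten English digit words zero through nine; gen_digit_sequences is a hypothetical helper, not part of KenLM or DeepSpeech.

```shell
# Sketch (requires bash): print every sequence of `len` digit words, one per line.
# Assumption: the spoken digits are the ten English words zero..nine.
gen_digit_sequences() {
  local len="$1" i s d
  local digits=(zero one two three four five six seven eight nine)
  local seqs=("")   # start from a single empty prefix
  local -a next

  for ((i = 0; i < len; i++)); do
    next=()
    # Extend every partial sequence by each digit word.
    for s in "${seqs[@]}"; do
      for d in "${digits[@]}"; do
        next+=("${s:+$s }$d")   # add a space separator only after the first word
      done
    done
    seqs=("${next[@]}")
  done

  printf '%s\n' "${seqs[@]}"
}
```

For example, gen_digit_sequences 4 > digits.txt writes 10,000 lines (10^4), which could then be concatenated with the command sentences into vocabulary.txt before running lmplz. With every sequence seen once, the LM assigns roughly equal probability to all of them. The output grows as 10^length, so this is only practical for short sequences.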

Sharing the ARPA file along with the corresponding LM binary and trie files:
all_combinations.zip (93.8 KB)

Commands used:

Generate the ARPA file:

~/terminal/kenlm/build/bin/lmplz --text vocabulary.txt --arpa words.arpa --order 5 --discount_fallback --temp_prefix /tmp/

Generate the LM binary:

~/terminal/kenlm/build/bin/build_binary -T -s trie words.arpa lm.binary

Generate the trie:

~/terminal/repository/DeepSpeech/generate_trie alphabet.txt lm.binary trie
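To check whether a given digit sequence was actually seen when words.arpa was built, one can look it up in the ARPA file. The sketch below assumes KenLM's tab-separated ARPA layout (log10 probability, the n-gram itself, then an optional backoff weight) under \N-grams: section headers; arpa_lookup is a hypothetical helper name, not part of KenLM or DeepSpeech. An n-gram that is not listed gets its probability through backoff to shorter n-grams, which is one plausible reason unseen digit sequences score lower.

```shell
# Sketch (requires bash): report whether an exact n-gram is listed in an ARPA file.
# Assumption: KenLM-style lines "logprob<TAB>w1 w2 ...<TAB>[backoff]".
arpa_lookup() {
  local arpa="$1" ngram="$2" order
  order=$(printf '%s\n' "$ngram" | awk '{ print NF }')   # number of words in the query

  awk -F'\t' -v q="$ngram" -v n="$order" '
    # Section header such as \3-grams: sets the current n-gram order.
    /^\\[0-9]+-grams:$/ { s = $0; gsub(/[^0-9]/, "", s); cur = s + 0; next }
    # Inside the matching section, compare the n-gram field exactly.
    cur == n && $2 == q { print "found: log10 p = " $1; found = 1; exit }
    END { if (!found) print "not found: its probability comes from backoff to shorter n-grams" }
  ' "$arpa"
}
```

For example, arpa_lookup words.arpa "five seven five nine" checks the 4-gram directly; since the model was built with --order 5, checking shorter prefixes such as "five seven five" shows how far back the model has to fall.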

lissyx · January 6, 2020, 9:16am

Are you working with v0.5.1, or are you comparing your results to 0.5.1?

dhanesh · January 8, 2020, 9:43am

Sorry for the late reply.

I’m working with v0.5.1. I did a source build to get the deepspeech executable.
Here is the output I get:

$ ./deepspeech --model data/all_combinations/output_graph.pbmm -t --extended --alphabet data/alphabet.txt --lm data/all_combinations/tmp/lm.binary --trie data/all_combinations/tmp/trie --audio ../../learning2/mobile_recorded/02_Jan_testing/1577956542158_denoised.wav

TensorFlow: v1.13.1-10-g3e0cc5374d
DeepSpeech: v0.5.1-0-g4b29b78
2020-01-08 15:06:05.733966: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-08 15:06:05.741175: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
2020-01-08 15:06:05.741195: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2020-01-08 15:06:05.741204: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2020-01-08 15:06:05.741212: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
my the three
cpu_time_overall=1.06978

Also note that the -t flag doesn’t print MFCC data as it should. Where should I look to enable MFCC output?

lissyx · January 8, 2020, 9:49am

May I ask why?

I’m not sure what you are referring to: -t gives timing results, and I see cpu_time_overall, so it’s working as intended.

That makes no sense. What output do you want? MFCCs are a transformation of the audio signal that is used as the input to the network.

dhanesh · January 8, 2020, 10:05am

In the deepspeech executable’s usage text, the -t flag has the following description:

$ ./deepspeech 
Usage: ./deepspeech --model MODEL --alphabet ALPHABET [--lm LM --trie TRIE] --audio AUDIO [-t] [-e]

Running DeepSpeech inference.

	--model MODEL		Path to the model (protocol buffer binary file)
	--alphabet ALPHABET	Path to the configuration file specifying the alphabet used by the network
	--lm LM			Path to the language model binary file
	--trie TRIE		Path to the language model trie file created with native_client/generate_trie
	--audio AUDIO		Path to the audio file to run (WAV format)
	-t			Run in benchmark mode, output mfcc & inference time
	--extended		Output string from extended metadata
	--json			Extended output, shows word timings as JSON
	--stream size		Run in stream mode, output intermediate results
	--help			Show help
	--version		Print version and exits
TensorFlow: v1.13.1-10-g3e0cc5374d
DeepSpeech: v0.5.1-0-g4b29b78

That’s why I got confused. It’s clear now: client.cc#L283

I am doing multiple things here:

Generate a custom LM from a limited set of sentences (digits and a few simple commands)
See how DeepSpeech performs recognition step by step (hence my interest in the MFCC and intermediate outputs)
Tinker with the CTC decoder code and check how the LM changes my results.
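For the third point, it may help to keep in mind the general form of LM-weighted CTC beam search. Decoders of this kind (DeepSpeech’s ctcdecode is one, configured with an LM weight alpha and a word-insertion bonus beta) roughly rank a candidate transcript y for audio x by the expression below, where α scales the LM log-probability, β is a per-word bonus, and |y| is the number of words in y. Treat this as the textbook shallow-fusion form rather than a transcription of the v0.5.1 code.

```latex
\operatorname{score}(y) = \log P_{\mathrm{CTC}}(y \mid x) + \alpha \log P_{\mathrm{LM}}(y) + \beta \, \lvert y \rvert
```

Under this form, a digit sequence the LM never saw, such as “five seven five nine”, gets a lower log P_LM through backoff, so with α > 0 it has to win on the acoustic term to beat sequences that do appear in the vocabulary.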

lissyx · January 8, 2020, 10:28am

OK, you got confused because it just means we output the time it took to compute the MFCC and run inference, not that we output the MFCC.

dhanesh · January 9, 2020, 3:17am

Yes. I have made a few changes in the CTC decoder (for now just dummy prints), but I don’t see them reflected, even after running pip uninstall ds_ctcdecoder and then pip install native_client/ctcdecode/dist/ds_ctcdecoder-0.5.1-cp37-cp37m-macosx_10_14_x86_64.whl. Is deepspeech using some cached version of the CTC decoder? If so, how do I clear it?
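When checking the build and install steps, it can help to see which decoder each tool actually uses. As far as we can tell from the v0.5.x sources, the ./deepspeech client gets its CTC decoder from libdeepspeech.so (the native_client ctcdecode sources compiled into that library), while the ds_ctcdecoder wheel is the Python binding used by the Python scripts; if that holds for your build, decoder edits only reach ./deepspeech after libdeepspeech.so is rebuilt. The bash sketch below is a hypothetical diagnostic (show_decoder_sources is not a DeepSpeech tool) and assumes python3 is the interpreter pip installed the wheel into.

```shell
# Hypothetical diagnostic helper (requires bash): show which decoder copies are in play.
# Assumption: python3 is the interpreter that pip installed ds_ctcdecoder into.
show_decoder_sources() {
  local bin="${1:-./deepspeech}"

  echo "== ds_ctcdecoder module imported by python3 =="
  python3 -c 'import ds_ctcdecoder; print(ds_ctcdecoder.__file__)' 2>/dev/null \
    || echo "ds_ctcdecoder is not importable from python3"

  echo "== Shared libraries loaded by $bin =="
  if command -v otool >/dev/null 2>&1; then
    otool -L "$bin"   # macOS: lists linked dylibs, e.g. which libdeepspeech.so
  elif command -v ldd >/dev/null 2>&1; then
    ldd "$bin"        # Linux equivalent
  else
    echo "neither otool nor ldd is available"
  fi
}
```

Running show_decoder_sources ./deepspeech and comparing the printed libdeepspeech.so path with the one that was just rebuilt is a quick way to spot a stale library being picked up.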

lissyx · January 9, 2020, 9:05am

No, there is no such cache. Please triple check your build / install steps.