Accessing the activations of the neurons

Is it possible to access the activations of the neurons when the model is used to recognize a sentence?
I am not familiar with pb output graphs; is it possible to convert them to ckpt to use them in Python?

Thanks a lot

We do release the checkpoints, so you should be able to use them :slight_smile:

Thanks for your reply. On the GitHub page, I saw it was possible to download the pretrained model at
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

which contains the .pb graph. Is that what you are talking about?

Or is there also the model in a .ckpt format somewhere? If yes, do you have a link?

Thanks

The very same page has a link, just above the one you copied, to the checkpoints: https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-checkpoint.tar.gz

Oh I see, I was looking at the README page, but it is in the Releases section.
Many thanks! :slight_smile:

If I understood correctly, the .pb file contains the graph and the checkpoint contains the weights of the model. But now, if I want to access the activations of the neurons, I need to feed a WAV file to the input, and I don’t understand where the preprocessing of the WAV file into MFCC features happens before it reaches the input of the model.
model.py seems to be a SWIG interface to something else I don’t know how to access.
Any idea?

You seem to be mixing inference code (model.py? there is no such thing) and training code. But yes, we use SWIG to generate the Python bindings.

There is support for single-shot inference in DeepSpeech.py, but I’m not sure exactly what you want to do.

So, what do you want to do? Which activations are you interested in?

During training, it should all be done from util/feeding.py; check the call to audiofile_to_input_vector.
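If it helps, here is a rough sketch of what that preprocessing boils down to. I'm assuming python_speech_features here (which is what the helper uses, as far as I remember), and 26 cepstral features is the project default; the real audiofile_to_input_vector also stacks 9 frames of context around each step, which I'm skipping in this sketch:

```python
# Rough sketch of the WAV -> MFCC step, assuming python_speech_features and
# scipy are installed. The real helper in the repo also adds context frames.
import scipy.io.wavfile as wav
from python_speech_features import mfcc

def wav_to_mfcc(wav_path, numcep=26):
    # The released models expect 16 kHz, 16-bit mono audio.
    fs, audio = wav.read(wav_path)
    features = mfcc(audio, samplerate=fs, numcep=numcep)
    return features  # shape: [n_frames, numcep]

feats = wav_to_mfcc('some_file.wav')
print(feats.shape)
```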

In native_client/python/client.py, there is ds (the DeepSpeech model, I guess):
"ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)"
which is later called as "ds.stt(audio, fs)", and that stt lives in the deepspeech module, in the script model.py.
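Put together, I understand the inference path as something like this (constants copied from client.py, if I read it right; the paths are just placeholders for the files from the release tarball):

```python
# Minimal use of the deepspeech Python package, following client.py.
# Constants and import path are what I believe client.py uses in this release.
import scipy.io.wavfile as wav
from deepspeech.model import Model

N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500

ds = Model('output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
fs, audio = wav.read('some_file.wav')  # 16 kHz, 16-bit mono expected
print(ds.stt(audio, fs))
```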

What I want is to visualize the activations of the neurons for a given audio file. For example, take the mean activation of every neuron, store the values in a matrix and visualize them with imshow().

Can you describe exactly what you mean by “activation of the neurons”? Do you want every neuron? That’s going to be a lot of values to deal with.

Anyway, I think you should do that by playing with the single-shot inference codepath in DeepSpeech.py instead; it will be easier for you to hack.

You can write some TensorFlow code to fetch the “logits” node instead of the decoded output. You could modify the do_single_file_inference function in DeepSpeech.py to fetch 'logits' instead of outputs['outputs']: https://github.com/mozilla/DeepSpeech/blob/b6c78264ee5101c7363a6e8f36b553132451b983/DeepSpeech.py#L1778-L1781
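If you prefer to poke at the frozen output_graph.pb you downloaded instead, a generic TensorFlow 1.x pattern like the sketch below works. I'm not certain of the exact node names in that graph, so list them first and adjust the names accordingly:

```python
# Generic TF 1.x pattern for pulling an intermediate tensor out of a frozen graph.
# The node names ('logits', the input placeholders) are assumptions: print them
# first and fix them up before running the session.
import tensorflow as tf

with tf.gfile.GFile('output_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Inspect the available node names to find the logits op.
for node in graph_def.node:
    print(node.name)

# Once you know the names, something like:
# logits_t = graph.get_tensor_by_name('logits:0')
# with tf.Session(graph=graph) as sess:
#     logits = sess.run(logits_t, feed_dict={...})  # feed the MFCC features here
```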

It’ll be a tensor of shape [timesteps, batch_size, num_classes], where timesteps is variable and depends on the length of the audio file, batch_size is 1 by default, and num_classes is the size of the used alphabet plus one (for the CTC blank label).
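And once you have the logits back as a NumPy array, the visualization you described is only a few lines; 'logits.npy' below is just a placeholder for however you save the array out of the session:

```python
# Visualizing the fetched logits, assumed to be a NumPy array of shape
# [timesteps, batch_size, num_classes] as described above.
import numpy as np
import matplotlib.pyplot as plt

logits = np.load('logits.npy')  # placeholder: whatever you saved from the session
acts = logits[:, 0, :]          # drop the batch dimension (batch_size == 1)

plt.imshow(acts.T, aspect='auto', origin='lower')
plt.xlabel('time step')
plt.ylabel('class (alphabet symbol + CTC blank)')
plt.colorbar()
plt.show()

# Mean activation per class over time, as you suggested:
print(acts.mean(axis=0))
```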

Thanks a lot, it sounds interesting, I’ll check that out!