Accessing the activations of the neurons

Is it possible to access the activations of the neurons when the model is used to recognize a sentence?
I am not familiar with pb output graphs; is it possible to convert them to ckpt to use them in Python?

Thanks a lot

We do release the checkpoints, so you should be able to use them :slight_smile:

Thanks for your reply. On the GitHub page, I saw it was possible to download the pretrained model at
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

which contains the .pb graph. Is that what you are talking about?

Or is there also the model in a .ckpt format somewhere? If yes, do you have a link?

Thanks

The very same page has a link, just above the one you copied, to the checkpoints: https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-checkpoint.tar.gz

Oh I see, I was looking at the README page, but it is in the Releases section.
Many thanks! :slight_smile:

If I understood correctly, the .pb file contains the graph and the checkpoint contains the weights of the model. But now, if I want to access the activations of the neurons, I need to feed a WAV file to the input, and I don’t understand where the preprocessing of the WAV file into MFCC features happens before it reaches the input of the model.
model.py seems to be a SWIG interface to something else I don’t know how to access.
Any idea?

You seem to be mixing inference code (model.py? there is no such thing) and training code. But yes, we use SWIG to generate the Python bindings.

There is support for single-shot inference in DeepSpeech.py, but I’m not sure exactly what you want to do.

So, what do you want to do? Which activations are you interested in?

During training, it should all be done from util/feeding.py; check the call to audiofile_to_input_vector.
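If it helps, here is a rough sketch of what that preprocessing boils down to. I'm assuming python_speech_features here (which is what the helper uses, as far as I remember), and 26 cepstral features is the project default; the real audiofile_to_input_vector also stacks 9 frames of context around each step, which I'm skipping in this sketch:

```python
# Rough sketch of the WAV -> MFCC step, assuming python_speech_features and
# scipy are installed. The real helper in the repo also adds context frames.
import scipy.io.wavfile as wav
from python_speech_features import mfcc

def wav_to_mfcc(wav_path, numcep=26):
    # The released models expect 16 kHz, 16-bit mono audio.
    fs, audio = wav.read(wav_path)
    features = mfcc(audio, samplerate=fs, numcep=numcep)
    return features  # shape: [n_frames, numcep]

feats = wav_to_mfcc('some_file.wav')
print(feats.shape)
```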

In native_client/python/client.py, there is ds (the DeepSpeech model, I guess):
"ds = Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)"
which is later called as "ds.stt(audio, fs)", and that stt lives in the deepspeech module, in the script model.py.
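Put together, I understand the inference path as something like this (constants copied from client.py, if I read it right; the paths are just placeholders for the files from the release tarball):

```python
# Minimal use of the deepspeech Python package, following client.py.
# Constants and import path are what I believe client.py uses in this release.
import scipy.io.wavfile as wav
from deepspeech.model import Model

N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500

ds = Model('output_graph.pb', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
fs, audio = wav.read('some_file.wav')  # 16 kHz, 16-bit mono expected
print(ds.stt(audio, fs))
```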

What I want is to visualize the activations of the neurons for a given audio file. For example, take the mean activation of every neuron, store the values in a matrix and visualize them with imshow().

Can you describe exactly what you mean by “activation of the neurons”? Do you want every neuron? That’s going to be a lot of values to deal with.

Anyway, I think you should do that by playing with the single-shot inference codepath in DeepSpeech.py instead; it will be easier for you to hack.

You can write some TensorFlow code to fetch the “logits” node instead of the decoded output. You could modify the do_single_file_inference function in DeepSpeech.py to fetch 'logits' instead of outputs['outputs']: https://github.com/mozilla/DeepSpeech/blob/b6c78264ee5101c7363a6e8f36b553132451b983/DeepSpeech.py#L1778-L1781
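If you prefer to poke at the frozen output_graph.pb you downloaded instead, a generic TensorFlow 1.x pattern like the sketch below works. I'm not certain of the exact node names in that graph, so list them first and adjust the names accordingly:

```python
# Generic TF 1.x pattern for pulling an intermediate tensor out of a frozen graph.
# The node names ('logits', the input placeholders) are assumptions: print them
# first and fix them up before running the session.
import tensorflow as tf

with tf.gfile.GFile('output_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Inspect the available node names to find the logits op.
for node in graph_def.node:
    print(node.name)

# Once you know the names, something like:
# logits_t = graph.get_tensor_by_name('logits:0')
# with tf.Session(graph=graph) as sess:
#     logits = sess.run(logits_t, feed_dict={...})  # feed the MFCC features here
```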

It’ll be a tensor of shape [timesteps, batch_size, num_classes], where timesteps is variable and depends on the length of the audio file, batch_size is 1 by default, and num_classes is the size of the used alphabet plus one (for the CTC blank label).
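And once you have the logits back as a NumPy array, the visualization you described is only a few lines; 'logits.npy' below is just a placeholder for however you save the array out of the session:

```python
# Visualizing the fetched logits, assumed to be a NumPy array of shape
# [timesteps, batch_size, num_classes] as described above.
import numpy as np
import matplotlib.pyplot as plt

logits = np.load('logits.npy')  # placeholder: whatever you saved from the session
acts = logits[:, 0, :]          # drop the batch dimension (batch_size == 1)

plt.imshow(acts.T, aspect='auto', origin='lower')
plt.xlabel('time step')
plt.ylabel('class (alphabet symbol + CTC blank)')
plt.colorbar()
plt.show()

# Mean activation per class over time, as you suggested:
print(acts.mean(axis=0))
```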

Thanks a lot, it sounds interesting, I’ll check that out!