Intermediate layer (BiRNN) output for an audio file

How can I extract the intermediate layer (BiRNN) embedding of an audio file from a trained DeepSpeech model?

The easiest way to get started is to add the layer of interest, e.g. layers['layer_5'], to the list passed as the first argument of session.run, and add a corresponding variable on the left-hand side of the = to receive the fetched value.
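To illustrate the idea outside of the DeepSpeech codebase, here is a minimal NumPy sketch (not DeepSpeech's actual code): a forward pass that returns the intermediate activation alongside the final output, analogous to adding layers['layer_5'] to the fetch list of session.run. All shapes and names below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(features, w_hidden, w_out):
    """Toy two-layer network returning (logits, embedding).

    Returning the intermediate activation as well mirrors adding an
    extra tensor to session.run's fetch list in DeepSpeech.py.
    """
    embedding = np.tanh(features @ w_hidden)   # intermediate layer output
    logits = embedding @ w_out                 # final layer output
    return logits, embedding

# Toy shapes: 10 time steps of 26 MFCC-like features -> 64-dim embedding.
features = rng.standard_normal((10, 26))
w_hidden = rng.standard_normal((26, 64))
w_out = rng.standard_normal((64, 29))          # 29 output classes, for example

logits, embedding = forward(features, w_hidden, w_out)
print(embedding.shape)  # (10, 64)
print(logits.shape)     # (10, 29)
```

In the real model you would fetch the layer tensor by name rather than recompute anything, so the extra fetch adds essentially no cost to the run.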


Is there code to run inference on an audio file with a DeepSpeech model without using the deepspeech binary?

Sure, have a look at FLAGS.one_shot_infer in the DeepSpeech.py file.
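For reference, that flag takes the path to a WAV file. A hypothetical invocation might look like the following; the checkpoint and alphabet paths are placeholders, and the exact set of required flags varies across DeepSpeech versions, so check the flag definitions in your checkout of DeepSpeech.py:

```shell
# Run single-file inference from a training checkpoint, bypassing the
# native client binary (paths are placeholders).
python DeepSpeech.py \
  --checkpoint_dir /path/to/checkpoint \
  --alphabet_config_path data/alphabet.txt \
  --one_shot_infer /path/to/audio.wav
```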