We would like to use deepspeech for a classification problem – given an audio command we want to tell which command it most closely matches. To this end, we want to use only the pretrained encoder here in deepspeech and check vector encoded outputs of various audio files to determine which are the closest.
- Is this the right way to use this library to accomplish our issue
- Is there any clean way in code to run an audio clip only through the encoder? We have been digging through the source and it appears to be nontrivial. If there isn’t, I would like to request this as a feature. It appears that other people may have this issue in the future as well.