I’m interested in using DeepSpeech to transcribe voicemails I receive. So far the biggest limitation in recognition is phone numbers: DeepSpeech seems biased against audio containing a long string of digits. That makes sense, since saying “five five five three four four …” is uncommon in normal speech but very common in voicemails.
Is it possible to continue training from the pre-trained model, adding my own known-good transcripts on top of it? Maybe feeding it a lot of recordings of people saying phone numbers would improve this.
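
To make the kind of data I have in mind concrete, here is a minimal sketch of how I would normalize my transcripts before training. It assumes the pre-trained English model's alphabet only covers lowercase letters, spaces, and apostrophes (that is my understanding, not something I have verified), so digits have to be spelled out as words. The names `DIGIT_WORDS` and `spell_out_digits` are my own, not part of DeepSpeech.

```python
import re

# Spoken form of each digit. Assumption: callers said "zero", not "oh";
# the transcript has to match what was actually spoken in the recording.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_digits(text: str) -> str:
    """Rewrite a transcript so each digit becomes its own spoken word."""
    # Pad each digit word with spaces so adjacent digits stay separate words.
    spelled = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Lowercase, then replace anything outside the assumed alphabet
    # (letters, apostrophe, space) with a space, e.g. dashes and parentheses.
    spelled = re.sub(r"[^a-z' ]", " ", spelled.lower())
    # Collapse runs of whitespace into single spaces.
    return " ".join(spelled.split())


if __name__ == "__main__":
    print(spell_out_digits("Call me at 555-3344"))
```

Each normalized string would then be paired with its audio clip as one training example. This only handles digit-by-digit readings; someone saying "fifty five" instead of "five five" would still need a hand-corrected transcript.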