I’ve seen some articles about training an acoustic model yourself, or using checkpoints to continue training a pre-trained model.
To my understanding, the 0.5.1 pre-trained model was trained on LibriSpeech, which contains clear, noise-free American English speech. For production use with noise and foreign accents, the 0.5.1 pre-trained model isn’t nearly accurate enough. Even with a custom LM that contains only 20 words, it often fails, either by treating noise as speech input or by mapping the speech to a similar-sounding word (e.g. beer --> tea).
Are there any pre-trained models based on the CommonVoice dataset? Or, more generally, any English model that’s more robust to noise and accents?
Training it myself would probably take weeks on my poor hardware, and I assume I’m not the first one in need of a more robust English acoustic model, so I figured I’d open this for discussion.
Thanks!