I trained my model on an audio dataset of the words zero to nine, collected from 116 people. The model does a very good job, even in noise, and I haven't gotten a single wrong inference.
The only problem: when people say a single word from zero to nine, it works well, but when somebody says their phone number, like “nine five four two three two eight nine seven zero”, it skips most of the words. It doesn't give a wrong inference; it just drops words. For example, it will infer “nine two eight seven”.
I have included the numbers zero to nine in the language model as well, one word per line, like this:
zero
one
two
three
four
five
six
seven
eight
nine
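To illustrate what this corpus gives an n-gram language model to learn from, here is a minimal Python sketch. It assumes the LM toolkit wraps each corpus line in <s> and </s> sentence markers, as n-gram tools commonly do; the helper names are hypothetical and this is not the toolkit's actual code. It counts word bigrams and checks whether any digit is ever followed directly by another digit. The phone number from above is added only for comparison.

```python
from collections import Counter

# The language-model corpus as described above: one digit word per line.
CORPUS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
DIGITS = set(CORPUS)


def bigram_counts(lines):
    """Hypothetical helper: count word bigrams, padding each line with
    <s> and </s> (an assumption about how the LM treats line boundaries)."""
    counts = Counter()
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def digit_to_digit(counts):
    """Hypothetical helper: keep only bigrams where a digit follows a digit."""
    return {bg: n for bg, n in counts.items()
            if bg[0] in DIGITS and bg[1] in DIGITS}


single_words = bigram_counts(CORPUS)
print(len(single_words))                # 20 ("<s> digit" and "digit </s>" only)
print(digit_to_digit(single_words))     # {} (no digit ever follows a digit)

# For comparison: the same corpus plus the spoken phone number.
phone = "nine five four two three two eight nine seven zero"
with_phone = bigram_counts(CORPUS + [phone])
print(len(digit_to_digit(with_phone)))  # 9 digit-to-digit bigrams
```

In the one-word-per-line corpus, every digit is followed only by the end-of-sentence marker, so a sequence of several digits never occurs in the LM's training text.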
What should I do to get better predictions for sequences of words?