Oh cool, I missed that. I’ll look into it, thanks!
I guess if I can reduce the sensitivity by tuning the VAD parameters, removing noise from the source audio, and using an LM with an even less diverse vocabulary (like “left right up down” for a snake game), I should be able to get pretty accurate results. Or at least more noise-resistant performance.
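To make the restricted-vocabulary idea concrete, here is a rough sketch of a simpler stand-in: instead of retraining the LM, it snaps whatever the recognizer outputs onto the four snake commands with a fuzzy string match. This is only post-processing, not a language model, and the names `COMMANDS`, `snap_to_command`, and the `cutoff` value are my own placeholders, not part of any speech library.

```python
import difflib

# Hypothetical closed vocabulary for the snake game.
COMMANDS = ("left", "right", "up", "down")


def snap_to_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Map a raw transcript to the closest command, or None if nothing fits.

    Words are checked from last to first, so the most recent word wins.
    `cutoff` is an assumed similarity threshold (0..1); raise it to reject
    more noise, lower it to tolerate more misrecognitions.
    """
    for word in reversed(transcript.lower().split()):
        match = difflib.get_close_matches(word, commands, n=1, cutoff=cutoff)
        if match:
            return match[0]
    return None


if __name__ == "__main__":
    for text in ("left", "rigt", "please go up", "banana"):
        print(repr(text), "->", snap_to_command(text))
```

Anything that does not resemble one of the four words comes back as `None`, so noise that gets transcribed as a random word can simply be ignored by the game loop.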
I’ll update this thread when I’ve achieved that.
Thank you for being so active around here!