Hello!
I am forced to use 8000 Hz mono audio (phone calls). I know DeepSpeech works best with 16000 Hz audio, so my questions are:
Does DeepSpeech (version v0.4.1-0-g0e40db6) upsample my training material from 8000 Hz to 16000 Hz? What about the dev and test material: does it also upsample those from 8000 Hz to 16000 Hz?
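If DeepSpeech turns out not to resample on its own, one option is to upsample the 8000 Hz files yourself before training. Below is a minimal sketch using only the Python standard library. It assumes 16-bit mono PCM WAV input, and the function names are hypothetical. Linear interpolation is a crude stand-in for a proper resampler such as SoX or `scipy.signal.resample_poly`, which filter more cleanly. Upsampling also cannot restore anything above 4 kHz that the phone line already discarded.

```python
import array
import sys
import wave


def upsample_2x_linear(samples):
    """Double the sample rate of a mono signal by linear interpolation.

    Each original sample is kept, and a new sample halfway between each pair
    of neighbours is inserted after it. The last sample is repeated so the
    output is exactly twice as long as the input.
    """
    out = []
    n = len(samples)
    for i in range(n):
        cur = samples[i]
        nxt = samples[i + 1] if i + 1 < n else cur
        out.append(cur)
        out.append((cur + nxt) // 2)  # integer midpoint stays in 16-bit range
    return out


def upsample_wav_8k_to_16k(src_path, dst_path):
    """Hypothetical helper: read an 8000 Hz 16-bit mono WAV, write a 16000 Hz copy."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        if src.getframerate() != 8000:
            raise ValueError("expected 8000 Hz input")
        frames = src.readframes(src.getnframes())

    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    up = array.array("h", upsample_2x_linear(samples))
    if sys.byteorder == "big":
        up.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(16000)
        dst.writeframes(up.tobytes())
```

For example, `upsample_wav_8k_to_16k("call_0001.wav", "call_0001_16k.wav")` (hypothetical file names). The same conversion would have to be applied to the train, dev and test sets so that all three match.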
Has anyone studied how badly labeled audio data affects the results? Of course it will have some effect, but let's say 51% of my data is labeled correctly and the rest is gibberish or contains wrong words, etc. Do you think it might still do the job if I have enough of it, as long as more than half of it is correct?
I am labeling the audio semi-automatically, and that method is what produces those figures. Doing the same job manually would be expensive and very time-consuming, as you know.
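One way to check how many of the semi-automatic labels are actually correct is to transcribe a small random sample by hand and compare the two. Below is a minimal sketch, assuming you have such a hand-checked sample as (manual, automatic) transcript pairs. The function names and the `max_wer` threshold are hypothetical. Word error rate (WER) is computed here as the word-level Levenshtein distance divided by the number of reference words.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions + insertions + deletions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: distance between the reference words processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of the hypothesis against the reference."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return word_errors(reference, hypothesis) / ref_len


def fraction_correct(pairs, max_wer=0.0):
    """Share of (manual, automatic) pairs whose WER is at most max_wer."""
    if not pairs:
        raise ValueError("no transcript pairs given")
    good = sum(1 for ref, hyp in pairs if wer(ref, hyp) <= max_wer)
    return good / len(pairs)
```

With `max_wer=0.0` only exact matches count as correct; a looser threshold would treat nearly-right labels as usable, which may be closer to what "labeled right" means for training.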