Training sample size

We are evaluating Deep Speech to transcribe audio files in Brazilian Portuguese. The recordings are calls made by users across the country to different call centers. To estimate the effort required for the training stage, we need to determine the size of the training samples. How do I calculate the sample size properly?

The kind of issue you may run into with the current model is a length mismatch: training on short samples (e.g., 30 seconds) and then running inference on much longer audio (e.g., 30 minutes). So I would say the sample size also depends on your needs. It might be easier to train on short samples (30-60 seconds) and then rely on VAD (voice activity detection) to cut the audio at inference time?
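To make the "cut audio at inference" idea concrete, here is a rough sketch of how a long recording could be split into short segments at pauses. It is only an illustration: it uses a naive energy threshold as a stand-in for a real VAD, the function names (frame_energies, split_on_silence) are hypothetical, and the threshold, frame length, and 60-second cap are assumed values that would need tuning on real call-center audio. Samples are assumed to be floats in [-1, 1].

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def split_on_silence(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,               # assumed frame length
    energy_threshold: float = 1e-4,   # assumed; tune on real audio
    min_silence_frames: int = 10,     # pause length that closes a segment
    max_segment_s: float = 60.0,      # cap that mirrors the training clip length
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech segments.

    A segment is closed when a long enough pause is seen, or when it
    reaches max_segment_s. Trailing silence is trimmed from each segment.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    max_frames = int(max_segment_s * 1000 / frame_ms)
    energies = frame_energies(samples, frame_len)

    segments = []
    seg_start = None   # frame index where the current segment began
    silence_run = 0    # consecutive silent frames inside the segment

    for i, energy in enumerate(energies):
        is_speech = energy >= energy_threshold
        if seg_start is None:
            if is_speech:
                seg_start = i
                silence_run = 0
            continue

        silence_run = 0 if is_speech else silence_run + 1
        seg_frames = i - seg_start + 1
        if silence_run >= min_silence_frames or seg_frames >= max_frames:
            end = i + 1 - silence_run  # drop the trailing silent frames
            segments.append((seg_start * frame_len, end * frame_len))
            seg_start = None
            silence_run = 0

    if seg_start is not None:
        end = len(energies) - silence_run
        segments.append((seg_start * frame_len, end * frame_len))
    return segments
```

Each returned pair can be used to slice the original recording and transcribe the pieces one at a time, so the model only ever sees audio of roughly the length it was trained on. A dedicated VAD would likely be more robust than this energy threshold on noisy phone lines.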

Hello… how are you? What is the current status of the project for translating speech to text in Portuguese? For audio files…

Is everything working now? Did you manage to get it running?