Hi,
Is it compulsory for the audio files used for training and inference to be 5 seconds long?
I'm asking because I have a large amount of training data: audio files (each longer than 30 seconds) and their transcripts. If I can't use this data as-is for training, I need to chunk the audio files, which I can do easily with a Python script. What I'm finding difficult is splitting each transcript to match its chunked audio files. I'm doing it manually for now; is there any way to automate it?
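For reference, the audio-chunking step mentioned above can look roughly like the minimal sketch below. It assumes uncompressed PCM WAV input and uses only Python's standard `wave` module. The function name `chunk_wav` and the `prefix_0000.wav` output naming are hypothetical, chosen for illustration. It does not split the transcripts: cutting at fixed times can land in the middle of a word, and mapping text onto each chunk needs word-level timing that this script does not have.

```python
import wave

def chunk_wav(in_path, out_prefix, chunk_seconds=5.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for illustration. Returns the list of written paths;
    the last chunk may be shorter than chunk_seconds.
    """
    out_paths = []
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of input file
                break
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels/sample width/rate; the frame count in the
                # header is corrected automatically to what is actually written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

For example, `chunk_wav("long.wav", "long_chunk")` on a 12-second file would write three files of 5, 5 and 2 seconds.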
Any suggestions?
Thank you :)