Hi all, I’m building a training dataset of roughly 10,000 audio clips (4-5 seconds each) with transcriptions for transfer learning. Many of my clips are cut off mid-word, but the text transcripts contain the full word.
Example:
Audio: “sunny today with a chance of showers late in the aftern”
Text: “sunny today with a chance of showers late in the afternoon”
I may just flag these clips and see how they affect training, but since I’m already putting time and effort into preparing the data, I’m wondering whether others here have experience with truncated training audio. Which of these approaches has worked best for you? (I sketch how I’d generate the labels for the first three options after the list.)
- training on the truncated audio with the full word text
- training on the truncated audio with partial-word text matching the truncated audio as closely as possible
- training on the truncated audio with no text for truncated words
- discarding all clips with truncated audio and training only on clips with clean word boundaries
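For concreteness, here’s a minimal sketch of how I’d generate the labels for the first three options, assuming I already know which clips end mid-word. The function name and the character-halving heuristic are just placeholders; phone-level timings from a forced aligner would give a much better partial spelling:

```python
def make_label_variants(full_text: str, last_word_truncated: bool) -> dict:
    """Build the candidate transcripts for one clip, covering the first
    three options above. `full_text` is the clip's full-word transcript."""
    words = full_text.split()
    if not last_word_truncated or not words:
        return {"full": full_text, "partial": full_text, "dropped": full_text}
    last = words[-1]
    # Option 1: keep the full word even though the audio cuts it off.
    full = full_text
    # Option 2: keep a rough partial spelling of the cut word. Halving
    # the characters is a crude stand-in for real phone timings.
    partial = " ".join(words[:-1] + [last[: max(1, len(last) // 2)]])
    # Option 3: drop the truncated word entirely.
    dropped = " ".join(words[:-1])
    return {"full": full, "partial": partial, "dropped": dropped}


print(make_label_variants(
    "sunny today with a chance of showers late in the afternoon", True))
# {'full': '... the afternoon', 'partial': '... the afte', 'dropped': '... the'}
```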
I’m setting aside one possible alternative, re-cutting the audio files, because even the best cutting approaches I’ve tried still truncate words fairly often.
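For the flagging itself, this is roughly what I have in mind, assuming I can get word-level timestamps for the *source* recordings from some forced aligner (the `(word, start, end)` tuple format here is just an assumption, not any particular aligner’s output):

```python
def words_in_clip(word_times, clip_start: float, clip_end: float):
    """Given (word, start_sec, end_sec) timestamps for the source
    recording, return the words that fall fully inside one clip plus a
    flag for whether any word straddles a cut point."""
    inside, truncated = [], False
    for word, start, end in word_times:
        if end <= clip_start or start >= clip_end:
            continue  # word lies entirely outside the clip
        if start >= clip_start and end <= clip_end:
            inside.append(word)  # word fully inside the clip
        else:
            truncated = True  # word crosses a clip boundary
    return inside, truncated
```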
Thank you!