Speech cutter

carlfm01 · January 24, 2019, 10:16am

Hi, I’ve written a tool to align speech. The idea is to use the current text cleaning tools to create sentences for a chapter downloaded from LibriVox. Then we spot those sentences in the chapter audio using Windows speech recognition. If there’s a match above a specific confidence value, the offset in the audio and the duration of the speech are passed to ffmpeg to split the audio.
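As a rough illustration of the last step, here is a minimal Python sketch of how matches could be filtered by confidence and turned into an ffmpeg call. It is not the tool's actual code: the `Match` structure, the function names, the file names, and the assumption that confidence lies between 0 and 1 are all hypothetical. Only the ffmpeg options `-i`, `-ss`, `-t` and `-y` are real.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """One recognized sentence (hypothetical structure, not the tool's own)."""
    text: str
    offset_s: float     # where the sentence starts in the chapter audio, in seconds
    duration_s: float   # how long the spoken sentence lasts, in seconds
    confidence: float   # recognizer confidence, assumed here to be in [0, 1]


def confident_matches(matches: List[Match], min_confidence: float) -> List[Match]:
    """Keep only the matches at or above the confidence threshold."""
    return [m for m in matches if m.confidence >= min_confidence]


def ffmpeg_cut_command(src: str, dst: str, match: Match) -> List[str]:
    """Build the ffmpeg argument list that cuts one sentence out of src.

    -ss sets the start offset and -t the duration of the clip;
    -y overwrites dst if it already exists.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{match.offset_s:.3f}",
        "-t", f"{match.duration_s:.3f}",
        dst,
    ]
```

Each resulting list could then be passed to `subprocess.run(cmd, check=True)`, once per kept sentence, to write the clips. The threshold itself would need tuning: too low lets misaligned clips through, too high throws away usable audio.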

Would be amazing to get any suggestions to improve it.

Hope it helps.

nukeador · January 24, 2019, 12:17pm

So basically you take the audio of an audiobook and the book text, and use Windows speech recognition to determine where each sentence starts and ends?

This is very interesting, especially because LibriVox material is all public domain.

@josh_meyer maybe this is something we can automate to create a dataset with text-audio pairs that can be used by Deep Speech?

The only issue I see is that most books are just read by one person, and we really need a lot of diverse voices to train the algorithm.

carlfm01 · January 24, 2019, 7:00pm

Yes

Yes, for example Spanish.

nukeador · January 25, 2019, 12:46pm

@josh_meyer what do you think about this idea?