Hi, I’ve written a tool to align speech with text. The idea is to use the current text cleaning tools to create the sentences for a chapter downloaded from LibriVox, then spot those sentences in the chapter’s audio using Windows speech recognition. If there’s a match above a specific confidence value, the offset of the match in the audio and the duration of the speech are passed to ffmpeg to split the audio.
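To make the last step concrete, here is a rough sketch in Python of how a match could be turned into an ffmpeg call. This is not the tool's actual code: the `Match` class, the function names, the `clip_0000.wav` naming, and the 0.8 threshold are all made up for illustration. It only assumes that the recognizer returns an offset, a duration, and a confidence for each sentence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """One recognized sentence (hypothetical structure, for illustration only)."""
    sentence: str
    offset_s: float    # where the sentence starts in the chapter audio, in seconds
    duration_s: float  # how long the spoken sentence lasts, in seconds
    confidence: float  # recognizer confidence, assumed to be in the range 0.0 to 1.0


def build_ffmpeg_command(src: str, dst: str, match: Match) -> List[str]:
    """Build an ffmpeg argument list that cuts one sentence out of src."""
    return [
        "ffmpeg", "-y",                     # -y: overwrite dst if it already exists
        "-ss", f"{match.offset_s:.3f}",     # seek to the start of the sentence
        "-i", src,
        "-t", f"{match.duration_s:.3f}",    # keep only the sentence's duration
        dst,
    ]


def plan_splits(src: str, matches: List[Match],
                min_confidence: float = 0.8) -> List[List[str]]:
    """Return one ffmpeg command per match that meets the confidence threshold."""
    commands = []
    for index, match in enumerate(matches):
        if match.confidence < min_confidence:
            continue  # skip sentences the recognizer was not sure about
        dst = f"clip_{index:04d}.wav"
        commands.append(build_ffmpeg_command(src, dst, match))
    return commands
```

Each returned command could then be run with `subprocess.run(cmd, check=True)`. The threshold is a trade-off: a higher value gives fewer clips but cleaner alignments.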
It would be amazing to get any suggestions for improving it.
Hope it helps.