Hi,
I’m currently using the SpeechMatics.com API to transcribe audio files into text, which it returns in the following JSON format:
[
{name: "word1", time: 130, …},
{name: "word2", time: 132, …},
…
]
However, given the cost per minute, I would like to run my own engine. I tested DeepSpeech, and I think that with some training I can get good results. The only problem is that it outputs raw text, so I have no way of knowing when each word was spoken.
Any idea how to reproduce the SpeechMatics API result?
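To make the goal concrete, here is a minimal sketch of the conversion I have in mind. It assumes the engine can give me each character together with the time it was emitted, which I have not confirmed DeepSpeech can do. The function name `group_chars_into_words` and the sample timings are made up for illustration, and the time unit is whatever the engine would report.

```python
import json

def group_chars_into_words(chars):
    """Group (character, time) pairs into word entries.

    `chars` is a hypothetical engine output: one pair per character,
    with a space character marking word boundaries.
    """
    words = []
    current = ""   # characters of the word being built
    start = None   # time of that word's first character
    for ch, t in chars:
        if ch == " ":
            # A space closes the current word, if there is one.
            if current:
                words.append({"name": current, "time": start})
            current, start = "", None
        else:
            if not current:
                start = t  # first character sets the word's time
            current += ch
    # Flush the last word when the input does not end with a space.
    if current:
        words.append({"name": current, "time": start})
    return words

# Made-up per-character timings, for illustration only.
sample = [("h", 130), ("i", 131), (" ", 132), ("y", 133), ("o", 134), ("u", 135)]
print(json.dumps(group_chars_into_words(sample)))
```

If the engine only exposes timings per audio frame rather than per character, the same grouping would still apply after mapping frames to characters.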
Thanks in advance, and sorry for my bad English.