DeepSpeech cloud architecture

oscar.benitez1962 · April 16, 2018, 12:48pm

Hi
What cloud architecture do you recommend for running text inference on ~50K audio files a month? The audio files come from a call center system and can be anywhere from 80 KB to 30 MB in size.
Thanks in advance

lissyx · April 16, 2018, 12:55pm

80 KB to 30 MB is a wide range. How long is that in audio duration? Longer recordings may give poor results, so it may help to break them down into smaller chunks, splitting around silences. Any mid-range gaming GPU such as a GTX1070 should already run about twice as fast as real time, so from there you can do a back-of-the-envelope computation to size your infrastructure. It also depends on how quickly you need the results.
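The back-of-the-envelope sizing can be sketched as below. This is a rough sketch under stated assumptions, not a measured benchmark. It assumes the files are uncompressed 8 kHz, 16-bit mono PCM (typical for telephony, but compressed formats such as MP3 map size to duration very differently). It also assumes a hypothetical average call length of 300 s, the ~2x real-time figure quoted for a GTX1070, and a hypothetical 70% GPU utilization. The helper names (`audio_seconds`, `gpu_hours_per_month`, `gpus_needed`) are illustrative, not part of DeepSpeech.

```python
import math

# ASSUMPTION: uncompressed PCM, 8 kHz, 16-bit (2 bytes), mono.
def audio_seconds(size_bytes, sample_rate=8000, bytes_per_sample=2, channels=1):
    """Estimate audio duration in seconds from raw PCM file size."""
    return size_bytes / (sample_rate * bytes_per_sample * channels)

def gpu_hours_per_month(files_per_month, avg_audio_seconds, realtime_factor=2.0):
    """GPU hours needed if one GPU transcribes `realtime_factor` times faster than real time."""
    total_audio_hours = files_per_month * avg_audio_seconds / 3600
    return total_audio_hours / realtime_factor

def gpus_needed(files_per_month, avg_audio_seconds, realtime_factor=2.0,
                hours_per_month=30 * 24, utilization=0.7):
    """Number of GPUs to keep up with the monthly load.

    ASSUMPTION: `utilization` (hypothetical 70%) leaves headroom for
    preprocessing, I/O and uneven arrival of calls.
    """
    needed = gpu_hours_per_month(files_per_month, avg_audio_seconds, realtime_factor)
    return math.ceil(needed / (hours_per_month * utilization))

if __name__ == "__main__":
    # Largest file in the question, under the PCM assumption above.
    print(f"30 MB ~ {audio_seconds(30 * 1024 * 1024) / 60:.1f} min of audio")
    # 50K files/month with a HYPOTHETICAL 5-minute average call.
    print("GPUs needed:", gpus_needed(50_000, avg_audio_seconds=300))
```

The dominant input is the average call duration, so measuring it on a sample of real files will matter far more than any of the other defaults.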

oscar.benitez1962 · April 16, 2018, 1:09pm

@lissyx
Thanks for your response. My question was poorly worded: I had already planned to split the audio files on silences, as you suggest. I will look at the GTX1070 to figure out the architecture. Thanks!
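Splitting the audio on silences, as discussed above, can be sketched with a simple energy-based detector using only the Python standard library. This swaps in a basic RMS-threshold approach and is not a DeepSpeech feature; a dedicated voice activity detector such as webrtcvad is usually more robust on noisy call-center audio. The frame size (20 ms), RMS threshold (500) and minimum silence length (300 ms) are hypothetical defaults to tune on real recordings, and `read_wav_mono16` assumes 16-bit mono WAV input.

```python
import math
import sys
import wave
from array import array

def read_wav_mono16(path):
    """Read a 16-bit mono WAV file; returns (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV")
        samples = array("h", w.readframes(w.getnframes()))
        if sys.byteorder == "big":  # WAV data is little-endian
            samples.byteswap()
        return samples, w.getframerate()

def frame_rms(samples, start, end):
    """Root-mean-square energy of samples[start:end]."""
    n = end - start
    return math.sqrt(sum(s * s for s in samples[start:end]) / n) if n > 0 else 0.0

def split_on_silence(samples, sample_rate, frame_ms=20, threshold=500.0, min_silence_ms=300):
    """Return (start, end) sample ranges of speech separated by long-enough silences."""
    frame_len = int(sample_rate * frame_ms / 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    chunks = []
    chunk_start = None   # first sample of the current speech chunk
    last_voiced_end = 0  # end of the most recent voiced frame
    silent_run = 0       # consecutive silent frames since last voiced frame
    for start in range(0, len(samples), frame_len):
        end = min(start + frame_len, len(samples))
        if frame_rms(samples, start, end) >= threshold:
            if chunk_start is None:
                chunk_start = start
            last_voiced_end = end
            silent_run = 0
        elif chunk_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Silence was long enough: close the chunk at the last voiced frame.
                chunks.append((chunk_start, last_voiced_end))
                chunk_start = None
                silent_run = 0
    if chunk_start is not None:
        chunks.append((chunk_start, last_voiced_end))
    return chunks
```

Each `(start, end)` range can then be sliced out of `samples` and transcribed independently, which also lets chunks from one long call be spread across several GPUs.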