It’s basically impossible for us to gauge the time required without knowing the distribution of snippet lengths in your data set. For example, a single very long sample can force a batch size of 1 and slow training down considerably.
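If you want a quick feel for that distribution, something like the following should work (a rough sketch: it assumes a plain text file with one WAV path per line and the `soundfile` package; adapt it to however your manifests are actually laid out):

```python
# Rough sketch: summarize clip durations for a dataset.
# Assumes "wav_paths.txt" lists one WAV path per line (hypothetical
# layout -- adapt to your own manifest format).
import soundfile as sf

with open("wav_paths.txt") as f:
    paths = [line.strip() for line in f if line.strip()]

# sf.info reads only the header, so this is cheap even for big sets.
durations = sorted(sf.info(p).duration for p in paths)

n = len(durations)
print(f"clips: {n}")
print(f"min / median / max: {durations[0]:.1f}s / "
      f"{durations[n // 2]:.1f}s / {durations[-1]:.1f}s")
print(f"95th percentile: {durations[int(n * 0.95)]:.1f}s")
```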
However, by way of comparison: when we train on the ~1k hours of LibriSpeech using 8 Titan X Pascal GPUs, it takes several days to converge.
As for decoding time on a CPU or GPU, it depends on the specific CPU or GPU; the surest way to know is to try it on your own hardware. By way of comparison, we’ve gotten faster than real time on a 1070 for clips of approximately 5 sec in length.
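For a concrete measurement, timing one decode and dividing by the clip length gives the real-time factor (RTF); anything below 1.0 is faster than real time. A minimal sketch, where `transcribe` is a placeholder for whatever inference call your setup exposes (not part of any particular library):

```python
# Minimal sketch: measure the real-time factor (RTF) of decoding.
# `transcribe` is a stand-in for your model's inference entry point.
import time

import soundfile as sf

def real_time_factor(transcribe, wav_path):
    audio, sample_rate = sf.read(wav_path)
    audio_seconds = len(audio) / sample_rate

    start = time.perf_counter()
    transcribe(audio, sample_rate)  # run one decode
    decode_seconds = time.perf_counter() - start

    return decode_seconds / audio_seconds

# rtf = real_time_factor(model.transcribe, "clip_5s.wav")
# print(f"RTF: {rtf:.2f} (<1.0 means faster than real time)")
```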
As suggested in the README, the architecture is currently geared towards shorter clips of about 5 sec, so for a 30 sec clip your mileage may vary.
However, a streaming interface is currently in the works [1] and should lift this limitation.
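In the meantime, if you need to handle longer audio now, one common stopgap (not something the repo provides out of the box) is to split a long clip into roughly 5 sec chunks, decode each, and join the transcripts; expect some errors at chunk boundaries where words get cut:

```python
# Hedged sketch of the chunking workaround described above. Words
# straddling a chunk boundary may be mangled, so treat this as a
# stopgap until the streaming interface lands, not a replacement.
import soundfile as sf

def transcribe_long(transcribe, wav_path, chunk_seconds=5.0):
    audio, sample_rate = sf.read(wav_path)
    step = int(chunk_seconds * sample_rate)
    pieces = [
        transcribe(audio[start:start + step], sample_rate)
        for start in range(0, len(audio), step)
    ]
    return " ".join(pieces)
```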