Hardware required for DeepSpeech Training

abhijeetchar · July 7, 2018, 4:32am

I am interested in building an English ASR system.
I tried Mozilla's pretrained DeepSpeech models on CPU. I converted all audio to mono 16 kHz .wav files and used py-webrtcvad to split long recordings into small chunks (as suggested on this forum). This pipeline worked very well on the LibriSpeech test-clean corpus and fairly well on US-accented conversational audio, but it fails on UK and Indian accents.
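Here is a rough Python sketch of the VAD chunking step described above. It assumes the audio has already been converted to 16-bit mono 16 kHz WAV. It also relies on the documented py-webrtcvad constraint that frames must be 10, 20 or 30 ms long. The grouping logic (closing a chunk after `max_silence_frames` silent frames) and all function names are hypothetical simplifications, not the py-webrtcvad example script. The VAD is passed in as a callable, so `webrtcvad.Vad(2).is_speech` can be plugged in directly.

```python
import wave

# webrtcvad accepts 16-bit mono PCM at 8, 16, 32 or 48 kHz,
# in frames of exactly 10, 20 or 30 ms.
SAMPLE_RATE = 16000
FRAME_MS = 30
BYTES_PER_SAMPLE = 2
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 960 bytes


def read_pcm(path):
    """Read a WAV file and check that it is mono, 16-bit, 16 kHz."""
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1, "expected mono"
        assert wf.getsampwidth() == BYTES_PER_SAMPLE, "expected 16-bit samples"
        assert wf.getframerate() == SAMPLE_RATE, "expected 16 kHz"
        return wf.readframes(wf.getnframes())


def frames(pcm):
    """Split raw PCM into fixed-size frames, dropping a trailing partial frame."""
    for start in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield pcm[start:start + FRAME_BYTES]


def chunk_speech(pcm, is_speech, max_silence_frames=10):
    """Group frames into speech chunks, closing a chunk after a run of silence.

    `is_speech(frame_bytes, sample_rate) -> bool` has the same shape as
    webrtcvad.Vad(mode).is_speech. Short pauses inside a chunk are kept;
    trailing silence is trimmed from each chunk.
    """
    chunks, current, silence = [], bytearray(), 0
    for frame in frames(pcm):
        if is_speech(frame, SAMPLE_RATE):
            current.extend(frame)
            silence = 0
        elif current:
            current.extend(frame)
            silence += 1
            if silence >= max_silence_frames:
                # Drop the trailing silent frames before closing the chunk.
                del current[len(current) - silence * FRAME_BYTES:]
                chunks.append(bytes(current))
                current, silence = bytearray(), 0
    if current:
        if silence:
            del current[len(current) - silence * FRAME_BYTES:]
        chunks.append(bytes(current))
    return chunks
```

For example, you could call `chunk_speech(read_pcm("talk.wav"), webrtcvad.Vad(2).is_speech)`, where `talk.wav` is a placeholder file name. Passing the predicate in, rather than importing webrtcvad inside the function, keeps the grouping logic testable without the native extension.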

Now I want to train it for UK English accents. At this point I am stuck on the infrastructure required to train on, say, 1,000 hours of data in order to get good accuracy on UK accents.

What infrastructure would I need, in terms of GPUs, to build and train an ASR system? Is the server configuration below good enough?
CPU - 16 cores
SSD - 240 GB
GPU - 4 Titan X (Pascal) OR 2 GTX 1080 Ti (which one should I get?)
RAM - 32 GB

Correct me if I am not making much sense here.

lissyx · July 7, 2018, 9:38am

We train on a set of TITAN X cards, so I think that with 4 of them you should be able to get something useful out of a 1,000-hour dataset within a week or so. Do you have access to 1,000 hours of UK-accented audio?
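Here is a back-of-the-envelope check of the "within a week" estimate. The number of epochs is a hypothetical parameter, since the thread does not state one. The formula also assumes training scales perfectly across GPUs, which real multi-GPU setups rarely achieve.

```python
def required_speed(corpus_hours, epochs, wall_clock_days, num_gpus):
    """Hours of audio each GPU must process per wall-clock hour.

    Assumes perfectly linear scaling across GPUs (an optimistic assumption).
    """
    total_audio_hours = corpus_hours * epochs  # audio seen over the whole run
    wall_clock_hours = wall_clock_days * 24
    return total_audio_hours / wall_clock_hours / num_gpus


# Hypothetical: 10 epochs over 1,000 hours on 4 GPUs, finished in 7 days.
print(f"{required_speed(1000, 10, 7, 4):.1f}x real time per GPU")
```

With those hypothetical numbers, each GPU would need to process roughly 15 hours of audio per wall-clock hour (about 14.9x real time). If your measured per-GPU throughput is well below that, you would need fewer epochs, more time, or more GPUs.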

abhijeetchar · July 7, 2018, 10:38am

Great, I think I will go with this kind of setup then. I don't have access to the data yet, but I will be getting at least 400-500 hours.
Right now I am just getting started with ASR (previously I worked on i-vector-based speaker ID, language ID, etc.).
By the way, how many hours of data would be required to go from "something useful" to, let’s say, >90% accuracy for a particular business domain?
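The thread doesn't say how ">90% accurate" would be measured. For ASR, accuracy is commonly expressed through word error rate (WER): the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. On that measure, ">90% accurate" roughly corresponds to a WER below 10%. Below is a minimal sketch of the standard dynamic-programming computation. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` gives 1/6: one substitution over six reference words. WER can exceed 1.0 when the hypothesis contains many insertions, so "accuracy = 1 - WER" is only an approximation.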