Problem in Convergence - DeepSpeech not learning

abhijeetchar · November 30, 2018, 12:28pm

Hi everyone,
I did the training of DeepSpeech Model on 300 hours of data (tedlium + voxforge) as proof of concept to establish that we can do a large scale training and get something useful (say 85% accurate)

Hardware : GTX 1080Ti
Hyperparameters:

I reduced n_hidden = 1024 as I had 300 hours of data only.
Dropout : 0.30
Train Batch Size : 64

Language model was the pre-trained released by DeepSpeech.
Rest all the parameters were same or default

Here are how model was learning.

It learnt for initial 10 epochs as train and validation error was doing down.
It did not learnt anything in next 10 epochs. It was not even over-fitting.
It seems like it’s a case of underfitting where both train and validation loss very high and not decreasing.

Overall test accuracy was 20% which is very poor (WER 80%).
It took straight 9-10 hours to train (300 hours) this model for 20 iterations on a 1080Ti GPU system.

Can anyone give me some insight of what I should have done to get some good accuracy ? I saw people getting WER ~39% alone on Ted corpus.

Just wanted to know whether there are experimented parameters which has yielded best accuracy on ted+voxforge corpus ?

muruganrajenthirean · December 10, 2018, 7:08am

sir i think you should keep dropout value <= 0.12 and reduce batch size, model will learn better.

abhijeetchar · December 10, 2018, 7:26am

Cool will try. This was 10 days back.

To experiment, I started with 100 hours.
By last night, after doing all the parameters search,
I achieved 76% accuracy (WER 24%) on 100 hours of voxforge dataset only.

This is by far the best I could achieve only on voxforge. Now, I am trying changing Uni-RNN with Bi-RNN in current deepspeech setup.

muruganrajenthirean · December 10, 2018, 7:31am

sir if you don’t mind, i ask one question.
do you develop indian accent based? or US/UK based english accent?
i need some suggestion, i don’t have much datasets to build indian accent model.
any idea about this, how can i prepare datasets for indian accent?

abhijeetchar · December 10, 2018, 7:40am

Right now, I am only experimenting. But yes I may have to develop it for UK English accent 1-2 months down the line.

You have to chunkize the Indian accented audios in 5-10 seconds clips and collect the transcriptions.
You can use Google docs translator to translate and then manually correct it.

OR Try following this

muruganrajenthirean · December 10, 2018, 7:44am

@abhijeetchar sir thank you so much. i will try to prepare data for this way.

muruganrajenthirean · December 10, 2018, 7:55am

You can use Google docs translator to translate and then manually correct it.

sir but google they not gave en-IN accent for translation.

abhijeetchar · December 10, 2018, 8:36am

That is why I said then manually correct it further.

Also, That’s Google buddy, their model may have been trained on 10,000 hours of speech from all accents etc,
As far as I know, It does a fair job even on Indian accent. I had tried it sometime back

muruganrajenthirean · December 10, 2018, 8:41am

thank you so much sir. i will do it.

lissyx · December 11, 2018, 10:14am

I would avoid that kind of suggestion, it’s very likely that the Google Translate terms of use prohibits that

abhijeetchar · December 11, 2018, 11:07am

Oh is it ?
Though I had tried playing few audio to Google docs voice typing. It was able to transcribe.

Also, there are constraints from business which prevents such transcribing methods.

lissyx · December 11, 2018, 11:16am

I’m not saying it’s not working, I’m saying it’s likely non authorized to use google translate to teach a concurrent system

abhijeetchar · December 11, 2018, 11:43am

Cool man…I got your point.

yv001 · December 13, 2018, 11:23am

yes, a link to the terms of use prohibiting this is here: google terms of use