Fine-tuning a model with large datasets: model can't adapt with transfer learning?

muruganrajenthirean · November 28, 2018, 5:53am

I fine-tuned the pre-trained model on a YouTube dataset (Indian accent) of nearly 100 hours of audio. I used epoch 35 and batch size 3-3-3; everything else follows the DeepSpeech instructions for continuing training.

I Testing epoch 35...
I Test of Epoch 35 - WER: 0.500695, loss: 88.6900565696485, mean edit distance: 0.278072
I --------------------------------------------------------------------------------
I WER: 0.125000, loss: 0.063078, mean edit distance: 0.025000
I  - src: " difference don't freak out if you get a"
I  - res: " difference don't freak out if you get "
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.060561, mean edit distance: 0.024390
I  - src: " slice of the retail business that's over"
I  - res: "a slice of the retail business that's over"
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.089889, mean edit distance: 0.027778
I  - src: " question what is it about the first"
I  - res: "a question what is it about the first"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I  - src: " change"
I  - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I  - src: " change"
I  - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I  - src: " company"
I  - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I  - src: " company"
I  - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.056036, mean edit distance: 0.125000
I  - src: " project"
I  - res: "a project"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I  - src: " project"
I  - res: "a project "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I  - src: " project"
I  - res: "a project "
I --------------------------------------------------------------------------------
I Exporting the model...
Converted 12 variables to const ops.
I Models exported at model_export_youtubeV3/
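
The per-sample numbers in the log above can be reproduced with a short sketch. It assumes that WER is the word-level Levenshtein distance divided by the number of words in `src`, and that "mean edit distance" is the character-level Levenshtein distance divided by the length of `src`; both assumptions agree with the logged values, but the helper names here are illustrative and are not DeepSpeech's own functions.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop an item of ref
                            curr[j - 1] + 1,      # add an item of hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = src.split()
    return levenshtein(ref_words, res.split()) / len(ref_words)


def char_edit_ratio(src, res):
    """Character-level edit distance over reference length (assumed definition)."""
    return levenshtein(src, res) / len(src)


print(wer(" change", "i change"))  # 1.0
print(char_edit_ratio(" change", "i change"))
```

Under these definitions, one spurious leading "a" or "i" is enough to push a one-word utterance to a WER of 1.0, which is the pattern most of the samples above show.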

And when I test inference:

actual: how are you
res: a

actual: how can i apply for aadhaar card pan card
res: how can a a bar abdy

actual: you are not working
res: a

deepspeech 0.2.1a1
tensorflow 1.11.0

DeepSpeech v0.2.0 and pretrained model v0.2.0

I trained the model on an EC2 p3 xlarge instance with 8 GPUs. It took 18 hours.

Sir, can you help me please? Is there a problem with the hyperparameters? How should I fine-tune the training to get good accuracy?

I didn’t get good accuracy for my model (Indian accent) from fine-tuning.

Thank you,
Murugan R

lissyx · November 23, 2018, 12:48pm

It’s complicated. We are still only experimenting with transfer learning, and we don’t have much feedback yet on the proper steps to get something really good.

lissyx · November 23, 2018, 12:50pm

It’s hard to judge with so little information; maybe you don’t have enough data yet. Can you ensure the source material is adequate? PCM 16-bit, 16 kHz, mono? If you perform any conversion, can you ensure it does not add any artifacts?

The v0.3.0 model might be worth a try; it should contain more Common Voice data. You could also rebase your work on current master of DeepSpeech with the v0.3.0 checkpoints: you would benefit from the new decoder, which might improve things.
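
A minimal sketch of checking the format mentioned above (16-bit PCM, 16 kHz, mono) with Python's standard `wave` module. The function name `check_wav_format` is hypothetical, and the check only covers the header; it cannot tell whether an earlier conversion introduced artifacts.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of format problems; an empty list means the file matches."""
    try:
        with wave.open(path, "rb") as wav:
            found_channels = wav.getnchannels()
            found_width = wav.getsampwidth()   # bytes per sample, 2 means 16-bit
            found_rate = wav.getframerate()
    except wave.Error as err:
        # The wave module only reads uncompressed PCM WAV files.
        return ["not a readable PCM WAV file: %s" % err]

    problems = []
    if found_channels != channels:
        problems.append("channels: %d (expected %d)" % (found_channels, channels))
    if found_width != sample_width:
        problems.append("sample width: %d bytes (expected %d)" % (found_width, sample_width))
    if found_rate != rate:
        problems.append("sample rate: %d Hz (expected %d)" % (found_rate, rate))
    return problems
```

Running this over every file listed in the training CSV before starting a long run is a cheap way to rule out format mismatches.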

muruganrajenthirean · November 26, 2018, 4:50am

If I fine-tune the pretrained model with LibriSpeech, without including these three datasets (Common Voice, Switchboard, Fisher), will it get better accuracy and adapt to accent variations? Have you tested this before, sir?

Thank you for your quick response, sir.

lissyx · November 26, 2018, 6:48am

For the n-th time: it might, but we can’t promise anything.

muruganrajenthirean · November 26, 2018, 6:52am

Thank you for your response, sir.

fmorakzai · December 30, 2018, 12:36pm

@muruganrajenthirean did you get any success in fine-tuning the model for Indian English? Do you already have a WER?

muruganrajenthirean · December 31, 2018, 6:33am

@fmorakzai yes sir. I am getting good results for Indian English. In addition, I applied some techniques (augmentation for my own datasets, LibriVox dataset, VoxForge, Common Voice, etc.) and then got good results. The training process is still going on.

fmorakzai · December 31, 2018, 7:52am

Thanks Murugan. What WER and CER values were you able to achieve?

muruganrajenthirean · December 31, 2018, 8:59am

@fmorakzai sir,
I Test of Epoch 12 - WER: 0.255955, loss: 35.609541743282705, mean edit distance: 0.137530

The training process is still going on.

fmorakzai · December 31, 2018, 12:10pm

@muruganrajenthirean That’s great work you are doing. I have tried your YouTube scripts and have discovered an error. I have created an issue in your GitHub repo “youtube-audio-and-transcript-extract”.

What is the best way to reach you to discuss the project?

Thanks

muruganrajenthirean · December 31, 2018, 4:06pm

Sir, you can change that code to suit your needs. These are all preprocessing steps. But that dataset somehow gave low accuracy.

fmorakzai · December 31, 2018, 7:23pm

ok thanks. When do you think the training will be over? It would be interesting to know the final WER.

I think your work is a valuable contribution. Do you intend to share your final model with the community?

sayantangangs.91 · January 7, 2019, 3:32pm

Hey, what’s the status? How’s the WER coming out now?

muruganrajenthirean · February 28, 2019, 6:25am

Sir, actually I failed with this dataset. I am getting worse results (WER around 40%). In the preprocessing, some of the audio lost its starting and ending parts. That is the issue.
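
One common way to avoid clipped starts and ends is to widen each segment slightly before cutting it. The sketch below assumes the clips are cut from mono 16-bit PCM at caption timestamps, which the thread does not confirm; the function names and the 0.25 s padding are hypothetical choices for illustration only.

```python
def padded_frame_bounds(start_s, end_s, total_frames, rate=16000, pad_s=0.25):
    """Frame range for a clip, widened by pad_s on each side and clamped to the file."""
    first = max(0, int(round((start_s - pad_s) * rate)))
    last = min(total_frames, int(round((end_s + pad_s) * rate)))
    if last <= first:
        raise ValueError("empty clip: check the start and end times")
    return first, last


def cut_clip(pcm, start_s, end_s, rate=16000, sample_width=2, pad_s=0.25):
    """Slice raw mono PCM bytes to the padded time range."""
    total_frames = len(pcm) // sample_width
    first, last = padded_frame_bounds(start_s, end_s, total_frames, rate, pad_s)
    return pcm[first * sample_width:last * sample_width]


# Three seconds of silence as stand-in audio (16 kHz, 16-bit, mono).
audio = bytes(2 * 16000 * 3)
clip = cut_clip(audio, 1.0, 2.0)
print(len(clip) // 2 / 16000)  # 1.5
```

Padding trades a little extra silence or neighbouring speech for not losing the first and last phonemes; if the padded audio picks up words from adjacent captions, the transcript no longer matches, so the margin needs to stay small.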

kondaraunak · March 24, 2019, 9:48pm

Sir, what dataset did you use to train for Indian English accent?