I fine-tuned the pre-trained model on a YouTube dataset (Indian accent) of nearly 100 hours of audio. I trained for 35 epochs with batch size 3-3-3; everything else follows the DeepSpeech instructions for continuing training. Here is the test output:
I Testing epoch 35...
I Test of Epoch 35 - WER: 0.500695, loss: 88.6900565696485, mean edit distance: 0.278072
I --------------------------------------------------------------------------------
I WER: 0.125000, loss: 0.063078, mean edit distance: 0.025000
I - src: " difference don't freak out if you get a"
I - res: " difference don't freak out if you get "
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.060561, mean edit distance: 0.024390
I - src: " slice of the retail business that's over"
I - res: "a slice of the retail business that's over"
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.089889, mean edit distance: 0.027778
I - src: " question what is it about the first"
I - res: "a question what is it about the first"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I - src: " change"
I - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I - src: " change"
I - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I - src: " company"
I - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I - src: " company"
I - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.056036, mean edit distance: 0.125000
I - src: " project"
I - res: "a project"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I - src: " project"
I - res: "a project "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I - src: " project"
I - res: "a project "
I --------------------------------------------------------------------------------
I Exporting the model...
Converted 12 variables to const ops.
I Models exported at model_export_youtubeV3/
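To make the per-sample numbers above easier to read, here is a minimal sketch of how they appear to be computed. It assumes WER is the word-level Levenshtein distance divided by the number of reference words, and that the edit distance reported per sample is the character-level Levenshtein distance divided by the length of `src` (including its leading space). That is my reading of the log, not DeepSpeech's actual evaluation code, and the function names are my own.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word error rate: word-level distance over reference word count."""
    ref_words = src.split()
    if not ref_words:
        raise ValueError("reference transcript has no words")
    return levenshtein(ref_words, res.split()) / len(ref_words)


def char_edit_rate(src, res):
    """Character-level distance over reference length (whitespace included)."""
    if not src:
        raise ValueError("reference transcript is empty")
    return levenshtein(src, res) / len(src)


# Sample from the log: one extra word against a one-word reference.
print(round(wer(" change", "i change"), 6))             # 1.0
print(round(char_edit_rate(" change", "i change"), 6))  # 0.142857
```

Under these assumptions a single inserted "a" or "i" on a one-word reference already gives WER 1.0, which is why the short samples dominate the worst results. Note that every `src` in the log begins with a space, and that space counts toward the reference length here.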
When I test inference with the exported model, I get:
actual: how are you
res: a
actual: how can i apply for aadhaar card pan card
res: how can a a bar abdy
actual: you are not working
res: a
deepspeech 0.2.1a1
tensorflow 1.11.0
DeepSpeech v0.2.0 and pretrained model v0.2.0
I trained the model on an EC2 p3 xlarge instance with 8 GPUs. It took 18 hours.
Can you help me, please? Is there a problem with my hyperparameters? How should I fine-tune the training to get good accuracy?
I did not get good accuracy from my fine-tuned model (Indian accent).
Thank you,
Murugan R