I would suspect your importer code.
I did not understand.
There is likely a bug somewhere that is making your data come out wrong. Have you written any code to process that data?
No, I am just using DeepSpeech's version 0.5.1 code.
OK. First, it'd be better to work on master. Apply https://gist.github.com/reuben/b68b9085f7b293580f8431156a33daa9 if you need to reload a 0.5.1 English checkpoint.
No luck with this; I tried.
I think the fault was in my binary, which I created with the wrong alphabet. Now I am trying again.
After cloning DeepSpeech I am doing
git checkout v0.5.1
but the reported version is 0.6.0-alpha.9.
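(To double-check which tree is actually checked out, a quick sanity check like the following can help; it assumes it is run from inside the DeepSpeech checkout and that a VERSION file sits at the repo root, as in the 0.5.x tree.)

import subprocess

# Ask git which tag the working tree is on; should print v0.5.1 after the checkout above.
tag = subprocess.check_output(["git", "describe", "--tags"], text=True).strip()
print("checked-out tag:", tag)

# Compare against what the repo itself reports.
with open("VERSION", encoding="utf-8") as f:
    print("VERSION file   :", f.read().strip())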
Test on data/test/test.csv - WER: 1.000000, CER: 0.911950, loss: 113.759827
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 35.086880
- src: "नाम"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.833333, loss: 39.348869
- src: "सहायता"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.909091, loss: 67.961250
- src: "सहायता2करिए"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.875000, loss: 79.533066
- src: "आपका2नाम2क्या2है"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.866667, loss: 83.190369
- src: "तमहरआ2कयआ2नाम2ह"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.900000, loss: 104.061623
- src: "क्या2बोलना2चाहते2हैं"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.944444, loss: 104.943001
- src: "हिंदी2में2बात2करिए"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.931034, loss: 133.641830
- src: "मै2आपकी2कया2सहायता2कर2सकता2हू"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.937500, loss: 159.305466
- src: "सर2मै2विमवीजयोर2से2बात2कर2रहा2हू"
- res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 330.525848
- src: "नई2दिल्ली"
- res: "का"
--------------------------------------------------------------------------------
SRC is correct now, but why is the numeral '2' appearing between words instead of spaces? Any idea?
It does not look like an ASCII 2, more like some other UTF-8 codepoint. Maybe something with your alphabet? It's really important to ensure you use the same alphabet everywhere.
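One way to see what that character really is, is to print its codepoints straight from the CSV. This is just a sketch: the train.csv path is an assumption, and it relies on the DeepSpeech-style CSV layout (wav_filename,wav_filesize,transcript).

import csv

with open("train.csv", encoding="utf-8") as f:   # hypothetical path, adjust to your importer output
    row = next(csv.DictReader(f))
    for ch in row["transcript"]:
        print(repr(ch), "U+%04X" % ord(ch))
# A real space is U+0020 and an ASCII '2' is U+0032; anything else, e.g. the
# Devanagari digit २ (U+0968), means the CSV itself contains the wrong character.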
I am using the same alphabet everywhere, but at training time it says there are characters missing from alphabet.txt that are present in the train, test, or dev files, even though those characters are already in alphabet.txt. When I delete such a character and enter it again, training works properly. But I don't know why the number 2 appears instead of spaces. I created the alphabet file in UTF-8 using Notepad.
It's possible Windows line endings are playing a role here.
If it says it cannot find the character, you need to fix that in your alphabet file if it's a legitimate character, or clean up your dataset if it is not.
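A quick way to rule out Notepad artifacts is to look at the raw bytes of the alphabet file; a minimal sketch, assuming the file is named alphabet.txt:

with open("alphabet.txt", "rb") as f:
    raw = f.read()

print("UTF-8 BOM present :", raw.startswith(b"\xef\xbb\xbf"))
print("CRLF line endings :", b"\r\n" in raw)
# The alphabet should be plain UTF-8 with Unix (\n) line endings and no BOM;
# either of these being True could explain characters being reported as missing.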
I’m not sure I get your process here.
It keeps saying the character (' ') is not present in your alphabet.
Do I have to add spaces after each character in alphabet.txt?
No, but you need it at least once in your dataset. Make sure it is the proper UTF-8 codepoint.
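If in doubt, a small check like this can confirm the space is really there on both sides (alphabet.txt and train.csv are assumed paths):

import csv

# The alphabet file should contain one line that is exactly a single U+0020 space.
with open("alphabet.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()
print("space line in alphabet:", " " in lines)

# And at least one transcript in the dataset should actually contain a space.
with open("train.csv", encoding="utf-8") as f:
    has_space = any(" " in row["transcript"] for row in csv.DictReader(f))
print("space in transcripts  :", has_space)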
All done.
Hi, what maximum length of audio would be best for the training data?
Or: what should the length of the audio/words be to get the best results from the model?
Can we use 5-10 minute conversations as each training audio clip?
This is mostly going to be limited by your batch size and your GPU memory. To give you a ballpark: with 11 GB of RAM on a GPU, I cannot go above a batch size of 68 with clips of up to 10-15 seconds. If I push it further, I run out of GPU memory.
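If it helps, here is a rough standard-library sketch for spotting clips beyond that 10-15 second range; the CSV path and the 15-second cutoff are only assumptions:

import csv
import wave

def duration_seconds(path):
    # Works for plain PCM WAV files, which is what the training CSVs point to.
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

# Collect clips that exceed the chosen cutoff (paths in the CSV are assumed
# to be resolvable from the current directory).
with open("train.csv", encoding="utf-8") as f:
    too_long = [row["wav_filename"] for row in csv.DictReader(f)
                if duration_seconds(row["wav_filename"]) > 15.0]

print(len(too_long), "clips longer than 15 s; consider splitting them")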
Okay, thank you for the support. Great community with great people.