Cannot train on specific data

I am trying to fine-tune the DeepSpeech model on YouTube data.
I downloaded audio and its subtitles from YouTube, then I split that data by every sentence: OneSentence.zip (268.6 KB)
It is training normally,
but when I try to split the data into two sentences I am getting the following error:
IndexError: index 0 is out of bounds for axis 0 with size 0
This is the second data set: Twosentence.zip (268.2 KB)

There is no difference between these two data sets, except that in the first data set every sentence is a separate file, while in the second data set two sentences are merged together in one file.

Can you share the CSV file?

@lissyx
Here are my train CSVs for both the 1-sentence and 2-sentence data:
train.csv.zip (1.9 KB)

Can you try without spaces in filenames?
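For reference, here is a minimal sketch of how one could rename the WAV files to drop spaces and rewrite the train CSV to match. The helper names are my own, and the `wav_filename` column name assumes the usual DeepSpeech train CSV layout.

```python
import csv
import os

def strip_spaces(path):
    """Return a copy of the path with spaces in the filename replaced by underscores."""
    directory, name = os.path.split(path)
    return os.path.join(directory, name.replace(" ", "_"))

def fix_csv(csv_in, csv_out):
    """Rewrite a train CSV, renaming the referenced WAV files so they contain
    no spaces and updating the wav_filename column to point at the new names."""
    with open(csv_in, newline="") as f_in, open(csv_out, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            new_path = strip_spaces(row["wav_filename"])
            # Only rename on disk if the old file actually exists.
            if new_path != row["wav_filename"] and os.path.exists(row["wav_filename"]):
                os.rename(row["wav_filename"], new_path)
            row["wav_filename"] = new_path
            writer.writerow(row)
```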

OK, I will try that, but the data with 1 sentence is training fine…
thank you.

Well then, sorry, but your description of the issue is unclear to me.

@lissyx does our model have any limitation on how long a training sample should be, or how many words one training sample should contain?

Could you just explain clearly what issue you are hitting? There is no “model limitation” like this, but we do try to avoid audio above 10 secs.
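If it helps, a quick sketch of how one might check clip durations against that ~10-second guideline using only the standard library (the function names are hypothetical, and it assumes plain PCM WAV files):

```python
import wave

def duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()

def keep_clip(wav_path, max_seconds=10.0):
    """True if the clip is short enough to keep for training."""
    return duration_seconds(wav_path) <= max_seconds
```

One could run `keep_clip` over the `wav_filename` column of the train CSV and drop the rows it rejects before training.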

@lissyx
Maybe I am doing something wrong; I will try to solve this problem.

What do you think about the above dataset? Do you think I can increase accuracy on Indian accents with this dataset?
I have around 7-8 hours of data like this.

I already told you that I don’t understand your error exactly, because your description is too vague and scarce.

I am not talking about the error; I will try to fix that myself.

I have attached a dataset sample. Can you tell me if it’s a good dataset for increasing accuracy?

I think his problem is that the “two sentence” data set triggers some error during training, while the “one sentence” set works fine.

@Sushantmkarande I don’t think anybody can answer the question of whether this is a good dataset or not. The accuracy depends on a lot of parameters, not just the one fine-tuning data set. It’s always good to try with your dataset and see what happens. :slight_smile:

Thank you, that is exactly my problem.
Now I think the problem is occurring because the audio and the transcripts are not in sync:
sometimes words that are actually spoken in the audio are missing from the transcript.
So my guess is that the model knows how many words are spoken in the audio, and when it tries to match the audio signal to the words it does not find that word, so it throws an error.
Let me know if that’s the case.

If you really concatenate both the audio and its transcription, it should work. As far as I recall, CTC should be able to deal with some differences. It does not know “how many words are spoken”.

That is great to hear.
Thanks.
I will try to resolve this problem now.

The problem is solved: the ’ character was missing from my alphabet.txt.
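For anyone hitting the same error, a sanity check like the following could catch this before training. It is only a sketch: the helper names are my own, and it assumes the usual one-character-per-line alphabet.txt format where lines starting with `#` are comments.

```python
def load_alphabet(path):
    """Read an alphabet file: one character per line, '#' lines are comments."""
    chars = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            chars.add(line.rstrip("\n"))
    return chars

def missing_characters(transcript, alphabet):
    """Return the set of characters in the transcript that the alphabet lacks."""
    return {c for c in transcript if c not in alphabet}
```

Running `missing_characters` over every transcript in the train CSV would have flagged the missing apostrophe immediately.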
