Cannot train on specific data

I am trying to fine-tune the DeepSpeech model on YouTube data.
I downloaded audio and its subtitles from YouTube, then I split that data by every sentence: OneSentence.zip (268.6 KB)
It is training normally,
but when I try to split the data into two sentences I am getting the following error:
IndexError: index 0 is out of bounds for axis 0 with size 0
This is the second data set: Twosentence.zip (268.2 KB)

There is no difference between these two data sets, except that in the first data set every sentence is a separate file, while in the second data set two sentences are merged together in one file.

Can you share the CSV file?

@lissyx
Here are my train CSVs for both the 1-sentence and 2-sentence data:
train.csv.zip (1.9 KB)

Can you try without spaces in filenames?
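For reference, here is a minimal sketch of how one could rename the WAV files to drop spaces and rewrite the train CSV to match. The helper names are my own, and the `wav_filename` column name assumes the usual DeepSpeech train CSV layout.

```python
import csv
import os

def strip_spaces(path):
    """Return a copy of the path with spaces in the filename replaced by underscores."""
    directory, name = os.path.split(path)
    return os.path.join(directory, name.replace(" ", "_"))

def fix_csv(csv_in, csv_out):
    """Rewrite a train CSV, renaming the referenced WAV files so they contain
    no spaces and updating the wav_filename column to point at the new names."""
    with open(csv_in, newline="") as f_in, open(csv_out, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            new_path = strip_spaces(row["wav_filename"])
            # Only rename on disk if the old file actually exists.
            if new_path != row["wav_filename"] and os.path.exists(row["wav_filename"]):
                os.rename(row["wav_filename"], new_path)
            row["wav_filename"] = new_path
            writer.writerow(row)
```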

OK, I will try that, but the data with 1 sentence is training fine…
thank you.

Well then, sorry, but your description of the issue is unclear to me.

@lissyx does our model have any limitation on how long a training sample should be, or how many words one training sample should contain?

Could you just explain clearly what issue you are hitting? There is no “model limitation” like this, but we do try to avoid audio above 10 secs.
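If it helps, a quick sketch of how one might check clip durations against that ~10-second guideline using only the standard library (the function names are hypothetical, and it assumes plain PCM WAV files):

```python
import wave

def duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()

def keep_clip(wav_path, max_seconds=10.0):
    """True if the clip is short enough to keep for training."""
    return duration_seconds(wav_path) <= max_seconds
```

One could run `keep_clip` over the `wav_filename` column of the train CSV and drop the rows it rejects before training.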

@lissyx
Maybe I am doing something wrong; I will try to solve this problem.

What do you think about the above dataset? Do you think I can increase accuracy on Indian accents with this dataset?
I have around 7-8 hours of data like this.

I already told you that I don’t understand your error exactly, because your description is too vague and scarce.

I am not talking about the error; I will try to fix that myself.

I have attached a dataset sample. Can you tell me if it’s a good dataset for increasing accuracy?

I think his problem is that the “two sentence” data set triggers some error during training, while the “one sentence” set works fine.

@Sushantmkarande I don’t think anybody can answer the question of whether this is a good dataset or not. The accuracy depends on a lot of parameters, not just the one fine-tuning data set. It’s always good to try with your dataset and see what happens. :slight_smile:

Thank you, that is exactly my problem.
Now I think the problem is occurring because the audio and the transcripts are not in sync:
sometimes words that are actually spoken in the audio are missing from the transcript.
So my guess is that the model knows how many words are spoken in the audio, and when it tries to match the audio signal to the words it does not find that word, so it throws an error.
Let me know if that’s the case.

If you really concatenate both the audio and its transcription, it should work. As far as I recall, CTC should be able to deal with some differences. It does not know “how many words are spoken”.

That is great to hear.
Thanks.
I will try to resolve this problem now.

The problem is solved: the ’ character was missing from my alphabet.txt.
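For anyone hitting the same error, a sanity check like the following could catch this before training. It is only a sketch: the helper names are my own, and it assumes the usual one-character-per-line alphabet.txt format where lines starting with `#` are comments.

```python
def load_alphabet(path):
    """Read an alphabet file: one character per line, '#' lines are comments."""
    chars = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            chars.add(line.rstrip("\n"))
    return chars

def missing_characters(transcript, alphabet):
    """Return the set of characters in the transcript that the alphabet lacks."""
    return {c for c in transcript if c not in alphabet}
```

Running `missing_characters` over every transcript in the train CSV would have flagged the missing apostrophe immediately.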
