Unable to predict domain-specific keywords perfectly

nishthajain1611 · November 20, 2019, 12:21pm

Hi,
I have trained a DeepSpeech model on a 2000-hour Hinglish dataset. The model transcribes general words accurately but does not reliably predict domain-specific keywords.

I want the model to predict domain-specific keywords correctly, and I have a couple of questions:

How can I add these boosted words (for example, in the insurance domain: “Policy”, “Admission”, “Insurance”, “name of patient”, “diagnosis”, etc.) to my dataset? Do I need to add them during training at the acoustic level, or to the language model (for example, in the vocabulary)?
Is there any way to use an NER-based language model alongside the current language model?

Thanks

lissyx · November 20, 2019, 12:25pm

NER-based language model? Can you be more specific?

You can augment or tune the language model; that’s likely to be the most effective approach.

nishthajain1611 · November 21, 2019, 10:32am

I read a paper on incorporating NER into the language model to improve the transcription of entities, so I thought we could build a separate NER-based language model. But I have not found anything like this for DeepSpeech.
https://hal.archives-ouvertes.fr/hal-00843211/document

Could you suggest how the language model can be augmented or tuned? I have not been able to figure out how to tune the LM.

Thanks

lissyx · November 21, 2019, 4:25pm

Thanks. We have not worked on that topic, so I don’t really have any feedback on that. I’ll have to read the paper. Luckily, Le Mans and Nantes are pretty close to me.

Tuning the language model is documented under data/lm/ in the DeepSpeech repository.
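
As a rough illustration of what "augmenting" the language model can mean in practice (this is a sketch, not something taken from the DeepSpeech docs): one common approach is to add domain sentences to the text corpus the LM is built from, repeating them a few times so the domain n-grams get more weight. The function names, the `boost` parameter, and the sample sentences below are all hypothetical, and the lowercasing step assumes the acoustic model's alphabet is lowercase. The resulting lines would then be written to a text file and fed to whatever LM-building steps data/lm/ describes.

```python
from collections import Counter


def normalize(line):
    """Lowercase and collapse whitespace (assumes a lowercase alphabet)."""
    return " ".join(line.lower().split())


def augment_corpus(general_lines, domain_lines, boost=5):
    """Return the general sentences plus `boost` copies of each domain sentence.

    `boost` is a hypothetical knob: repeating domain sentences raises the
    counts of domain n-grams in the corpus the language model is built from.
    """
    if boost < 1:
        raise ValueError("boost must be at least 1")
    general = [s for s in map(normalize, general_lines) if s]
    domain = [s for s in map(normalize, domain_lines) if s]
    return general + domain * boost


def word_counts(lines):
    """Count word occurrences, to check that the keywords are represented."""
    return Counter(word for line in lines for word in line.split())


if __name__ == "__main__":
    # Made-up sample sentences, for illustration only.
    general = ["Please call me tomorrow", "the weather is nice today"]
    domain = ["what is the policy number", "name of patient and diagnosis"]

    corpus = augment_corpus(general, domain, boost=3)
    counts = word_counts(corpus)
    for keyword in ("policy", "patient", "diagnosis"):
        print(keyword, counts[keyword])
```

How much to repeat the domain text is a judgment call: too little and the keywords stay rare, too much and the LM may start preferring domain words in general speech, so it is worth checking WER on both general and domain test sets after rebuilding.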

rahul · November 27, 2019, 6:41am

Hi,
Is the dataset public? If yes, could you please post a link? If not, is there a way to obtain it?

Very curious about the accuracy. Could you please share the WER?

Thanks,
Rahul