Does the acoustic model dataset have to be a subset of the language model dataset?

kouohhashi · August 1, 2019, 12:57pm

Hi,
I have a question about the datasets used to train a model from scratch.

Does the acoustic model's dataset have to be a subset of the language model's dataset?

I think there are a few scenarios.

For example, the acoustic model dataset can have WAV audio with a transcript like “I love dogs.”
The language model dataset can have text like “I love dogs. I love cats too.”

Which is recommended?

Thanks in advance.

kdavis · August 1, 2019, 2:04pm

No.

The 0.5.1 model embodied your case 1, “Dataset of acoustic model is different from dataset of language model.”

Which is recommended depends on your use case.
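
If it helps to see how far the two datasets overlap, here is a minimal sketch that lists words appearing in the acoustic transcripts but never in the language model text. It is not part of any toolkit: the function names `tokenize` and `out_of_vocabulary` and the sample sentences are hypothetical, and it assumes both datasets are available as plain-text sentences.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def out_of_vocabulary(acoustic_transcripts, lm_corpus):
    """Return the words used in the acoustic transcripts that never occur in the LM corpus."""
    lm_vocab = set()
    for sentence in lm_corpus:
        lm_vocab.update(tokenize(sentence))

    transcript_vocab = set()
    for sentence in acoustic_transcripts:
        transcript_vocab.update(tokenize(sentence))

    # Set difference: transcript words the language model text does not cover.
    return sorted(transcript_vocab - lm_vocab)

# Hypothetical sample data mirroring the example in the question.
acoustic_transcripts = ["I love dogs."]
lm_corpus = ["I love dogs.", "I love cats too."]

print(out_of_vocabulary(acoustic_transcripts, lm_corpus))
```

An empty result only means the transcript vocabulary is covered by the language model text. It does not mean the transcripts themselves have to be included in it.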