Word prediction based on vocabulary

Greetings,

Does DeepSpeech make word predictions based on the other words in a sentence, and if the words in a sentence are not meaningful together, do they have a negative effect on the trained model?

Also, when the vocabulary is created, should it be created with words or sentences?

Thanks in advance,

That depends on what you intend to do afterwards.

I’m unsure I get your question here.

So let’s say there is a sentence whose words are not linked by a common context, e.g.

  1. ( from school. the news is that)
  2. (morning, but the bad)

Can these sentences, and others similar to them, be used to train the model?

This is confusing: are we talking about training the acoustic model or building the language model here?

Also when the vocabulary is created, should it be created with words or sentences?

That depends on what you intend to do afterwards.

To train for a specific language, should the vocabulary be populated with sentences or words?

Thanks,

Again, that depends on what you intend to do afterwards. Just spotting some commands? Generic speech?

I mean for generic speech.

Then you should build the training dataset and the language model as close as possible to real speech.
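To illustrate why sentences matter here, below is a minimal sketch, assuming the language model is an n-gram model that learns from word sequences. The toy corpora and the `bigram_counts` helper are made up for illustration and are not part of DeepSpeech. A list of isolated words gives the model no information about which words follow which, while sentences do.

```python
from collections import Counter


def bigram_counts(lines):
    """Count adjacent word pairs per line, with sentence boundary markers."""
    counts = Counter()
    for line in lines:
        words = ["<s>"] + line.lower().split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical toy corpora, for illustration only.
sentences = ["the news is good", "the news is bad"]
word_list = ["the", "news", "is", "good", "bad"]

from_sentences = bigram_counts(sentences)
from_words = bigram_counts(word_list)

# Sentences record that "news" follows "the"; a bare word list does not.
print(from_sentences[("the", "news")])
print(from_words[("the", "news")])
```

With one word per line, every word is only ever seen between the start and end markers, so the model cannot prefer likely word sequences over unlikely ones.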

To get the content, I am splitting long audio files into chunks and writing the corresponding sentence for each chunk, but the problem is that a split chunk sometimes contains a non-meaningful sentence. Is it OK to use these for both the acoustic model and the language model?

thanks

Like those two examples above? They don’t seem that non-meaningful to me :).

I’d say if you are really worried, don’t put them into the language model, but you should still be able to use them for training the acoustic model.
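A minimal sketch of that split, assuming each transcribed chunk carries a flag set by hand when the transcript is a cut-off fragment. The dictionary keys (`wav`, `text`, `fragment`), the file names, and the `split_corpora` helper are hypothetical and not part of DeepSpeech. Every chunk goes to the acoustic training set, while only the unflagged transcripts go to the language model text.

```python
def split_corpora(chunks):
    """Return (acoustic_rows, lm_lines) from transcribed audio chunks.

    Each chunk is a dict with hypothetical keys:
      "wav"      - path to the audio chunk
      "text"     - its transcript
      "fragment" - True if the transcriber marked it as a cut-off sentence
    """
    acoustic_rows = []
    lm_lines = []
    for chunk in chunks:
        text = chunk["text"].strip().lower()
        if not text:
            continue  # nothing was transcribed for this chunk
        # Every transcribed chunk is still a valid audio/text pair.
        acoustic_rows.append((chunk["wav"], text))
        # Only complete sentences feed the language model corpus.
        if not chunk["fragment"]:
            lm_lines.append(text)
    return acoustic_rows, lm_lines


# Hypothetical example data.
chunks = [
    {"wav": "chunk_001.wav", "text": "from school the news is that", "fragment": True},
    {"wav": "chunk_002.wav", "text": "I walked home from school", "fragment": False},
]
acoustic_rows, lm_lines = split_corpora(chunks)
print(len(acoustic_rows), lm_lines)
```

Marking fragments by hand is only one option. If you expect real users to speak in broken phrases as well, keeping the fragments in the language model text may be closer to real speech.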