Train for Portuguese that can also detect English proper nouns

Hi,

I have trained DeepSpeech to recognize Brazilian Portuguese. I trained it with Brazilian proper nouns, including music artists/songs/albums, and changed the language model accordingly. It’s working fine for now.

Now I want it to recognize English proper nouns (and English music artists/songs/albums).

  • One approach I tried was including those English words in the language model (with many sentence variations), but detection is still poor.
  • The second approach was including an English speech dataset in both train and dev sets, along with the Portuguese datasets. This slightly improved English proper noun detection, but at a huge cost to Portuguese accuracy.

I would like to know if there are any other suggestions I can try to achieve good enough Portuguese recognition with the ability to recognize English proper nouns (with one acoustic model and one language model).

Hi @lissyx @kdavis @reuben … any suggestions?

I would take a closer look at the distribution of English vs Portuguese in the training set to better match the distribution you’re aiming for (only a few English words). Also, if every sample in your training set is either completely in English or completely in Portuguese, that could be a problem, as you want the model to be able to handle code switching. So I would try to augment the training set by joining English and Portuguese samples to create some sort of artificial code switching.
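One way to read “joining English and Portuguese samples” is to concatenate the audio of a Portuguese clip and an English clip into a single utterance whose transcript is the two transcripts joined. A minimal sketch with the standard-library `wave` module, using synthetic sine tones as stand-ins for real speech clips (all file names here are hypothetical):

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, freq, seconds=0.5, rate=16000):
    """Write a mono 16-bit WAV of a sine tone, standing in for a real speech clip."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(10000 * math.sin(2 * math.pi * freq * t / rate)))
            for t in range(int(seconds * rate))
        )
        w.writeframes(frames)


def join_samples(path_a, path_b, out_path):
    """Concatenate two WAV files of identical format into one utterance."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        params = a.getparams()
        frames = a.readframes(a.getnframes()) + b.readframes(b.getnframes())
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(frames)


workdir = tempfile.mkdtemp()
pt_path = os.path.join(workdir, "pt_clip.wav")
en_path = os.path.join(workdir, "en_clip.wav")
joined_path = os.path.join(workdir, "pt_en_codeswitch.wav")

write_tone(pt_path, 440)
write_tone(en_path, 660)
join_samples(pt_path, en_path, joined_path)
# The joined clip's transcript would likewise be the two transcripts
# concatenated, e.g. "tocar musicas de" + "alan walker".
```

In practice you would also want to match loudness between the two clips and maybe insert a short silence at the join, so the boundary sounds closer to natural code switching.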

@reuben Thanks… by “artificial code switching”, do you mean some decision that code (not the ASR model) takes based on the input audio?

Could you elaborate more on this?

So, suppose I have a complete audio dataset of code-switched utterances (like “tocar musicas de alan walker”) using a huge list of artists/albums/songs famous in Brazil (including both Portuguese and English names).

Will training with that dataset be enough for DeepSpeech to handle code switching?

Also, any suggestions on how to pick the final dataset from my Portuguese, English, and mixed (code-switched) datasets?

I can’t see the future.

I’ve already said how I would do this in my first message.

@rpratesh one simple solution (as @reuben has suggested) is to change the weight of Portuguese audio samples relative to English ones. I guess that the English audio samples outnumber the Portuguese ones. One simple way to increase the weight of Portuguese is to concatenate multiple copies of the Portuguese training set so that its number of samples is greater than the English one. I am planning to do this test myself. Let me know if you get any good results from this approach.
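A minimal sketch of this oversampling idea, assuming DeepSpeech-style CSV manifests where each row is (wav_filename, wav_filesize, transcript); the file names and counts below are made up for illustration:

```python
def oversample_manifest(rows, factor):
    """Repeat every row of a training manifest `factor` times.

    DeepSpeech training CSVs list one sample per row; duplicating rows
    makes the trainer see those samples proportionally more often.
    """
    return rows * factor


# Toy manifests standing in for the real CSV files (hypothetical paths).
portuguese = [
    ("clips/pt_0001.wav", "123456", "tocar musicas de alan walker"),
    ("clips/pt_0002.wav", "98765", "qual e a previsao do tempo"),
]
english = [("clips/en_%04d.wav" % i, "50000", "play alan walker") for i in range(5)]

# Oversample the minority-language set until it outnumbers the other one,
# then merge and write the combined manifest back out for training.
factor = len(english) // len(portuguese) + 1
balanced = oversample_manifest(portuguese, factor) + english
```

Whichever language is the minority in your case is the one to duplicate; an alternative with the same effect is weighting samples in the loss, but row duplication needs no code changes to the trainer.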

Yes @Paolo_Losi… that is a good solution in general… but in my case, I have a very large Portuguese dataset compared to a small and specific English dataset (artists/albums/songs).

@reuben When you said distribution, did you mean the histogram of how much data there is for English vs. Portuguese, so as to balance them,
or the distribution of features (say, average MFCCs) of each wav file, to see if they form clusters?

I meant how much data of each language, yes.
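A quick way to measure “how much data of each language” from DeepSpeech-style manifests is to sum the `wav_filesize` column, which for 16 kHz 16-bit mono PCM approximates duration (ignoring the small WAV header); the rows below are toy examples, not real data:

```python
def manifest_hours(rows, rate=16000, sampwidth=2):
    """Estimate total audio hours from wav_filesize (bytes) in manifest rows.

    Assumes 16 kHz, 16-bit, mono PCM clips; the ~44-byte WAV header per file
    is ignored, so this is an approximation.
    """
    total_bytes = sum(int(size) for _, size, _ in rows)
    return total_bytes / (rate * sampwidth) / 3600.0


# Toy manifests: one hour of Portuguese, one hour of English in total.
pt_rows = [("pt_0001.wav", str(16000 * 2 * 3600), "ola mundo")]
en_rows = [
    ("en_0001.wav", str(16000 * 2 * 1800), "hello world"),
    ("en_0002.wav", str(16000 * 2 * 1800), "play music"),
]

pt_hours = manifest_hours(pt_rows)
en_hours = manifest_hours(en_rows)
```

Comparing these totals against the proportion of English you expect at inference time (only a few English words per utterance) gives a concrete target for how much English data to mix in.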

@rpratesh by dataset I mean the audio dataset for the acoustic model, not the text dataset for the language model…

Yeah… I was also referring to my audio dataset.