Train French Model

Hello everyone!
I would like to train my own French model, but the Common Voice data alone is not enough to produce a strong model.

  • Where can I collect more data? If you have links, do not hesitate to share them!
  • What is the minimum number of hours needed to get a good model with good results?

You are welcome to contribute on https://discourse.mozilla.org/c/voice/fr and https://github.com/Common-Voice/commonvoice-fr. There's already a list of datasets there, Common Voice among them, that you can use.

Regarding the minimum number of hours, that depends on your definition of a good model and good results. Also, have a look at the issues on the GitHub repo linked above: there's already a list of actionable items to help fix and improve the quality of the current datasets, including Common Voice in French.

Hello! If you only need to know what's being said, that is, to get just the "keywords", you can get decent results (a WER of around 30% if the language model is even average; someone else could comment on this) using just 100-200 hours of domain-specific training data. But if you also want to get all the "stopwords" and train a general model that handles all kinds of subjects, then you need hundreds and hundreds of hours of training data. (Baidu used several thousand hours to train their model.)
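
In case the WER figure is unclear: word error rate is the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. Here is a minimal sketch in plain Python, just to illustrate the metric. The function name `word_error_rate` and the example sentences are made up for illustration; this is not taken from any particular toolkit, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in hypothesis (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: 10 reference words, 2 substitutions + 1 deletion.
reference = "the cat sat on the mat and looked at me"
hypothesis = "the cat sat on a mat and look at"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}")  # WER: 30%
```

So a WER of 30% means roughly three word errors for every ten words spoken, which can be acceptable for keyword spotting but is usually too high for full transcription.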

@pete Clear answer, thank you!

@lissyx Thank you for your answers!