French model 0.2 for DeepSpeech v0.6

Hello,

I picked up the previously shared Docker setup and updated it for the current DeepSpeech release (v0.6).

The model was trained with the Docker setup described at https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/CONTRIBUTING.md

  • trained from scratch
  • LinguaLibre import
  • TrainingSpeech import
  • Common Voice import

The language model is used.

On the quality side, here is the test output:

Testing model on /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_test.csv
Test epoch | Steps: 106 | Elapsed Time: 0:01:06
Test on /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_test.csv - WER: 0.544239, CER: 0.186163, loss: 8.556543
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.333333, loss: 3.468420
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/WikiLucas00/dutronien.wav
 - src: "dutronien"
 - res: "du tro ni en"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 1.000000, loss: 12.942985
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/X-Javier/Cinépal.wav
 - src: "cinépal"
 - res: "si ne pas le"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.538462, loss: 17.523869
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/Lyokoï/kakerlaquisme.wav
 - src: "kakerlaquisme"
 - res: "cacher la qui se"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.777778, loss: 21.139561
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/Lyokoï/dissoudre.wav
 - src: "dissoudre"
 - res: "dix sous de en"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.500000, loss: 22.396177
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/Lyokoï/héliocentrisme.wav
 - src: "héliocentrisme"
 - res: "il ou sentri sme"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.444444, loss: 23.753244
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/WikiLucas00/psycholinguistique.wav
 - src: "psycholinguistique"
 - res: "si colin dust que"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 1.333333, loss: 32.813377
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/Ltrlg/igname.wav
 - src: "igname"
 - res: "il y a main"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 0.916667, loss: 32.839912
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/Ltrlg/endosmomètre.wav
 - src: "endosmomètre"
 - res: "en tant son maître"
--------------------------------------------------------------------------------
WER: 4.000000, CER: 5.000000, loss: 90.866600
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/WikiLucas00/AUC.wav 
 - src: "auc"
 - res: "adour bécon du pas"
--------------------------------------------------------------------------------
WER: 3.000000, CER: 0.166667, loss: 4.401402
 - wav: file:///mnt/extracted/data/lingualibre/lingua_libre/Q21-fra-French/WikiLucas00/ultratrifoliophile.wav
 - src: "ultratrifoliophile"
 - res: "ultra trifolio pile"
--------------------------------------------------------------------------------
Testing model on /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_test.csv
Test epoch | Steps: 182 | Elapsed Time: 0:07:01
Test on /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_test.csv - WER: 0.261635, CER: 0.091464, loss: 28.857962
--------------------------------------------------------------------------------
WER: 3.000000, CER: 1.000000, loss: 26.121403
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/MonsieurLecoqP1C16_0188.converted.wav
 - src: "continuez"
 - res: "quand il est"
--------------------------------------------------------------------------------
WER: 2.333333, CER: 0.863636, loss: 134.887024
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/LesMysteresDeParisT1P1C5_0129.converted.wav
 - src: "diminution de fourloir"
 - res: "des minutions ou de fourrure a sa fin"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.142857, loss: 0.330683
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/LaGloireDuComacchio_0097.converted.wav
 - src: "pardieu"
 - res: "par dieu"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.142857, loss: 0.425681
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/LeComteDeMonteCristoT1Chap3_0284.converted.wav
 - src: "pardieu"
 - res: "par dieu"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.142857, loss: 3.226213
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/MonsieurLecoqP1C42_0070.converted.wav
 - src: "parbleu"
 - res: "par bleu"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.083333, loss: 4.601734
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/MonsieurLecoqT2P16_0185.converted.wav
 - src: "chanlouineau"
 - res: "chan louineau"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.750000, loss: 5.042370
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/madamebovaryC24_0123.converted.wav
 - src: "leon"
 - res: "et on"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.500000, loss: 5.537320
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/madamebovaryC33_0117.converted.wav
 - src: "emma"
 - res: "et ma"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.750000, loss: 6.044324
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/LesMysteresDeParisT3P5C12_0281.converted.wav
 - src: "cici"
 - res: "si si"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.300000, loss: 6.700687
 - wav: file:///mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR/LesMysteresDeParisT2P3C2_0146.converted.wav
 - src: "infortunee"
 - res: "un fortune"
--------------------------------------------------------------------------------
Testing model on /mnt/extracted/data/cv-fr/clips/test.csv
Test epoch | Steps: 150 | Elapsed Time: 0:03:11
Test on /mnt/extracted/data/cv-fr/clips/test.csv - WER: 0.539678, CER: 0.290099, loss: 47.642025
--------------------------------------------------------------------------------
WER: 2.500000, CER: 0.666667, loss: 36.547375
 - wav: file:///mnt/extracted/data/cv-fr/clips/095a5dc93374276dc953c7f12cc592a556673bddd1527c07e31d3111190ba2d0dae86a439de47c0f5ab6de87d59118962c997e70e7926af2249e9ca4e5b80754.wav
 - src: "quel hurluberlu"
 - res: "que le ru des lus"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.200000, loss: 8.666916
 - wav: file:///mnt/extracted/data/cv-fr/clips/968bbfc905f703a9c2b8276695d7d9a96841cf2f4a5812224de5b9c350a320d5277e35bf5f294e639a1e9776e6c5c2c774285c88ffb58034c9d351bae38a288c.wav
 - src: "qu'importe"
 - res: "qui porte"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.625000, loss: 14.329972
 - wav: file:///mnt/extracted/data/cv-fr/clips/a75444979e340102e37ccacd9244f3391bb060497716b11c67159339b4b9c99179e4f33424b348905d615e3470e6bfffaf621ccc9441db79f170a2a91aef3d79.wav
 - src: "immacule"
 - res: "il manque"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.333333, loss: 14.587557
 - wav: file:///mnt/extracted/data/cv-fr/clips/890d482adb285fdfcdf1ad5e877d175be48cfe9544834136b7ef094f7252083211f6293c49b49bdc5fcbebeccc9b3eab11f0883358648441015127d2b05e902f.wav
 - src: "bienvenue"
 - res: "bien menu"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.142857, loss: 16.608137
 - wav: file:///mnt/extracted/data/cv-fr/clips/f9c8b113544051e108f937613aae962be609bc9c46d5f1745b0771e8600b1e3f55fe6a55fd07cedebc7f8dab64dbb7cf67dddf6c54b16d829935df56b580d1c6.wav
 - src: "comment"
 - res: "quand un"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.571429, loss: 17.221365
 - wav: file:///mnt/extracted/data/cv-fr/clips/e7fcb32b8cac6133b8f84803d6300f7035143dde2720549af428e2528a972eec51defa3564ee54f360f20d272fd9f8c37af4eaa2d3e2ddffb12f2015ac9ee212.wav
 - src: "anglais"
 - res: "un gris"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.636364, loss: 29.313414
 - wav: file:///mnt/extracted/data/cv-fr/clips/e7cfa56b14f04aa3ef3199fb21e9500e257a8c99784de8f785143007bacf3c17f1c64325235afe1112775b1385d0a15647d69594b6c012107266f8637b794cf8.wav
 - src: "defavorable"
 - res: "des anales"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.000000, loss: 30.782732
 - wav: file:///mnt/extracted/data/cv-fr/clips/599474bc32b257467c3b640e03408fca2544a77a9f29cae3c89ad13957a2a42ee8a0a3c27b4bc8f9c28b6be402629445bfbcd9971c334f6222fa5a891b760eda.wav
 - src: "salut"
 - res: "elle eut"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.875000, loss: 34.428288
 - wav: file:///mnt/extracted/data/cv-fr/clips/3ca27694f51a64c4edf113c9196563cde9411f2b85eca671396a3d858e499d8d3eb07ebf7fff59367bca2a512776b1f5c90202f26d5915f455dab5e1851bc95a.wav
 - src: "approche"
 - res: "abou mon"
--------------------------------------------------------------------------------
WER: 1.800000, CER: 0.477273, loss: 73.404541
 - wav: file:///mnt/extracted/data/cv-fr/clips/1af91fae2a27584d770530fa79ad3e8a88bc049e81939f123cfde2118c6fd600382fcc761d1ea873c13e22d3c1c2174423ca08ecac0a0034a85134cb7d3bc28c.wav
 - src: "les balades decouvertes permettent dobserver"
 - res: "il est malade de couler le rmat de verve"
--------------------------------------------------------------------------------
I Exporting the model...

So depending on the test set, WER ranges from about 26% (TrainingSpeech) to about 54% (LinguaLibre and Common Voice).
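
A note on reading the per-sample lines in the log: a WER of 4.0 on a single word looks odd, but it follows from the usual definition (word-level edit distance divided by the number of reference words), which is not capped at 1. Below is a minimal Python sketch of that definition. The function names are my own and this is not taken from the DeepSpeech code; I am only assuming the report uses the standard Levenshtein-based WER/CER, which is consistent with the numbers shown.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Pairs copied from the test output (src -> res).
samples = [
    ("dutronien", "du tro ni en"),
    ("pardieu", "par dieu"),
    ("auc", "adour bécon du pas"),
]
for src, res in samples:
    print(f"{src!r} -> {res!r}: WER {wer(src, res):.6f}, CER {cer(src, res):.6f}")
```

For "dutronien" recognized as "du tro ni en", the reference has one word and the hypothesis four, so the word distance is 4 and WER is 4/1 = 4.0, while only three spaces were inserted into nine characters, giving CER 3/9 = 0.333333. Many of the worst entries, especially on LinguaLibre where each clip is a single word, are of this kind: one word split into several.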

A quick example: I said « le mouton la vache le cheval »: