Problem with some French sentences

Hello everyone,

When recording or validating sentences through the interface in French, I stumbled upon a couple of sentences with spelling mistakes, and some others which weren’t in French at all.

I guess there is a problem somewhere. Where or how should I report these sentences, the next time I come across them?

Thanks!

It’d be great if you added some samples. Sentences are in the text files: https://github.com/mozilla/voice-web/tree/master/server/data/fr

Thanks!

At the time I came across them, I didn’t think at all about writing them down somewhere to report them later, but I’ll definitely do that the next time it happens.

I’ll have a look at the link you gave to see if I can find the ones I stumbled upon.

I found two of the non-French examples again, on lines 4391 and 4392 of this file:

Relijion gozh ar Gelted.
Kredennoù kozh ar pobloù amerindian

These are the two I came across when validating sentences; there may be others.

I sent a pull request to correct the issue.

Thanks for sending a PR!

In other addresses, some place names are very local. Only people who live near the address can read the name correctly (Oyonnax, Werendeheim…).
And some abbreviations can't be read clearly, such as RTE, CH. …
Perhaps it would be useful to add a “wrong sentence” button?

Hi François,

I don’t think we should exclude rare words, on the contrary.

When I’m not sure how to pronounce a word I’m not familiar with, I either skip the sentence or try to guess the pronunciation. In some cases I also looked up the pronunciation in Wiktionary, but that requires knowing the International Phonetic Alphabet.

As for short names, I guess that “RTE” is an acronym; I don’t know how acronyms should be dealt with, but my guess is that we should simply read the letters one by one when we come across one.

That being said, I think that a button to report problematic sentences may be a good idea, but it would need to come with some recommendations about what kind of sentences should be reported (e.g. spelling mistakes).

I think that, as a first step, we should just look at the skip rate of some sentences.
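
To make the idea concrete, here is a minimal sketch of what looking at the skip rate could mean. It assumes a hypothetical log of `(sentence, action)` pairs, where `action` is `"skip"` or `"record"`; this is not the actual Common Voice data format, and `skip_rates` and `min_shown` are names made up for the example.

```python
from collections import Counter


def skip_rates(events, min_shown=5):
    """Return (sentence, skip_rate) pairs, highest skip rate first.

    `events` is a hypothetical iterable of (sentence, action) pairs.
    Sentences shown fewer than `min_shown` times are ignored, because
    a rate computed from one or two displays says very little.
    """
    shown = Counter()
    skipped = Counter()
    for sentence, action in events:
        shown[sentence] += 1
        if action == "skip":
            skipped[sentence] += 1

    rates = {
        sentence: skipped[sentence] / count
        for sentence, count in shown.items()
        if count >= min_shown
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Sentences at the top of that list would be the first candidates for a manual review.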

We have addresses from all the departments of France.

It’d be good to get proper feedback on those. RTE, for instance, I think is fine and should be spelled out letter by letter; for « CH. » I’m not so sure.

In addresses, RTE could stand for “route”, but it’s also an acronym for “Réseau de Transport d’Électricité”, and it’s OK to pronounce the letters separately. They own the high-voltage electric lines in France. CH is probably short for “chemin”, so I think it should be read as “chemin”, not “C.H.”. There are also a few instances of “Imp.” for “impasse”.
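
If we wanted to clean these up in bulk, something like the following rough sketch might work. It expands the abbreviations that seem unambiguous ("CH." and "Imp.") and only flags "RTE" for manual review, since it could be either "route" or the acronym. The mapping is based only on the guesses above, and `review_sentence` is a hypothetical helper, not something that exists in the repo tooling.

```python
# Hypothetical mapping, based only on the guesses made in this thread.
EXPANSIONS = {"CH": "chemin", "IMP": "impasse"}
# "RTE" may mean "route" or the acronym, so it is flagged, not rewritten.
AMBIGUOUS = {"RTE"}


def review_sentence(sentence):
    """Return (rewritten_sentence, needs_review) for one address sentence."""
    words = []
    needs_review = False
    for word in sentence.split():
        # Compare without the trailing period and without regard to case.
        key = word.rstrip(".").upper()
        if key in AMBIGUOUS:
            needs_review = True
            words.append(word)
        elif key in EXPANSIONS:
            words.append(EXPANSIONS[key])
        else:
            words.append(word)
    return " ".join(words), needs_review
```

For example, `review_sentence("12 CH. des Vignes")` gives `("12 chemin des Vignes", False)`, while `review_sentence("4 RTE de Lyon")` leaves the sentence unchanged and returns `True` as the review flag.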

Honestly, I did not even think about “route”; to me it was the acronym. Anyway, we should avoid and remove those ambiguous cases, first in https://github.com/Common-Voice/commonvoice-fr and then on the Common Voice website.

Any help on that is welcome. There’s some tooling in the repo https://github.com/Common-Voice/commonvoice-fr but it’s not perfect.