In this thread I’d like to discuss the meaning, and the pros and cons, of the following sentence contribution policy:
“be nice, don’t use offensive language. we aren’t collecting that kinda material.”
Is it really that you don’t want to collect this kind of material, or rather that you don’t want to present offensive sentences to our readers and verifiers? I can perfectly understand the latter, and I support that argument, especially given the variety of cultural backgrounds and ages.
However, in reality people will want to use their speech recognition software in any way they like, especially if they run it locally and trust that their words will not be saved in the cloud. This includes offensive and also pornographic language. In case you need an eye-opener: the small German search engine DeuSu (which advertises privacy) publishes an uncensored list of the 100 most used search keywords every year: https://deusu.de/blog/2016-11-22-wonach_deutschland_in_2016_wirklich_gesucht_hat.html
I was surprised myself. So people will feel censored if the software works well in general but always fails to recognize words of a certain category.
This is why I have tried to include some such words in my contributions (as you have probably noticed), but placed them in an innocent context. For example, although it is often used in a different sense, the literal meaning of “cock” is “rooster”. And indeed, this has already triggered some discussion:
I really like that this happens publicly, by the way. That’s less black-box testing for me.
I’d like to read some opinions on this topic, from the Mozilla guys, as well as from other contributors.