DRAFT GUIDELINES FOR REVIEWING UPLOADED TEXTS
A proposal to improve the Review Sentences paragraph currently found on this page: https://common-voice.github.io/sentence-collector/#/how-to
[Edited to include comments up to 8 July 2019]
Make sure the uploaded text meets the following criteria:
- It must be spelled correctly (though slang spellings are allowed).
- It must make sense, and be grammatically correct and self-contained.
- It must be easily speakable, and in the correct language.
If the text meets the criteria, accept by clicking the “yes” button.
If the text does not meet the criteria, reject by clicking the “no” button.
If you are unsure, click the “skip” button to move on to the next one.
Although the website is misleadingly called the “Sentence Collector”, don’t worry too much about whether the text meets the formal definition of a sentence. For example, it’s not necessarily a problem if the text does not include a verb. Any phrase that you could imagine being used as a caption for an image should be OK.
Reject texts with typos or accidental spelling or grammatical errors, but accept texts with slang terms and apparently intentional spelling variations. Before rejecting for spelling errors, remember that alternative spellings might be normal elsewhere in the world.
Examples:
The giant dinosaurs of the Triassic.
[Accept: correct spelling, grammar and punctuation]
The giant dinosaurs of the Triassic
[Accept: the lack of a full stop/period at the end is not considered an error]
The Giant Dinosaurs Of The Triassic.
[Accept unconventional capitalisation; it could be intentional in some contexts]
The giant dinesaurs of the Triassic.
[Reject: spelling error]
“The giant dinosaurs of the Triassic.
[Reject: punctuation error (the opening quotation mark is never closed)]
The giant dinosaurs of.
[Reject: not grammatically correct and self-contained]
The giant dinosaurs if the Triassic.
[Reject: obvious typo for ‘of’]
Is that to many potatoes for you?
[Reject: obvious typo for ‘too’]
she said, after a pause.
[Reject: not grammatically correct and self-contained. It appears to be only part of a sentence, as it starts with a lowercase ‘s’.]
April is the cruellest month.
[Accept: normal British English spelling]
April is the cruelest month.
[Accept: normal US English spelling]
Are ya gonna hit ’em?
[Accept: slang and unconventional terms are OK]
It was the womans bag.
[Should formally be “woman’s”, but many people now intentionally omit the apostrophe, particularly in informal contexts. Accept, as we need to capture informal as well as formal usages]
The B-B-C is a British broadcaster.
[Reject: “B-B-C” is written this way only to get around the prohibition on the usual abbreviation ‘BBC’]
Joyeux Noël.
[Reject: not in the expected language. This French text has probably been uploaded to the wrong language section]
Deinococcus radiodurans is a species of bacterium.
[Reject: not easily speakable; too obscure and difficult for many readers]
“nuqneH”, the Klingon Captain said.
[Reject: not in the expected language, and not easily speakable]
I’m driving my pizza with an elephant on my cheese.
[Reject meaningless texts, for example those that appear to have been computer-generated]