Outdated Mass Submissions

Whoever has uploaded the entirety of Alice in Wonderland or the detective novel (Hanaud) has basically created a massive amount of repetition of the same words. To make matters worse, they’re not modern speech!

I have over 3 thousand pages of repetitive sentences to downvote. I sit for hours clicking away, only to find that there are still over 3 thousand. The bloody page counter has overflowed!

This is ridiculous. We need to add modern words and names, but they’re hidden under a pile of auto-submitted waffle.

Hi @ajay.dixon

I understand your frustration. I’ve found myself in the same situation when checking Spanish sentences in the past.

It would be interesting to understand what you would need when these situations happen, or how you think we could avoid them in the future.

BTW, can you paste an example of “repetitive words”? The tool should be blocking any duplicated sentences.

I can’t really paste any. I just soldiered on and got past a lot of them. A lot of the repetition was coming from short conversations in novels.

For example:

“It was”, Alice said thoughtfully.
“Was it”? Said Alice thoughtfully.

Bertie, “Oh no”.
“No, oh well Bertie”.

The variations of these sentences seemed endless. I was wondering if I should be downvoting for this anyway.

As long as they are valid sentences they should get positive votes. I don’t think there is an easy way to detect sentences that are similar but not the same.
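To show what a rough check for "similar but not the same" could look like, here is a minimal sketch using Python's standard `difflib`. It is not what the tool does, only an illustration: the function names and the 0.8 threshold are made up for this example, and it compares every pair of sentences, so it would be far too slow for thousands of pages without more work.

```python
import re
from difflib import SequenceMatcher


def normalize(sentence):
    # Lowercase and keep only runs of ASCII letters and straight apostrophes,
    # so punctuation and capitalisation do not count as differences.
    words = re.findall(r"[a-z']+", sentence.lower())
    return " ".join(words)


def similarity(a, b):
    # Character-level similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_duplicates(sentences, threshold=0.8):
    # Hypothetical helper: return every pair whose similarity reaches the
    # threshold. The threshold is an arbitrary assumption, not a tuned value.
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            score = similarity(sentences[i], sentences[j])
            if score >= threshold:
                pairs.append((sentences[i], sentences[j], score))
    return pairs


if __name__ == "__main__":
    sample = ["Oh no, Bertie.", "Oh no Bertie!", "The train left at noon."]
    for first, second, score in near_duplicates(sample):
        print(f"{score:.2f}  {first!r}  ~  {second!r}")
```

Even with something like this, reordered sentences such as the Alice examples would score lower than sentences that differ only in punctuation, so picking a threshold that flags the right pairs is the hard part.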

Repeated words or short utterances, strung together in a variety of ways, are crucial to the corpus. That’s the only way in which the algorithm can learn the grammatical and audio context of the manner in which those words are used by native writers and speakers.

They only use one recording of a sentence in the model, so we need as many unique sentences as possible, even if the words are similar.

Yeah, I know. It was just reaching the point where I was thinking, ‘Haven’t I upvoted this phrase multiple times?’ That’s all.

I upvoted nearly everything.