What if people are using text-to-speech to record?

I’ve come across clips that were recorded with text-to-speech. Should I vote them up or not?

Hi @DaDiRa

Welcome to the community discourse! :slight_smile:

This is interesting; I haven’t come across this situation. Did you have the chance to document which sentences were using this?

I would say that this is not ideal, since this is the same voice over and over again, so having more than 15 minutes of this voice is not super helpful. We really need at least 1000 different and diverse voices for each language, and this is definitely not very diverse.

No, I didn’t document it, but I’ll do so from now on. This has happened about 3 times so far in the 55 sentences I voted on.

I’m pretty sure this is something we already discussed with @kdavis, and the answer was a clear no, as far as I can recall. Not only would it be bad for the dataset, but chances are that it is also against the terms of use of the text-to-speech service.

If the voice is indeed synthetic the clip should be marked as invalid, and I agree with @nukeador that…

I’ve added this to the draft reviewing guidelines, here:

Hi, so people are recording TTS clips. I’m rejecting them, since it doesn’t make sense to have them in the dataset. I’m worried this will slow down the validation of actual clips.

@Codigo_Logo_Programacao_e_Inteligencia_Artificial how many of these have you found?

@nukeador About 10 in a set of 150 clips.

@gregor is there a way we can help people identify and flag these clips, so we can find out who is sending them?

Unfortunately, we don’t have flagging functionality yet, though it’s been requested a couple of times already.