I’ve been thinking for a bit about ways that the clips presented for review could be pre-screened, so that either the good or the bad could be separated out before they’re actually sent to people to listen to (e.g. the syllable check idea mentioned in passing here, which I’ve yet to get round to looking into further!)
It seems like a script that was recently posted in the Mozilla TTS repo might be useful for exactly this kind of pre-screening - it analyses signal-to-noise ratio (“SNR”): https://github.com/mozilla/TTS/blob/master/dataset_analysis/CheckDatasetSNR.ipynb
If that could be automated, it could identify the worst clips, which wouldn’t be worth sending for human review.
I’m thinking this would pick up cases where the mic was barely working. It’s still not quite as good as giving someone more immediate feedback, as mentioned here, but it would at least save needless review.
Even if it wasn’t used as an absolute decider of quality (i.e. to say that a recording definitely wasn’t of use), it might still add value by letting the priority of review be determined: it makes sense to first focus on the cases which don’t have a relatively bad SNR, and then come back to the more borderline cases later. If used this way, it wouldn’t necessarily need to run interactively - it could be batched to run over a load of clips and prioritise them.
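To make the batch-prioritisation idea concrete, here’s a rough sketch of what it might look like. Note this uses a crude energy-based SNR estimate (quietest frames as noise floor, loudest as speech) rather than the WADA-SNR method the linked notebook uses, and the `prioritise` function and its threshold are hypothetical names/values just for illustration:

```python
import numpy as np

def estimate_snr_db(signal, frame_len=1024):
    # Crude energy-based SNR estimate: treat the quietest 20% of frames
    # as the noise floor and the loudest 20% as speech. A rough proxy
    # only - not the WADA-SNR algorithm used in the notebook above.
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.sort(np.mean(frames ** 2, axis=1))
    k = max(1, n_frames // 5)
    noise = np.mean(energies[:k])
    speech = np.mean(energies[-k:])
    return 10 * np.log10(speech / max(noise, 1e-12))

def prioritise(clips, snr_threshold_db=15.0):
    # clips: list of (name, samples) pairs. Sort best-first; anything
    # below the threshold is deferred as "borderline" rather than
    # discarded outright, per the idea above.
    scored = [(name, estimate_snr_db(sig)) for name, sig in clips]
    scored.sort(key=lambda item: item[1], reverse=True)
    review_first = [c for c in scored if c[1] >= snr_threshold_db]
    review_later = [c for c in scored if c[1] < snr_threshold_db]
    return review_first, review_later
```

A batch job could run this over a night’s worth of clips and feed the `review_first` list to reviewers, with `review_later` held back.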
Any thoughts?