Hello Team,
Thank you for publishing a detailed paper on the creation of the corpora.
In the paper, it is mentioned that the splits for Train, Test and Dev were done in such a way that one speaker’s recordings are only present in one data split.
“We made dataset splits (c.f. Table (2)) such that one speaker’s recordings are only present in one data split.”
Is there a way to apply the same speaker-disjoint splitting to other open-source corpora? If so, could you please point me to the code?
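
To make the question concrete, here is a minimal sketch of what I understand by a speaker-disjoint split. It is my own assumption and not the code used in the paper: the function name `speaker_disjoint_split`, the `(utterance_id, speaker_id)` input format, and the fraction parameters are all hypothetical.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(utterances, dev_frac=0.1, test_frac=0.1, seed=0):
    """Split utterances so that each speaker appears in exactly one split.

    utterances: iterable of (utterance_id, speaker_id) pairs (assumed format).
    dev_frac / test_frac: fractions of *speakers*, not of utterances.
    """
    # Group utterance ids by speaker.
    by_speaker = defaultdict(list)
    for utt_id, speaker_id in utterances:
        by_speaker[speaker_id].append(utt_id)

    # Shuffle speakers reproducibly, then cut the speaker list into three parts.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_dev = int(len(speakers) * dev_frac)
    n_test = int(len(speakers) * test_frac)
    groups = {
        "dev": speakers[:n_dev],
        "test": speakers[n_dev:n_dev + n_test],
        "train": speakers[n_dev + n_test:],
    }

    # Every utterance follows its speaker into that speaker's split.
    return {
        name: [utt for spk in spks for utt in by_speaker[spk]]
        for name, spks in groups.items()
    }
```

Because the fractions here apply to speakers, the resulting splits can be unbalanced in hours of audio if some speakers recorded much more than others. I would be interested to know whether your splits also balanced for duration or other attributes.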
Thank you.