The core of my question is: is the main job of a speech recognition neural network handling all the exceptions and irregularities of a language, or is it more about handling all the different voices and accents? Which one is the harder problem that requires more data? I assume that, since no one has ever tried machine learning with a completely regular constructed language, there is no answer to this question yet. But I would guess that regularity could simplify things drastically.
About the other things @ftyers mentioned:
Diversity of speakers – Esperanto speakers have quite diverse accents (as there are nearly no L1 speakers, you get transfer from the L1)
I would say there are quite a few L1 speakers, around one thousand people. But what makes Esperanto unusual is that, in my experience, native Esperanto speakers sometimes have stronger national accents than people who learned Esperanto as a second language. I guess this is because there is no Esperanto school: they use Esperanto only within their family for most of the year and only meet the Esperanto community at congresses. As a result, a speaker's country of origin matters more for their accent than whether they are a native speaker or an L2 speaker. I know complete beginners of Esperanto who pronounce every word perfectly and could contribute to this dataset much better than some native speakers.
Number of speakers – If you only have 10 speakers of a language and have recordings of all of them, your system is going to work better than if you have 10 million speakers and only recordings of 100.
The commitment to contributing to projects is much stronger in the Esperanto community than in other groups. One can clearly see this in things like the Esperanto Wikipedia: languages with comparable numbers of speakers have far fewer articles. I guess if we advertise Common Voice more in Esperanto magazines, at congresses and on websites, we could get pretty good coverage. Contributions from 1,000 of the 2 million L2 speakers seem possible to me. Right now there are contributions from over 140 people in Esperanto, which is already quite impressive for a constructed language.