Hi, I’m a bit late to the party here, but I’d like to offer a linguist’s point of view.
TL;DR: we should not crowdsource these definitions; we should incorporate academic resources instead.
- Writing is a technology that had to be invented, and only recently.
- Native speakers universally acquire their native language.
- A natural language has an internally consistent phonology.
- Spoken varieties form continuums; the division into “languages” is sometimes political or historical.
- The official version of a language is often highly codified, constructed, and “unnatural” (far from spoken varieties).
Example: Norwegian is a language group; the varieties are largely mutually intelligible with each other, and with Swedish. The two formalised standards are Bokmål (which, half-jokingly, is a koineized version written in Danish) and Nynorsk (an imaginary proto-version), neither of which anyone “really speaks”.
The situation with Finnish is comparable. The standardised official version was invented by amalgamating features from natural varieties; it’s highly constructed (though it can be spoken).
Example: dialect continuum.
Consider:
ENGLISH: I am the son of my father and my mother.
SCOTS: A am the son o ma faither an ma mither.
FRISIAN: Ik bin de soan fan myn heit en myn mem.
DUTCH: Ik ben de zoon van mijn vader en mijn moeder.
Consider:
The Balkan example mentioned above.
What are the recommendations then?
- For high-resource languages that have standard bodies, the meta-data should designate whether the speaker is producing a standardised variety, e.g. a “native” English speaker who uses either General American or Standard Southern British.
- For regional varieties, the meta-data should designate native speakers of a variety, as defined by widely established dialectology.
- Non-native speech should be labelled as such. There are varying levels of “accentedness”, from highly consistent L1-interference (in this case, you may say that the speaker has created a merged internal phonology in the process), to rampant lexical errors (e.g. using the wrong tone or quantity as a result of having no control over phonemic contrast).
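To make the three recommendations above concrete, here is a rough sketch of what such a meta-data record might look like. All names (`SpeakerStatus`, `SpeakerMetadata`, the field names) are made up for illustration and do not correspond to any existing schema; recording the speaker's L1 for non-native speech is my own assumption about what would be useful.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpeakerStatus(Enum):
    """Hypothetical labels mirroring the three recommendations."""
    NATIVE_STANDARD = "native_standard"  # native speaker producing a codified standard
    NATIVE_REGIONAL = "native_regional"  # native speaker of a regional variety
    NON_NATIVE = "non_native"            # non-native speech, labelled as such


@dataclass(frozen=True)
class SpeakerMetadata:
    language: str                         # e.g. "English"
    status: SpeakerStatus
    variety: Optional[str] = None         # standard or dialect name, per dialectology
    first_language: Optional[str] = None  # assumption: recorded for non-native speech

    def __post_init__(self):
        # Native speech should name the variety being produced.
        if self.status is not SpeakerStatus.NON_NATIVE and not self.variety:
            raise ValueError("native speech should name a standard or regional variety")
        # Non-native speech should record the L1, since interference depends on it.
        if self.status is SpeakerStatus.NON_NATIVE and not self.first_language:
            raise ValueError("non-native speech should record the speaker's L1")


# Example records (values are illustrative only).
general_american = SpeakerMetadata(
    "English", SpeakerStatus.NATIVE_STANDARD, variety="General American"
)
l2_english = SpeakerMetadata(
    "English", SpeakerStatus.NON_NATIVE, first_language="Finnish"
)
```

The point of the validation step is only that the variety label is mandatory rather than free-form; the actual list of varieties would come from the dialectological literature, not from the contributors.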
Now, in terms of ASR, there are conventionally two models: the acoustic model and the language model. At some point it may be helpful to also have a separate phonology model covering, e.g., which phonemes can occur together, how they change into allophones in different contexts, or, in the case of non-native phonology, substitutions etc.
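As a toy illustration of the three things such a phonology model might encode (phonotactics, allophony, and non-native substitutions), here is a minimal sketch. The inventories and rules are deliberately simplified placeholders that I made up for the example, and the function names are hypothetical; a real model would be derived from a phonological description of the variety in question.

```python
# Toy phonology model. All inventories and rules are illustrative placeholders.

VOWELS = {"a", "e", "i", "o", "u"}

# Phonotactics: hypothetical set of legal two-consonant onsets.
ALLOWED_ONSETS = {("s", "t"), ("s", "p"), ("t", "r")}

# Non-native phonology: hypothetical L1-interference substitutions.
L2_SUBSTITUTIONS = {"θ": "s", "ð": "z"}


def is_legal_onset(cluster):
    """Empty and single-consonant onsets are allowed; clusters must be listed."""
    return len(cluster) <= 1 or tuple(cluster) in ALLOWED_ONSETS


def apply_allophony(phonemes):
    """Simplified context rule: /t/ is realised as a flap between two vowels."""
    out = list(phonemes)
    for i in range(1, len(phonemes) - 1):
        if (phonemes[i] == "t"
                and phonemes[i - 1] in VOWELS
                and phonemes[i + 1] in VOWELS):
            out[i] = "ɾ"
    return out


def apply_substitutions(phonemes, table=L2_SUBSTITUTIONS):
    """Replace phonemes the speaker lacks with their nearest L1 counterpart."""
    return [table.get(p, p) for p in phonemes]
```

For instance, `apply_allophony(["a", "t", "a"])` flaps the medial /t/, while a word-initial /t/ is left alone because it has no preceding vowel.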