In the Adyghe language, the western and eastern dialects differ. Should they be separated? Would keeping them together negatively affect DeepSpeech, or can DeepSpeech be trained on multiple dialects in a single dataset?
My take on this:
1- I would rather keep them together in order to concentrate efforts in one direction.
2- I want DeepSpeech to be able to recognize the language across multiple dialects, without having to specify which dialect it should listen for.
Let’s assume that the answer is no, meaning “No, they should be separated”. Can we then still collect them in a single dataset and flag each sentence with its associated dialect? That would make it easy to separate them later if DeepSpeech needs it. It would also let us add even more dialects, because the eastern and western dialects have their own sub-dialects as well.
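
To make the flagging idea concrete, here is a rough sketch in Python of what I have in mind. The field names (`text`, `dialect`), the tag values, and the `split_by_dialect` helper are all placeholders I made up for illustration. None of them are defined by DeepSpeech or by any existing dataset format.

```python
from collections import defaultdict

# Hypothetical records: every sentence carries a dialect tag.
# A "/" in the tag marks a sub-dialect (an assumed convention, not a standard).
sentences = [
    {"text": "sentence 1", "dialect": "west"},
    {"text": "sentence 2", "dialect": "east"},
    {"text": "sentence 3", "dialect": "west/sub-dialect-a"},
]

def split_by_dialect(records, depth=1):
    """Group records by the first `depth` levels of their dialect tag."""
    groups = defaultdict(list)
    for rec in records:
        key = "/".join(rec["dialect"].split("/")[:depth])
        groups[key].append(rec)
    return dict(groups)

# Option A: train on everything together, ignoring the tags.
combined = sentences

# Option B: split later, either by main dialect or down to sub-dialects.
by_main = split_by_dialect(sentences, depth=1)
by_sub = split_by_dialect(sentences, depth=2)
```

With tags like these, collecting everything in one place would not lock us into either choice: the same data could be used as one combined set or split per dialect afterwards.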