Hi, I am training a DeepSpeech model for Nepali. The dataset consists of around 150 hours of transcribed .wav audio. Can anyone recommend the number of hidden layers, number of epochs, and other hyperparameters I should use to get good results from such a small dataset?
You should be using the transfer-learning2 branch, which is scoped for this use case. Maybe @josh_meyer can give more input.
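For reference, a typical transfer-learning run fine-tunes from an English checkpoint while re-initializing the output layer (since the Nepali alphabet differs). The sketch below is an assumption about a plausible invocation, not a verified recipe: flag names like `--drop_source_layers` and the exact paths should be checked against the branch's own `--helpfull` output, and the hyperparameter values shown are only starting points to tune.

```shell
# Hypothetical sketch of fine-tuning on Nepali data with the
# transfer-learning2 branch; verify flag names against --helpfull.
python DeepSpeech.py \
    --train_files nepali/train.csv \
    --dev_files nepali/dev.csv \
    --test_files nepali/test.csv \
    --alphabet_config_path nepali/alphabet.txt \
    --checkpoint_dir checkpoints/english_pretrained \
    --drop_source_layers 1 \
    --epochs 30 \
    --learning_rate 0.0001 \
    --dropout_rate 0.25
```

With ~150 hours you generally want to keep the pretrained geometry (so the checkpoint weights load) rather than pick a new number of hidden layers, and rely on early stopping on the dev set rather than a fixed epoch count.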