I was checking the code. In the util/feeding.py file, the data is sorted in ascending order by wav_filesize, i.e., shorter audio files are processed first. I have a large dataset of audio files with durations ranging from 0.5 to 24 seconds. How could the order of the data affect the training loss?
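For reference, the behavior I mean can be sketched like this (a minimal standalone example, not the actual util/feeding.py code; the dict keys just mirror the CSV columns):

```python
# Minimal sketch of the ascending sort (hypothetical data, not from feeding.py;
# the keys mirror the wav_filename / wav_filesize CSV columns).
rows = [
    {"wav_filename": "long.wav",  "wav_filesize": 320000},
    {"wav_filename": "short.wav", "wav_filesize": 16000},
    {"wav_filename": "mid.wav",   "wav_filesize": 96000},
]
rows.sort(key=lambda r: r["wav_filesize"])  # ascending: smallest file first
print([r["wav_filename"] for r in rows])    # → ['short.wav', 'mid.wav', 'long.wav']
```

Since file size is roughly proportional to duration for uncompressed WAV audio, this puts the shortest utterances at the start of every epoch.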
I am now trying a run with descending order, and I have already noticed the loss is better… but maybe the initial random weights are the reason.
Ascending/train (2 epochs): [loss plot]
Descending/train (1 epoch): [loss plot]
What is the idea behind the sorting anyway?