I am training an end-to-end model for Hindi. Within each epoch, the training loss starts at a small value and then increases, but the overall trend across epochs is decreasing. Is this behaviour expected?
Also, is it because of the training utterance lengths (related to SortaGrad and stuff)?
Here is the plot from TensorBoard: