Training on Common Voice

I am training on an AWS EC2 p2.xlarge instance.
The GPU, a K80 with 12 GB of memory, is fully utilized.
The average training time is 6 to 7 hours. Is that reasonable?
The error seems to be increasing during the first epoch. What is the optimal number of epochs?
I am training on the Common Voice EN dataset.