Higher training batch size leads to higher final test loss

Hello,
I’ve been experimenting with different training batch sizes for fine-tuning and consistently see a higher final test loss as the training batch size increases.

At the start of each run I reset to the officially released checkpoint, use the same training/validation/test data sets, and change only the training batch size. Each batch size was run three times.
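For concreteness, here is a rough sketch of the protocol. The three callables are hypothetical placeholders for my actual checkpoint-loading, fine-tuning, and evaluation code, not a real API:

```python
from statistics import mean
from typing import Callable, Dict, Sequence

def run_sweep(
    load_official_checkpoint: Callable[[], object],   # placeholder
    fine_tune: Callable[..., None],                   # placeholder
    evaluate_test_loss: Callable[[object], float],    # placeholder
    batch_sizes: Sequence[int] = (2, 4, 8, 16, 32, 48),
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the average final test loss for each training batch size."""
    results = {}
    for bs in batch_sizes:
        losses = []
        for run in range(repeats):
            # Reset to the officially released checkpoint before every
            # run so earlier runs cannot influence later ones.
            model = load_official_checkpoint()
            # Same train/validation/test splits throughout; only the
            # training batch size changes between sweeps.
            fine_tune(model, batch_size=bs)
            losses.append(evaluate_test_loss(model))
        results[bs] = mean(losses)
    return results
```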

The average test loss for training batch size 2 was 23.8, and for batch size 48 it was 26.08. Average test loss increased monotonically across batch sizes 2, 4, 8, 16, 32, and 48.

Did you see a similar effect when you chose the default training batch size of 24?

Has anyone observed similar differences when trying different training batch sizes?

Generally we’ve kept more or less the same batch size, so we’ve not seen this effect.