Higher training batch size leads to higher final test loss

Hello,
I’ve been experimenting with different training batch sizes for fine-tuning and consistently see a higher final test loss as the training batch size increases.

At the start of each run I reset to the officially released checkpoint, use the same training/validation/test data sets, and change only the training batch size. Each batch size was run three times.
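For concreteness, here is a rough sketch of the protocol. The three callables are hypothetical placeholders for my actual checkpoint-loading, fine-tuning, and evaluation code, not a real API:

```python
from statistics import mean
from typing import Callable, Dict, Sequence

def run_sweep(
    load_official_checkpoint: Callable[[], object],   # placeholder
    fine_tune: Callable[..., None],                   # placeholder
    evaluate_test_loss: Callable[[object], float],    # placeholder
    batch_sizes: Sequence[int] = (2, 4, 8, 16, 32, 48),
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the average final test loss for each training batch size."""
    results = {}
    for bs in batch_sizes:
        losses = []
        for run in range(repeats):
            # Reset to the officially released checkpoint before every
            # run so earlier runs cannot influence later ones.
            model = load_official_checkpoint()
            # Same train/validation/test splits throughout; only the
            # training batch size changes between sweeps.
            fine_tune(model, batch_size=bs)
            losses.append(evaluate_test_loss(model))
        results[bs] = mean(losses)
    return results
```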

The average test loss for training batch size 2 was 23.8, and for batch size 48 it was 26.08. Average test loss increased monotonically across batch sizes 2, 4, 8, 16, 32, and 48.

Did you see a similar effect when you chose the default training batch size of 24?

Has anyone observed similar differences when trying different training batch sizes?

Generally we’ve kept more or less the same batch size, so we’ve not seen this effect.