Alright, I’ve spent part of the morning looking into this, and it’s going to take a little more time to digest. I can at least report on what the evaluation matrix is doing (or intends to do). It looks like they’re using two evaluation metrics, BLEU (in roughly three variants) and TER, and users can submit their systems to have the site run an automated BLEU or TER evaluation for specific language pairs. With BLEU, a higher score means the system output is closer to the reference translations. TER is an error rate, so there a lower score is better.
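To get a feel for what “higher BLEU is better” actually measures, here’s a rough sketch of the textbook BLEU calculation. It computes clipped n-gram precision for n = 1 to 4, takes their geometric mean, and multiplies by a brevity penalty for outputs shorter than the reference. This is my own simplified version, not necessarily what the evaluation matrix runs. It assumes a single reference, whitespace tokenization, and no smoothing. Real tools (e.g. sacreBLEU) also usually report the score on a 0–100 scale rather than 0–1.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-token sequence in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: one reference, no smoothing, range 0..1."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # "Clipped" matches: an n-gram only counts as often as it appears in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(matches / total))

    geo_mean = math.exp(sum(log_precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean


ref = "the cat sat on the mat"
print(bleu("the cat sat on the mat", ref))  # identical to the reference
print(bleu("the cat sat on a mat", ref))    # close, so a partial score
print(bleu("a dog ran", ref))               # no overlap, so zero
```

The identical sentence scores 1.0, the near miss lands somewhere in between, and the unrelated output scores 0, which is the “higher is better” behavior described above.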
Anyway, that’s all I’ve gleaned so far. I’ll report back once I’ve learned more.