Ai2's olmo-eval is built for the iteration loop, not the final score | type0