Why a 24% Score on a Reasoning Benchmark Is an Argument About Compute | type0