Why a 24% Score on a Reasoning Benchmark Is an Argument About Compute