Think-Anywhere: A Peking University and Alibaba team found that rewarding LLMs for pausing mid-token produced a 9.3-point jump on code generation benchmarks