OpenAI Finds Reasoning Models Can't Hide Their Thinking
OpenAI has released research showing that today's reasoning models are remarkably bad at controlling their own chain of thought, and that weakness is actually a safety feature.

OpenAI Finds Reasoning Models Can't Hide Their Thinking — And That's Good for Safety
OpenAI has released research showing that today's reasoning models are remarkably bad at controlling their own chain of thought. That weakness, the company argues, is actually a safety feature.
The study, published on OpenAI's research blog, introduces CoT-Control, a benchmark designed to test whether reasoning models can deliberately reshape or obscure their internal reasoning when prompted. The answer: mostly no.
The benchmark includes over 13,000 tasks built from established datasets including GPQA, MMLU-Pro, HLE, BFCL, and SWE-Bench Verified. Each task pairs a problem with an instruction about how the model's chain of thought should look — for example, using only lowercase letters, avoiding certain keywords, or appending a specific word to each sentence.
Tested across 13 reasoning models, the results were striking. CoT controllability scores ranged from just 0.1% to 15.4% — meaning models failed to reliably control their reasoning in the vast majority of cases.
"Models often recognize in their CoT that they're failing to comply with the CoT instructions they were given and — even after this realization — still continue to fail," OpenAI noted.
The company framed the finding positively: low controllability means higher monitorability. If models can't easily hide their reasoning, safety systems can better audit their decision-making.
However, there are caveats. Controllability does increase with model scale, and improves slightly when models know they're being monitored. But it decreases with longer reasoning chains and additional post-training, including reinforcement learning.
OpenAI says it will start reporting CoT controllability alongside monitorability in system cards for future frontier models, beginning with GPT-5.4 Thinking.
Sources
- openai.com— OpenAI Blog
- openai.com— OpenAI Blog
- openai.com— OpenAI Blog
Share
Related Articles
Stay in the loop
Get the best frontier systems analysis delivered weekly. No spam, no fluff.
