Why Telling an AI It's Wrong Doesn't Actually Make It Smarter

PREVIEWWhy Telling an AI It's Wrong Doesn't Actually Make It Smarter · MD

A controlled study of 13 open-weight language models across math, code, and abstract-reasoning tasks finds that an AI agent which critiques its own answers barely improves on a second attempt. The real gains come from external feedback, and only when that feedback comes from a demonstrably stronger model. The work, posted to arXiv on June 29 as "What Drives Interactive Improvement from Feedback?", isolates a structural bottleneck: not whether a model receives feedback, but whether it can generate feedback worth acting on.

The author's project page describes a 13-by-13 student-teacher matrix across four verifiable reasoning environments: Omni-MATH for competition math, Codeforces-style competitive programming, the BBEH Linguini sub-benchmark for compositional reasoning, and ARC-AGI1 for abstract visual puzzles. "Open-weight" here means the model weights are publicly downloadable, the opposite of closed proprietary systems such as GPT-4 or Gemini. Each setting pairs a "student" model attempting a task with a "teacher" model that can comment on its solution, with K=10 interactive attempts per setting. Side by side, the protocol compares external teacher feedback, self-generated feedback, and unguided self-refinement, while varying interaction history, task difficulty, and whether the teacher has access to privileged task information.

Across this grid, multi-turn improvement is often not evidence of feedback use at all. Self-generated feedback adds little beyond unguided self-refinement. The strongest external teachers, however, add roughly 9.2 to 16.6 acc@10 points over the self-refinement baseline across the four environments, a gap the paper attributes to information the teacher contributes that the student could not have surfaced on its own.

That gap reframes the standard industry assumption that more feedback rounds equals more learning. The productive question is no longer "does feedback help?" but "can this model generate feedback a student can use?" The bottleneck is on the generation side, and the same paper notes that a student's ability to exploit strong feedback varies substantially across model families, suggesting feedback reception is a learned skill, not a default capability.

The work is a preprint and has not been peer reviewed. It evaluates only open-weight models; transfer to closed-weight frontier systems or to open-ended generation is not established. A closely related paper from the same submission window, "Contrastive Reflection for Iterative Prompt Optimization," targets prompt optimization via reflection and reaches a different question. The two are complementary rather than overlapping, but readers tracking self-improving-agent claims should treat both as the authors' controlled findings rather than consensus.

For RLHF practitioners, agent builders, and benchmark designers, the diagnostic is sharp: stop counting feedback rounds, and start measuring whether the feedback channel is informative for the specific student model in the loop.

Why Telling an AI It's Wrong Doesn't Actually Make It Smarter — type0 | type0

Why Telling an AI It's Wrong Doesn't Actually Make It Smarter

Sources