Frontier AI Can Write Code Now, But It Still Can't Count the Apples in a Photo

Frontier AI can now write code about as well as a mid-level engineer. Show it a photo of five apples in a bowl and ask it to count the fruit, and the model will confidently answer six, or three. That gap, not the benchmark leaderboard, is the most honest story in AI this week.

On Tuesday (June 9), Anthropic released Claude Mythos 5 and Claude Fable 5, two variants of the Claude Mythos Preview model it first announced roughly two months ago. The new models continue the recent trend of meaningful coding gains. They also close what analyst Timothy B. Lee calls the image-understanding gap with OpenAI, with Fable 5 arguably slightly better than GPT-5.5 on vision tasks. The catch-up is real. The more revealing fact is that both companies are still mediocre at the same kinds of problems, and a third frontier lab, Google, is not closing the gap with either of them.

Lee, who writes the Understanding AI newsletter, spent hands-on time testing the new Anthropic models. His thesis is straightforward: this year's frontier models solve some image problems that stumped last year's models, but the gains are modest, and geometric and spatial errors linger. Frontier vision used to mean being able to describe a scene at all. It now means being able to describe a scene correctly, most of the time, when the scene is not too cluttered. The improvement is real. So is the ceiling.

That ceiling deserves attention precisely because vision is the laggard. On coding, math, and structured reasoning, frontier models have moved fast and consistently. On image understanding, they have moved slowly and unevenly. The same scaling playbook that produced reliable gains in code has produced unreliable gains in vision. A reader who notices that asymmetry will be better positioned to evaluate the next round of vendor announcements, because the asymmetry is structural, not a matter of who has the better benchmark sheet this quarter.

The Anthropic release itself underlines the point. Mythos 5 stays gated: only a handpicked set of organizations, enrolled in a program Anthropic calls Project Glasswing, get relatively unfettered access to it. Fable 5 is generally available, but it routes requests that are automatically detected as dangerous (hacking prompts, bioweapon-design prompts) down to Claude Opus 4.8. The safety routing is the giveaway: vision is now treated as routine enough that dangerous use cases get filtered, while coding remains the capability vendors most want to keep improving. The routing structure tells you which modality the labs think they have under control and which one they are still trying to lift.
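For readers who want a concrete picture of what "routing down" can mean, here is a minimal sketch of the general pattern. It is an illustration under stated assumptions, not Anthropic's implementation: the model identifiers, the `RISKY_MARKERS` list, and the `looks_dangerous` and `route` functions are all hypothetical, and a production system would presumably use a trained classifier instead of substring matching.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, used only for illustration.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Hypothetical stand-in for a real risk classifier.
RISKY_MARKERS = ("exploit", "malware", "pathogen")


@dataclass(frozen=True)
class RoutingDecision:
    model: str     # which model would handle the request
    flagged: bool  # whether the request was detected as dangerous


def looks_dangerous(prompt: str) -> bool:
    """Return True if the prompt contains any of the risky markers."""
    text = prompt.lower()
    return any(marker in text for marker in RISKY_MARKERS)


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model, everything else to the primary."""
    flagged = looks_dangerous(prompt)
    model = FALLBACK_MODEL if flagged else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged=flagged)
```

In this sketch, `route("Count the apples in this photo")` stays on the primary model, while a prompt containing one of the markers is handed to the fallback. The point is only that the routing decision happens before the request reaches the more capable model.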

There is a useful heuristic here for anyone tracking the field. When a new frontier model ships, look first at its image-understanding scores, and only second at the press release. Coding benchmarks will continue to climb, and they will continue to be the headline number because they convert cleanly to revenue. Vision benchmarks will continue to be quieter, harder to game, and more honest about what "general intelligence" actually means. A model that writes beautiful Python but counts five apples as six is not a step toward general intelligence in any meaningful sense. It is a step toward a better coding assistant.

The real story of this week's release is not that Anthropic caught OpenAI. It is that catching OpenAI on vision is itself a modest achievement, and that the gap to human-level visual reasoning remains large enough to discipline any compressed AGI timeline. Vendors will keep announcing parity. The apples in the photo will keep being miscounted. Both of those facts can be true at once, and the second one is the one that matters.

Sources