Anthropic's new research paper, Agentic Coding and Persistent Returns to Expertise, is the largest behavioral study of AI coding tools to date. The headline finding should make a lot of career planners uncomfortable: the people who finish the job are not the people who can code best. They are the people who know their field best.
Across roughly 400,000 sessions from about 235,000 users of Claude Code, Anthropic's AI coding assistant, between October 2025 and April 2026, the gap between domain experts and software engineers was narrow. Non-developers, including managers, salespeople, designers, and operations staff, hit a verified-success rate of about 29 percent. Software engineers hit about 34 percent. Five percentage points separate the two groups.
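As a rough sketch of the arithmetic, the snippet below uses only the two rounded success rates quoted above. The variable names are illustrative and do not come from the paper, and results derived from rounded figures are approximate.

```python
# Approximate verified-success rates as quoted in the article (rounded).
non_dev_rate = 0.29    # non-developers
engineer_rate = 0.34   # software engineers

# Absolute gap, expressed in percentage points.
gap_points = (engineer_rate - non_dev_rate) * 100

# Relative gap: the engineers' advantage as a share of the non-developer rate.
relative_gap = (engineer_rate - non_dev_rate) / non_dev_rate

# Share of non-developer attempts that did not reach verified success.
non_dev_miss_rate = 1 - non_dev_rate

print(f"Absolute gap: {gap_points:.0f} percentage points")
print(f"Relative gap: {relative_gap:.0%}")
print(f"Non-developer attempts that missed: {non_dev_miss_rate:.0%}")
```

The distinction matters when reading the headline number: a five-point absolute gap on a base of about 29 percent is a relative difference of roughly 17 percent, which is still modest but not negligible.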
That small gap is the story, and the mechanism the paper describes makes it durable. In an AI-first workflow, the model handles the implementation: the loops, the boilerplate, the error handling. What it cannot do is decide which problem is worth solving, which trade-off is acceptable, or which output means "ship it" versus "throw it away." Those judgments require domain expertise, not code fluency. A salesperson can tell whether a draft pipeline is generating qualified leads. An operations lead can tell whether a workflow fix will break the upstream queue. A coder who knows neither field can write clean Python and still miss the point.
The paper's most counterintuitive finding is the shape of the returns. Diminishing returns set in early: the jump from "no idea what I'm doing" to "I can mostly tell if this is right" captures most of the value. The extra lift from "very good at this domain" to "world-class expert" is small in success-rate terms. That has real implications for how people should think about training for AI-assisted work. On this reading, a six-month deep dive into one field beats six years of shallow exposure.
A few honest caveats apply. The data comes from a single vendor's users, and Claude Code's audience self-selects toward people willing to pay for and wrestle with an AI coding tool, a group that probably over-indexes on tech-adjacent professionals. The paper does not define "verified success" with the precision outside reviewers would want, and Anthropic has commercial reasons to frame the result as broadly good news for AI coding. The 29 percent number is worth sitting with: more than two-thirds of attempts by domain experts did not land. And the gap could close, widen, or invert once competitors publish comparable numbers. Cursor, GitHub Copilot, and Devin have not released similar behavioral data, and Anthropic's pattern may not transfer.
The most uncomfortable implication is what happens when domain expertise is shallow. If the model is the junior engineer and you are the senior reviewer, garbage in still produces garbage out, just with more confidence and better formatting. The people who do worst with Claude Code are not the people who lack coding skills. They are the people who lack any field they know well enough to judge the output.
Management and sales professionals are among the fastest-growing user segments in Claude Code, according to a summary of the research that circulated when the paper was released. That pattern is consistent with the paper's mechanism: as implementation cost falls, the people with the most domain knowledge and the least tolerance for engineering overhead show up first.
The next data point worth watching is whether competing tools tell the same story. Anthropic has published the largest behavioral sample of any AI coding vendor to date. If Cursor or Copilot post comparable numbers with a similar gap, "domain expertise is the moat" becomes the consensus view of AI coding. If they don't, Anthropic's result looks more like a product-specific quirk. Until then, the working hypothesis from the data is that the path into AI-assisted work runs through whatever field a person already knows, not through retraining as an engineer.