The Era of AI Heroism Is Over
Yao Shunyu worked on Claude 3.7, Claude 4.5, and Gemini 3. His message from inside two frontier labs: the next gains come from disciplined engineering, not the next Transformer.
The era of the lone genius cracking open AI is over. The next generation of frontier models is being built the way Boeing builds an airplane: by teams of engineers running coordinated checks on each other's work, until the thing flies often enough to ship.
That is the structural claim Yao Shunyu has been making after spending the last few years inside two of the most consequential labs in the field: Anthropic, the AI safety company behind Claude, where he worked on Claude 3.7 and Claude 4.5, and Google DeepMind, where he worked on Gemini 3. In a roughly four-hour interview on the Zhang Xiaojun podcast, episode 140, Yao argued that the pre-training scaling law has not broken down, and that the Transformer was the last great individual heroic breakthrough. What comes next is collectivism: many engineers, many checks, many slow iterations in pursuit of one scarce trait, reliability.
"I don't think there's a scaling wall," Yao said, according to the 36Kr English summary of the interview. "What we need now is engineering discipline, not another Transformer."
That framing matters because it tells builders, investors, and policymakers what to actually optimize for. The wire's version of this story is a single researcher's thesis that "the scaling wall might not exist." That is a defensible claim but a thin one. Yao's deeper argument is structural. He is describing how frontier AI development now works in practice, and what that shift means for everyone trying to ship a model that does what it is supposed to.
Yao came to AI by an unusual route. He earned a BS in physics from Tsinghua University and a PhD in theoretical high-energy physics from Stanford, then moved into machine learning. That physics background shows up in how he talks about models: as systems whose behavior emerges from how the parts are arranged and constrained. In a personal blog post dated October 6, 2025, he announced his departure and laid out what he had learned at the two labs.
The most counter-intuitive thing he learned is that individual brilliance is no longer the bottleneck. The Transformer, the 2017 architecture that underpins essentially every modern large model, was the work of a small team at Google that got the structure of attention right in one shot. Yao's argument is that nothing comparable is going to arrive on a similar schedule. Future gains will come from many engineers making the model behave more reliably, not from a single researcher's flash of insight.
"Reliability, not exceptional intelligence, is the trait that scales in this new era," Grace Shao paraphrased Yao as saying in her AI Proem newsletter. She and the 36Kr English team are among the few English-language outlets to have published substantial excerpts from the interview.
Concretely, this looks like what Yao calls "intent-level differentiation" and what his interviewers describe as the slow grind of getting a frontier model to do the right thing on the inputs the team cares about. Public benchmarks have converged. Most top models cluster within a few percentage points of each other on the standard evals, a phenomenon covered as "benchmark homogenization" in the wider AI evaluation discussion. What still varies wildly is how each model behaves on the long tail of cases that real users actually hit. The work of closing that gap is engineering work, not discovery work. It is also expensive and slow, which is one of the reasons the frontier-lab advantage compounds.
There is a counterargument here, and Yao does not hide from it. If future gains come from disciplined engineering rather than from research breakthroughs, then the labs with the capital, talent, and compute to run disciplined engineering at scale will pull further ahead. The capability gap between the top three or four labs and everyone else is not closing. A separate Yahoo News piece, republishing South China Morning Post reporting on Anthropic's anti-China positioning, gives a different angle on the concentration question. Yao's structural claim is consistent with that picture: when the scarce input is engineering capacity, the labs that already have it stay ahead.
The implications for buyers and deployers are concrete. If the next round of model gains will be about reliability rather than raw intelligence, how teams buy models changes too. The question becomes which model is the most predictable on the cases they care about, and which lab will still be shipping improvements in 18 months. For builders, the optimization target moves from architecture novelty to evaluation infrastructure, failure-mode analysis, and the boring, expensive work of making a system do what it is supposed to do consistently. For investors, the bet is on the lab that can run the most disciplined engineering loop.
The strongest caveat is that this is one researcher's read. Yao has unusual standing. He trained models at Anthropic and DeepMind, and his personal blog post and the full podcast episode on YouTube are publicly available for verification. But the claim that "the scaling wall might not exist" is his interpretation, based on his training experience, not a measured industry consensus. Outside his two labs, the picture is more mixed. Several recent model releases have shown smaller jumps in raw capability than the previous generation, and the question of whether that reflects a true plateau or a temporary difficulty in harvesting the next round of compute remains open.
The thing to watch is not a single paper or benchmark. It is whether the next wave of frontier releases emphasizes reliability metrics, intent-level evaluation, and deployment robustness over headline intelligence scores. If it does, Yao's structural claim is borne out by the market. If it does not, the era of individual heroic discovery may be back on the table.