The thing you cannot automate is the thing you cannot verify.
That is the core proposition of "Some Simple Economics of AGI," a 112-page paper by Christian Catalini of MIT's Crypto Economics Lab, Xiang Hui of Washington University, and Jane Wu of UCLA, posted to arXiv on Feb. 24, 2026. As automation costs fall toward zero, the argument goes, human verification bandwidth becomes the binding constraint on economic growth — not intelligence, not compute, not model quality. The scarce resource is biological. The verification bottleneck is real, and it is structural.
This is not a comfortable framing for an industry that has spent the past two years promising that AI agents will transform productivity. The paper's real punch is what happens when verification fails to keep pace: unverified deployment becomes privately rational for every individual actor, even as it accumulates systemic risk across the whole economy. The authors call this the Trojan Horse Externality.
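The structure is a standard externality wedge, and a few lines of arithmetic make it concrete. Here is a minimal sketch with invented numbers; none of these figures come from the paper, only the shape of the incentive does:

```python
# Illustrative payoff arithmetic for the Trojan Horse Externality.
# All numbers are hypothetical; the structure, not the values, is the point.

REVIEW_COST = 10.0         # private cost of verifying an agent's output
PRIVATE_FAIL_COST = 4.0    # expected private loss if unreviewed output fails
SYSTEMIC_FAIL_COST = 50.0  # expected loss borne by the wider system, not the firm

def private_cost(verify: bool) -> float:
    """Cost as the individual firm sees it."""
    return REVIEW_COST if verify else PRIVATE_FAIL_COST

def social_cost(verify: bool) -> float:
    """Cost including the externality the firm never pays."""
    return REVIEW_COST if verify else PRIVATE_FAIL_COST + SYSTEMIC_FAIL_COST

for verify in (True, False):
    print(f"verify={verify}: private={private_cost(verify):5.1f} "
          f"social={social_cost(verify):5.1f}")

# Skipping review is privately cheaper (4.0 < 10.0) but socially far more
# expensive (54.0 > 10.0): the classic externality wedge.
```

Every firm facing these payoffs rationally skips review; the systemic cost sits on nobody's ledger until it is realized.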
On an a16z podcast in March 2026, in conversation with Eddy Lazzarin, the venture firm's crypto CTO, Catalini described the outcome the paper warns against: an economy where nominal output grows while the apprenticeship loops that produce future experts quietly atrophy. Left unmanaged, those forces pull toward what the paper calls a Hollow Economy — explosive top-line metrics, decaying human agency underneath. The alternative — an "Augmented Economy" where verification scales alongside agentic capabilities — requires deliberate institutional investment that the market will not spontaneously produce.
Before going further: Lazzarin runs a16z's crypto practice. Catalini co-founded Lightspark, a blockchain payments company. Their framing of verification infrastructure as the next bottleneck has money behind it. The paper's own logic — that misalignment accumulates silently until it doesn't — is the right lens to apply to the paper itself.
The paper's most precise contribution is the Measurability Gap. Tasks split into two categories: those where the gap between execution and verification is small enough to close with tools, and those where it isn't. The gap, as the paper frames it, is a structural asymmetry between what agents can execute and what humans can afford to verify. As automation improves, the measurable tasks get automated first and the asymmetry widens. What remains (the verification layer, the judgment calls, the exceptions) is the gap itself, and it is measured in human attention, which does not scale.
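A toy model makes the asymmetry visible. Assume, purely for illustration, that execution cost halves each period while verification still takes a fixed amount of human time; none of these parameters come from the paper:

```python
# Toy model of the Measurability Gap: execution cost falls geometrically,
# verification time per task does not. All parameters are hypothetical.

EXEC_COST_DECAY = 0.5         # execution cost halves each period
VERIFY_HOURS_PER_TASK = 1.0   # human attention per task, roughly constant
COMPUTE_BUDGET = 100.0        # spend on agentic execution per period
VERIFIER_HOURS = 200.0        # total human verification hours per period

exec_cost = 10.0
for period in range(8):
    executable = COMPUTE_BUDGET / exec_cost              # tasks agents can run
    verifiable = VERIFIER_HOURS / VERIFY_HOURS_PER_TASK  # tasks humans can check
    unverified = max(0.0, executable - verifiable)
    print(f"t={period}: executable={executable:8.0f} "
          f"verifiable={verifiable:5.0f} unverified={unverified:8.0f}")
    exec_cost *= EXEC_COST_DECAY

# Executable tasks grow geometrically; verifiable tasks stay flat. Within a
# few halvings, everything above the flat line ships unreviewed or not at all.
```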
The paper formalizes the mechanism by which that gap gets wider. The Codifier's Curse describes what happens when a top expert trains an AI on their own decision-making: they automate their own displacement. Junior analysts label data that displaces junior analysts. The apprenticeship loop — the process by which organizations reproduce their own expertise — gets disrupted from inside. The authors call this the Missing Junior Loop. Automating entry-level cognitive work destroys the pipeline that would have built the next generation of verifiers. The Hollow Economy is what you get when these dynamics compound: explosive nominal output, decaying human agency, unverified deployment at scale.
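The Missing Junior Loop can be sketched as a two-stage pipeline with the entry stage shut off. The rates below are invented for illustration; only the qualitative shape matters:

```python
# Sketch of the Missing Junior Loop as a two-stage talent pipeline.
# Rates are invented; only the qualitative trajectory is the point.

juniors, seniors = 100.0, 50.0
PROMOTION_RATE = 0.10   # share of juniors who become senior verifiers yearly
ATTRITION_RATE = 0.15   # share of seniors who retire or leave yearly
HIRING = 0.0            # junior hiring after entry-level work is automated
output = 100.0

for year in range(1, 11):
    promoted = PROMOTION_RATE * juniors
    juniors += HIRING - promoted
    seniors += promoted - ATTRITION_RATE * seniors
    output *= 1.3        # agentic output keeps compounding regardless
    print(f"year {year:2d}: seniors={seniors:5.1f} output={output:8.0f}")

# Output compounds almost 14x over the decade while the verifier stock
# peaks early on the residual junior pool, then decays below where it
# started: the Hollow Economy trajectory in a few state updates.
```

Seniors coast for a few years on the juniors already in the pipe, then decline, while output compounds regardless of whether anyone is left to check it.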
The paper's most concrete empirical anchor is not from Catalini's own analysis. It is from Google's DORA team, which has been tracking software delivery metrics across thousands of engineering teams since 2019. The DORA 2025 report finds that increased AI adoption in coding correlates with lower software delivery stability — not higher velocity, not better outcomes, but more instability. The paper's interpretation: teams are shipping AI-generated code they lack the bandwidth to review properly. Technical debt accumulates before anyone sees it. The paper cites this as evidence of what unverified deployment looks like at scale.
This is the Hollow Economy made visible. One empirical data point does not settle the case — DORA measures correlation, not causation, and delivery instability has many causes. But it is the kind of grounding that the rest of the paper, which leans heavily on economic theory and thought experiments, genuinely needs.
The paper also cites Ju and Aral (2025), who found that AI assistance increases output volume by roughly 50 percent per worker while causing what the authors call a "sharp diversity collapse" in creative content — producing homogeneous, self-similar work that, at scale, risks degrading the aggregate value of what gets made. The paper additionally cites work by Botelho and Wang (2026) on symbolic compliance in LLMs — a phenomenon where agents learn to satisfy surface-level fairness metrics while violating the deeper intent those metrics were meant to encode. That citation could not be independently verified; the Catalini paper is the only source referencing it. Both findings are offered as evidence that scaling automation without scaling verification does not merely fail to improve outcomes — it actively degrades them in ways that are hard to detect.
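To make "diversity collapse" less abstract, here is one way such homogeneity could be quantified: mean pairwise similarity across a corpus of outputs. This is a generic illustration, not Ju and Aral's actual methodology:

```python
# One way a "diversity collapse" could be measured: mean pairwise cosine
# similarity over bag-of-words vectors. Illustrative only.
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(texts: list[str]) -> float:
    vecs = [Counter(t.lower().split()) for t in texts]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

varied = ["the fox jumps", "markets clear slowly", "verification is scarce"]
homogeneous = ["the report is clear", "the report is concise", "the report is clean"]

print(f"varied:      {mean_pairwise_similarity(varied):.2f}")       # 0.00
print(f"homogeneous: {mean_pairwise_similarity(homogeneous):.2f}")  # 0.75

# Rising mean similarity across a corpus is the signature of self-similar,
# homogeneous output at scale.
```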
The paper's proposed cure is a firm topology the authors call the AI Sandwich. The top layer is a human Director, defining intent for tasks that cannot yet be automated. The middle is a massive swarm of agentic systems executing. The bottom is a small army of human Verifiers — top experts in each domain, equipped with tools, reviewing what the agents produce. This is not a description of current firms. It is an ideal type. The authors acknowledge that the verification layer is itself vulnerable to automation as measurability improves. The expert who verifies is also the expert training the system that will verify itself.
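Read as a pipeline, the sandwich has one hard constant: verifier bandwidth. A minimal sketch of how the three layers compose, with hypothetical names and numbers throughout:

```python
# Minimal sketch of the "AI Sandwich" topology as a pipeline with a hard
# verification budget. Function names and figures are hypothetical.
import random

random.seed(0)

def director_intent() -> str:
    return "summarize contract risk"           # human-defined task spec

def agent_swarm(intent: str, n: int) -> list[str]:
    return [f"{intent} :: draft {i}" for i in range(n)]

def expert_review(draft: str) -> bool:
    return random.random() > 0.2               # stand-in for human verification

VERIFIER_BUDGET = 10                           # drafts a human can actually check

drafts = agent_swarm(director_intent(), n=1000)
reviewed = drafts[:VERIFIER_BUDGET]            # bandwidth caps what gets checked
deployed = [d for d in reviewed if expert_review(d)]
unreviewed = len(drafts) - len(reviewed)

print(f"deployed after review: {len(deployed)}")
print(f"never reviewed at all: {unreviewed}")
# The middle layer scales with compute; the bottom layer does not.
```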
The policy prescriptions follow from the externalities logic: strict liability regimes and mandatory insurance for agentic outcomes, so that the tail risks of unverified deployment are priced in rather than socialized. The paper cites ElevenLabs' move to insure its audio agent as an early example of this logic appearing in practice. The authors acknowledge that this is currently the exception.
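The pricing logic is straightforward expected-value arithmetic. A back-of-envelope sketch with invented figures, not the paper's:

```python
# Back-of-envelope premium arithmetic for strict liability plus mandatory
# insurance. All figures are hypothetical illustrations.

P_FAILURE_REVIEWED = 0.001    # failure probability with human review
P_FAILURE_UNREVIEWED = 0.02   # failure probability without it
HARM = 1_000_000.0            # expected damages when an agent fails
REVIEW_COST = 5_000.0         # cost of human verification per deployment

def total_cost(reviewed: bool) -> float:
    p = P_FAILURE_REVIEWED if reviewed else P_FAILURE_UNREVIEWED
    premium = p * HARM         # actuarially fair premium under strict liability
    return premium + (REVIEW_COST if reviewed else 0.0)

print(f"reviewed:   {total_cost(True):>10,.0f}")   # 1,000 + 5,000 = 6,000
print(f"unreviewed: {total_cost(False):>10,.0f}")  # 20,000

# Once the tail risk is priced, review becomes the cheaper option;
# socialized risk is what made skipping it look free.
```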
What the paper is less candid about is the self-serving reading its own crypto section invites. Catalini and Lazzarin spend a substantial portion of the podcast discussing how blockchain primitives — identity, provenance, trustless settlement — become essential infrastructure for verification in an agentic economy. This is not obviously wrong. On-chain transaction flows give agents more context to act on; cryptographic proofs provide a verification substrate that is cheaper to audit than institutional intermediaries. The authors have a point. But it is a point that happens to align with a decade of venture investment in crypto infrastructure, and the paper does not flag that alignment as a potential conflict.
The Augmented Economy the authors describe — verification scaled to match agentic capability, the Codifier's Curse redirected into continuous upskilling, alignment drift managed as an ongoing process rather than a one-time problem — is genuinely more compelling than the Hollow one. The mechanisms it requires, though, are not market defaults. They are institutional fixes: liability regimes, insurance markets, verification-grade data network effects that accumulate in proprietary training pipelines. The paper acknowledges this. It does not resolve it.
The apprentice is gone. The verifier remains — for now. The question is whether the infrastructure for verification scales before the cost of not having it does.