The AI moat is a rubric

The honest story of recent AI progress is more mundane, and more consequential, than the headlines suggest. Models did not get dramatically better at learning from less. They got vastly more expert-labeled data, much of it produced by an industrial process called reinforcement learning, in which a model is trained against a judgment function that scores each of its attempts as good or bad. Once that engine is visible, the leverage in the AI race comes into focus.

This is the argument Dwarkesh Patel makes in a new essay on the Dwarkesh Podcast, published 2026-06-19. Patel uses an image of a galaxy of capabilities held together at the center by an "unimaginably massive black hole of data." The image is right in spirit, but it is also a little evasive. A black hole is passive. It just sits there. The thing at the center of AI progress is active: it is a verifier, a function built by humans, that decides which model attempts deserve to become training data and which do not.

To see the engine, it helps to look at the parts. One definition of intelligence is sample efficiency: how little data a model needs to see in a domain before it operates fluently. Patel observes, fairly, that training-time sample efficiency has not meaningfully improved in recent years. The recent gains came instead from widening and improving the data distribution: more domains covered, more high-quality examples in each, and far more compute spent producing them. The essay names more and better data, plus the compute needed to develop it, as the primary driver of recent AI progress.

The shape of that data production matters more than the volume. The standard pattern in modern frontier work is roughly this. A team picks a domain they want the model to improve, say contract law, numerical integration, or competitive programming. Domain specialists write step-by-step worked examples of how an expert would solve a problem, what the field calls expert trajectories. Other specialists write grading criteria, called rubrics, that describe what a good answer looks like. The model is then set to attempt the task thousands of times. Each attempt is a rollout. A verifier, built from those rubrics and reference solutions, scores the rollout. The model is updated to predict more of the good rollouts and fewer of the bad ones, much as a language model is trained to predict the next token. Patel frames reinforcement learning as a form of synthetic data generation: you are not discovering data, you are manufacturing it, by spending compute against a judgment function until it surfaces examples the model can learn from.
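The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any lab actually implements it: `toy_model`, `make_rubric`, `verify`, and `collect_training_data` are hypothetical names, the "model" is a random guesser for a single arithmetic prompt, and the final step (updating the model on the kept rollouts) is omitted. What remains is the filtering step: rollouts become training data only if the verifier passes them.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    prompt: str
    answer: str


# Assumption: a rubric is a list of checks, each returning True when satisfied.
Criterion = Callable[[Rollout], bool]


def make_rubric(reference: str) -> List[Criterion]:
    """Build a hypothetical rubric from a reference solution."""
    return [
        lambda r: r.answer.strip() == reference,  # agrees with the reference
        lambda r: len(r.answer) < 20,             # stays concise
    ]


def verify(rollout: Rollout, rubric: List[Criterion]) -> float:
    """Score a rollout as the fraction of rubric criteria it satisfies."""
    return sum(check(rollout) for check in rubric) / len(rubric)


def toy_model(prompt: str, rng: random.Random) -> Rollout:
    """Stand-in for a model: answers '2 + 3' correctly about half the time."""
    return Rollout(prompt, str(5 + rng.choice([-1, 0, 0, 1])))


def collect_training_data(prompt: str, reference: str, n: int,
                          threshold: float = 1.0, seed: int = 0) -> List[Rollout]:
    """Sample n rollouts and keep only those the verifier scores at or above threshold."""
    rng = random.Random(seed)
    rubric = make_rubric(reference)
    rollouts = [toy_model(prompt, rng) for _ in range(n)]
    return [r for r in rollouts if verify(r, rubric) >= threshold]


kept = collect_training_data("2 + 3", "5", n=1000)
print(f"kept {len(kept)} of 1000 rollouts as training data")
```

Nothing in the sketch "discovers" data. The kept set is determined entirely by what `make_rubric` chooses to check, which is the sense in which the data is manufactured against a judgment function.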

This is where the data black hole metaphor starts to mislead. The data is not a lake to dip into. The data is whatever your verifier certifies as correct. A generous verifier produces a generous model. A biased verifier, whether biased in style, in substance, or in what it counts as a passing answer, produces a model that inherits the bias. A verifier that cannot tell a right answer from a confident wrong one will produce a model that is confidently wrong. The moat is the rubric.
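A small sketch makes the point concrete. It assumes a hypothetical prompt ("2 + 2"), four made-up rollouts, and two made-up verifiers: one that checks the answer against a reference, and one that only rewards a confident tone. The names and the data are illustrative, not taken from the essay.

```python
REFERENCE = "4"  # reference solution for the hypothetical prompt "2 + 2"

# Made-up rollouts for illustration only.
rollouts = [
    "I think the answer is 4",
    "The answer is definitely 5",
    "The answer is definitely 4",
    "Possibly 3, but I am not sure",
]


def correctness_verifier(text: str) -> bool:
    """Pass a rollout only if it contains the reference answer as a token."""
    return REFERENCE in text.split()


def style_verifier(text: str) -> bool:
    """Pass a rollout whenever it sounds confident, regardless of the answer."""
    return "definitely" in text


# Each verifier certifies a different training set from the same rollouts.
correct_data = [t for t in rollouts if correctness_verifier(t)]
style_data = [t for t in rollouts if style_verifier(t)]

print("certified by correctness:", correct_data)
print("certified by style:", style_data)
```

The style-based verifier certifies "The answer is definitely 5" as training data and rejects the hedged but correct rollout. A model trained on that set would be pushed toward confidence, not toward correctness.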

The implication shows up in how capability claims should be read. A benchmark score in a domain with a clean, widely shared verifier, such as competitive mathematics or unit-tested code, is a defensible claim. The verifier exists, it can be reproduced, and the rubric is roughly the same across labs. A benchmark score in a domain without a verifier, or with one controlled by a single company, is a weaker claim. The capability is real inside the verifier's domain and brittle outside it. It will not transfer cleanly to a different organization, because the rubric is the organization.

That reframing has practical consequences. The first question to ask of a new AI capability demonstration is not "how big is the model" or "how much data was it trained on." It is "who built the verifier, what does it reward, and is it reproducible outside the lab that built it." If the answer is "we did, in private, and we will not show you," the demonstration is closer to an internal product benchmark than a scientific result, and it should be read that way.

The lesson of the recent AI progress story, then, is not that data is scarce and someone needs to scrape the rest of the internet. The scarce resource is judgment. The frontier is moving fastest in domains where someone has bothered to write a careful rubric for what a good answer looks like, and has the compute to run that rubric against millions of model attempts. The frontier is moving slowest where no one has done that work yet, or where the work is locked behind a single organization's wall. Patel is right that the recent gains are mostly a data story. The most important part of that story is the human judgment that defined what "better" data should look like in the first place. Whoever builds the best verifiers in the most domains will, for the foreseeable future, set the ceiling on what the models can do in those domains. That is the data black hole at the center of AI, and unlike a black hole, it is built by hand.

