Arena, the crowdsourced leaderboard where users pick the better of two anonymous AI model outputs, has hit roughly $100 million in annualized revenue about eight months after launching a paid analytics product aimed at AI labs and enterprises. The milestone is less interesting than what it actually measures. TechCrunch reports that CEO Anastasios Angelopoulos, a co-founder, wants the press to stop using the industry's favorite shorthand for the figure. "ARR," annualized recurring revenue, is the wrong word, he says, because Arena does not sell subscriptions. It sells consumption. A lab pays for the evaluations it actually runs, not for a seat it might not use.
A consumption-based business can reach a $100M run rate and the figure can still say little about next quarter, which is why Arena's number deserves both attention and skepticism. But it also carries a stronger signal than a subscription figure would: customers are actively using the product, not parking an unused license on a budget line. In an industry where much AI tooling is sold on optimistic seat-count forecasts, a consumption curve that climbed from $30M at the January Series A to $100M by June looks more like cloud infrastructure than software-as-a-service.
The product doing the climbing is AI Evaluations, the paid analytics service Arena launched in September. The free offering is the public leaderboard: millions of users voting on which of two anonymous model responses is better, building a dataset that has become the de facto community ranking for new AI systems. Labs can also test unreleased models anonymously on the platform, getting early signal in exchange for letting their outputs be ranked. AI Evaluations sells the same kind of head-to-head judgments, plus deeper analytics, to model developers and enterprise buyers who cannot generate that volume of independent evaluation in-house.
The clearest evidence that the community flywheel matters is a failed competitor. Yupp, the only other consumer-facing leaderboard operating at comparable scale, shut down in March after trying to go enterprise-first without keeping a free community layer alive. Arena's pitch, in effect, is that the community evaluations are the moat: anyone can build a leaderboard interface, but almost no one can build ten million ranked comparisons that labs actually trust.
Arena now competes "for the same dollar," in Angelopoulos's words, with the human-labeling vendors that sell post-training data and have themselves been on a tear. According to The Information, as reported by TechCrunch, Mercor's annualized revenue topped $1B earlier this year, up from $500M in September, and Handshake's gross annualized revenue from AI training work has nearly doubled since January to roughly $1B. Scale AI and Surge round out a post-training market that has spent the last year turning human feedback on AI outputs into a high-margin services business. Arena sits one layer over: rather than supplying paid human graders or fine-tuning data directly, it sells the aggregated judgment data and analytics that labs would otherwise have to build themselves.
The caveats come from the same sources. Arena's figures rest on the company's own account, with no confirmation beyond Angelopoulos's interview with TechCrunch and the January Series A filing. The Handshake and Mercor figures come from The Information's reporting, cited secondhand, not from direct filings, and should be read as comps, not as audited market data. The $100M figure is a company-stated run rate, not audited revenue. Any claim about safety, market impact, or adoption trends that goes beyond Arena's own statements needs independent reporting.
What to watch next: whether Arena's growth holds as more labs build in-house evaluation tools, whether the consumption curve flattens out around the scale Mercor and Surge appear to have reached, and whether the next generation of leaderboard competitors learns from Yupp's mistake and keeps a free community product alive while chasing enterprise dollars.