Z.ai's GLM-5.2 tops open-weights AI benchmarks at a fraction of GPT-5.5's price, with a token-hunger catch

On June 16, 2026, Chinese AI lab Z.ai released GLM-5.2, a 753-billion-parameter language model, under an MIT license. The release lands in the middle of a quiet but consequential question for independent developers: is there a model they can actually self-host, or route through OpenRouter, that closes most of the gap to closed frontier APIs without a closed-frontier bill? Artificial Analysis says yes, on the strength of GLM-5.2's Intelligence Index v4.1 score of 51, which puts it ahead of MiniMax-M3 and DeepSeek V4 Pro and gives it the top mark in the open-weights category.

The price is the second half of the pitch. Through OpenRouter, GLM-5.2 lists at $1.40 per million input tokens and $4.40 per million output tokens. Closed frontier APIs are not in that neighborhood. GPT-5.5 is $5 in and $30 out per million tokens. Claude Opus 4.5 through 4.8 is $5 in and $25 out. On a per-token basis, GLM-5.2 is roughly 3.5x cheaper than GPT-5.5 on input and nearly 7x cheaper on output. For a developer running a retrieval-heavy chatbot, that ratio is the difference between a viable weekend project and a credit-card-anchored product.

The catch, and it matters, is that GLM-5.2 is hungry. Artificial Analysis puts its output-token burn at about 43,000 tokens per Intelligence Index task, against roughly 24,000 for DeepSeek V4 Pro, 32,000 for Kimi K2.6, and 37,000 for MiniMax-M3, figures relayed by Simon Willison. On long, chatty, agentic workloads, that extra output eats into the per-token savings, and in some cases erases them.
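The price arithmetic above can be checked in a few lines. The following is a minimal sketch that uses only the per-million-token prices and the 43,000-token figure quoted in this article. The function names are illustrative, input-token costs are ignored in the per-task figure, and GPT-5.5's actual per-task token usage is not given here, so the break-even number is a threshold and not a measurement.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},   # OpenRouter list price
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}

# Output tokens per Intelligence Index task, per Artificial Analysis.
GLM_OUTPUT_TOKENS_PER_TASK = 43_000


def price_ratio(expensive: str, cheap: str, side: str) -> float:
    """How many times pricier `expensive` is than `cheap` per token."""
    return PRICES[expensive][side] / PRICES[cheap][side]


def output_cost(model: str, output_tokens: float) -> float:
    """Output-side cost in USD for a given number of output tokens."""
    return PRICES[model]["output"] * output_tokens / 1_000_000


def break_even_output_tokens(cheap: str, expensive: str, cheap_tokens: float) -> float:
    """Output tokens at which `expensive` costs the same as `cheap` does
    for `cheap_tokens` output tokens (output side only)."""
    return output_cost(cheap, cheap_tokens) / PRICES[expensive]["output"] * 1_000_000


if __name__ == "__main__":
    print(f"input ratio:  {price_ratio('GPT-5.5', 'GLM-5.2', 'input'):.2f}x")
    print(f"output ratio: {price_ratio('GPT-5.5', 'GLM-5.2', 'output'):.2f}x")
    cost = output_cost("GLM-5.2", GLM_OUTPUT_TOKENS_PER_TASK)
    print(f"GLM-5.2 output cost per task: ${cost:.4f}")
    threshold = break_even_output_tokens("GLM-5.2", "GPT-5.5", GLM_OUTPUT_TOKENS_PER_TASK)
    print(f"GPT-5.5 break-even output tokens: {threshold:.0f}")
```

Under these numbers the ratios come out to about 3.57x on input and 6.82x on output, and a 43,000-token GLM-5.2 answer costs roughly $0.19 in output tokens. GPT-5.5 would need to finish the same task in about 6,300 output tokens to match that output bill, which is one way to see how much token hunger the price gap can absorb.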

The model itself is a sparsely activated mixture of experts. The full parameter count is 753 billion, with 40 billion active on any given token, and the weights occupy roughly 1.51 terabytes. The context window grew from 200,000 tokens in GLM-5.1 to 1,000,000 tokens here, large enough to fit most codebases and long-document analyses in a single window. Vision is handled by a separate GLM-5V-Turbo family that is not part of the open-weights release, so GLM-5.2 is text-in, text-out only. The MIT license covers the weights themselves. It does not cover training data or training code, so calling GLM-5.2 "open source" overstates what the license grants.
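The parameter and weight figures are consistent with each other under one assumption that the article does not state: that the weights are stored at 16 bits (2 bytes) per parameter. The sketch below shows that arithmetic, along with the share of parameters active per token. The 8-bit line is a hypothetical quantization included only for comparison, not a format Z.ai is reported to ship.

```python
TOTAL_PARAMS = 753e9   # total parameters, from the article
ACTIVE_PARAMS = 40e9   # parameters active per token, from the article


def weights_terabytes(params: float, bytes_per_param: float) -> float:
    """Raw weight storage in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


def active_fraction(active: float = ACTIVE_PARAMS, total: float = TOTAL_PARAMS) -> float:
    """Share of parameters used for any single token in the MoE."""
    return active / total


if __name__ == "__main__":
    # Assumption: 2 bytes per parameter (16-bit weights).
    print(f"16-bit weights: {weights_terabytes(TOTAL_PARAMS, 2):.3f} TB")
    # Hypothetical 8-bit quantization, for comparison only.
    print(f"8-bit weights:  {weights_terabytes(TOTAL_PARAMS, 1):.3f} TB")
    print(f"active share:   {active_fraction() * 100:.1f}% of parameters per token")
```

At 2 bytes per parameter the total is 1.506 TB, which rounds to the quoted 1.51 TB, and about 5.3% of the parameters do work on any given token. The sparse activation lowers compute per token, but all 753 billion parameters still have to be stored somewhere.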

There is a real competitive context here, and it is not the one a wire lede would lead with. The open-weights pack has been tightening for months. DeepSeek V4 Pro (max) sits at 44 on the same index. Kimi K2.6 sits at 43. The gap between the leader and the rest of the pack was, until this release, narrow enough that the top three were interchangeable for most tasks. GLM-5.2 stretches that gap. Simon Willison's framing, "probably the most powerful text-only open weights LLM," is a qualified claim, and Artificial Analysis's index is one benchmark suite among several, but it is the most cited independent suite in this category, and the score is the score.

There is also a coding-specific check that complicates the picture. GLM-5.2 is second on the Code Arena WebDev leaderboard, behind Claude Fable 5. If coding is the workload, GLM-5.2 does not top the combined field: a closed model still ranks above it. It is a strong second on a leaderboard that measures something specific: shipping small web apps from prompts. The general-intelligence ranking and the coding ranking do not tell the same story, and the right choice depends on the workload.

What to watch next is straightforward. The first question is whether other independent evaluators reproduce the Intelligence Index gap. Artificial Analysis's index is well regarded, but a single benchmark suite is a single benchmark suite, and the previous top of the open-weights pack changed quarter to quarter. The second question is the ecosystem: how quickly the OpenRouter, Hugging Face, and self-hosting communities get a stable, fast inference path running on the kinds of hardware that small teams and academic labs actually have access to. A 753-billion-parameter model is not a one-GPU build, and the practical ceiling on GLM-5.2's reach is the practical ceiling on affordable inference. The third question is Z.ai's roadmap. GLM-5.2 is text-only in the open-weights line, and the GLM-5V-Turbo vision family is not open weights. If the next release closes that gap, the developer-economics story changes shape again.
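To put a rough number on "not a one-GPU build", the sketch below divides the article's 1.51-terabyte weight figure by a few accelerator memory sizes. The memory sizes are illustrative assumptions and are not tied to specific products. The result is a floor for holding the weights alone: it ignores the KV cache, activations, and runtime overhead, all of which push the real requirement higher, especially at long context lengths.

```python
import math

WEIGHTS_BYTES = 1.51e12  # roughly 1.51 TB of weights, from the article


def min_devices(weights_bytes: float, device_memory_gb: float) -> int:
    """Minimum number of devices whose combined memory can hold the weights.

    Lower bound only: no KV cache, activations, or framework overhead.
    """
    return math.ceil(weights_bytes / (device_memory_gb * 1e9))


if __name__ == "__main__":
    # Hypothetical per-device memory sizes in GB, chosen for illustration.
    for memory_gb in (24, 80, 141, 192):
        count = min_devices(WEIGHTS_BYTES, memory_gb)
        print(f"{memory_gb:>3} GB per device -> at least {count} devices")
```

With 80 GB of memory per device, the weights alone need at least 19 devices. At 192 GB per device the floor is still 8, which is why the inference path matters as much as the license for small teams.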

For now, the practical read is the one Artificial Analysis and OpenRouter's price list already support. Independent developers, small teams, and academic or hobbyist users who cannot pay $5 to $30 per million tokens at closed frontier APIs have a benchmark-leading MIT-licensed model to point at, and a price that is roughly 3.5x cheaper on input and nearly 7x cheaper on output than GPT-5.5, with the honest caveat that the model drinks more tokens per task than its closest competitors. The build-or-buy math is closer to a coin flip than a verdict, but the coin is now in the open-weights hand.
