Meituan's 1.6-Trillion-Parameter AI Model Runs on Zero Nvidia Chips, and Global Developers Are Already Using It

When Meituan quietly published LongCat-2.0 on June 30, the official story was that a Chinese food-delivery giant had trained a 1.6-trillion-parameter language model without using a single Nvidia accelerator (announcement on X via @Meituan_LongCat). The more interesting story had already happened: for weeks, overseas developers had been routing real traffic to that same model under the codename Owl Alpha through OpenRouter, a third-party platform that lists anonymous AI models for paid API access. According to QbitAI's read of OpenRouter's leaderboard, Owl Alpha ranked first by monthly call volume under three OpenRouter routing tags (Hermes, Claude Code, and a third tag) before Meituan publicly named what it was (QbitAI). Developers picked it without a press release, a marketing campaign, or any obligation to use a Chinese open-weight model with no brand attached.

That second fact quietly inverts the usual framing. The standard read on a Chinese "homegrown" AI model is that it needs Western validation before it can claim any global relevance. LongCat-2.0's path was the reverse: developers found it on a third-party model router, used it at scale, and Meituan only later attached its own brand to the result.

So what is LongCat-2.0? It is a mixture-of-experts language model with roughly 1.6 trillion total parameters, of which only about 48 billion are active on any given token (model card on longcatai.org). The mixture-of-experts, or MoE, architecture is what makes a model of that size economically runnable: you ship a trillion-plus-parameter brain but only pay the inference cost of the much smaller slice that actually fires for a given input. A related sparsity idea is applied to attention: the model serves a native 1-million-token context window, supported by a custom mechanism called LongCat Sparse Attention that prunes which token pairs need to attend to each other (QbitAI).
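
To make the active-versus-total distinction concrete, here is a minimal Python sketch. The two parameter counts are the figures quoted above; everything else (eight experts, top-2 selection, the gate scores, the function names) is a made-up illustration of generic top-k gating and is not a description of LongCat-2.0's actual router, which the cited sources do not detail.

```python
import math

TOTAL_PARAMS = 1.6e12   # total parameters, per the figure quoted above
ACTIVE_PARAMS = 48e9    # parameters active per token, per the same figure


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in one token's forward pass."""
    return active / total


def top_k_route(gate_scores: list[float], k: int) -> dict[int, float]:
    """Toy top-k gating (hypothetical, not Meituan's router).

    Keeps the k highest-scoring experts and softmax-normalizes their
    scores into mixing weights; all other experts are skipped entirely.
    """
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in chosen)  # subtract max for numerical stability
    exps = {i: math.exp(gate_scores[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active share per token: {share:.1%}")  # 48e9 / 1.6e12 = 3.0%

    # Invented gate scores for 8 hypothetical experts; only 2 run for this token.
    weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9], k=2)
    print(weights)
```

On the quoted figures, about 3% of the parameters do work for any single token, which is why the serving cost tracks the 48-billion active slice rather than the 1.6-trillion total.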

The hardware story is the one most readers will have seen framed as the headline: training and inference both run on domestic Chinese accelerators, with Nvidia absent from the loop. Meituan says roughly 50,000 Chinese chips supported the full training-to-inference pipeline, though the company has not publicly named the specific accelerator vendor (SCMP; Yahoo Tech). The more honest framing is that zero Nvidia is interesting because of what it implies for inference cost and open-weight distribution, not because the chips themselves are the news. A model whose training was not shaped by export-controlled hardware can be priced differently when developers call it, and Meituan is pricing it aggressively.

How aggressive? According to QbitAI's hands-on test, the model can generate a working physics-simulation application from a single prompt in roughly 9,000 tokens (9,004 in QbitAI's run), which works out to under ¥0.10 on Meituan's published token pricing (QbitAI). That figure rests on Meituan's own pricing schedule and its own cache-hit assumptions, so it is a vendor number, not a head-to-head public benchmark. But it is the kind of number that explains why a developer who finds the model on OpenRouter in mid-June would keep calling it through July.
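
The arithmetic behind that claim can be checked in a few lines. The sketch below uses only the two figures quoted above (9,004 tokens, under ¥0.10) and derives the blended per-million-token price they imply as an upper bound. It does not use Meituan's actual price sheet, which is not reproduced here, and the function name is illustrative.

```python
def implied_price_ceiling_per_million(total_cost_cny: float, tokens: int) -> float:
    """Blended price per million tokens implied by a total cost and a token count.

    "Blended" means input, output, and cached tokens are lumped together,
    because the quoted figures do not break them out.
    """
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_cost_cny / tokens * 1_000_000


if __name__ == "__main__":
    TOKENS = 9_004        # token count reported for QbitAI's test
    COST_CEILING = 0.10   # "under ¥0.10", so treat this as an upper bound
    ceiling = implied_price_ceiling_per_million(COST_CEILING, TOKENS)
    print(f"implied blended price: under ¥{ceiling:.2f} per million tokens")
```

On those inputs the implied ceiling is roughly ¥11 per million tokens. Because the split between input, output, and cached tokens is not given, this is a bound on the blended rate only, not a reconstruction of any individual line on the price sheet.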

On capability, the LongCat-2.0 technical report puts the model in the same conversation as Gemini 3.1 Pro, GPT-5.5, and Claude Opus 4.6 to 4.8, with Terminal-Bench 2.1 scores around 77.6 to 77.8, SWE-bench Pro in the 81.3 to 85.3 range, and SWE-bench Multilingual between 79.3 and 84.3 (QbitAI). Those are Meituan's own numbers from the model's own report, and none of the international coverage that picked up the launch ran independent reproductions (NDTV Profit). The post-training pipeline that produced those scores uses a multi-teacher on-policy distillation setup, with separate experts for tool use, multi-hop reasoning, instruction following, hallucination suppression, and domain integration, all running on the same domestic accelerator clusters.

The hands-on demos QbitAI ran read more like a developer test plan than a marketing reel. In a long-context retrieval test over a multilingual, cross-industry corpus, the model pulled answers in roughly one second. In a refactor task, it took a 13,000-star open-source repository implementing the game 2048 in HTML/CSS/JS, generated a seven-step plan, and shipped a working reskin with a five-by-five grid, step counter, and cyberpunk color scheme in about twelve minutes (QbitAI). One reviewer, one machine, one opinion: useful as a hint of capability, not as independent verification.

Two things to watch. First, Meituan's claim to be the first trillion-parameter model trained and served end-to-end without Nvidia depends on how you count. Comparable domestic efforts, including earlier DeepSeek and Alibaba work, used mixed Nvidia-plus-domestic stacks, so the strict "first" wording is Meituan's framing rather than an independently verified claim (QbitAI). Second, the OpenRouter call-volume ranking under Owl Alpha is QbitAI's read of OpenRouter's public leaderboard tags, not an independent benchmark, and the exact ordering deserves a check against the live leaderboard before the language gets too confident. If both claims hold up under that scrutiny, the more durable story is the one that was already true on OpenRouter weeks before June 30: developers voting with their API calls for an open-weight Chinese model they had no particular reason to prefer.

Sources