Xiaomi open-sourced a model last week that developers can actually ship. MiMo-V2.5-Pro, released April 27 under a permissive MIT license, is the rare open-weight model that removes the friction points other releases leave in place — no per-seat fees, no enterprise agreement, no weeks of legal review before a startup can embed it in a product. Whether it works as well in the wild as Xiaomi says it does is a separate question the company's own benchmarks cannot answer.
MiMo-V2.5-Pro is a mixture-of-experts model: a 1.02-trillion-parameter architecture that activates only roughly 42 billion parameters per token, routing each input to a small subset of specialized experts, which lets it handle very long inputs without the cost spike full-parameter models produce. It sustains more than 1,000 tool calls in long-horizon agentic tasks, according to Xiaomi, and is priced at $1 per million input tokens and $3 per million output tokens on its public API. The weights are on Hugging Face.
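For readers unfamiliar with why a trillion-parameter model can run at 42-billion-parameter cost, the mechanism is top-k expert routing. The sketch below is generic, not Xiaomi's implementation; the expert count, the value of k, and the function names are illustrative, and only the 42B/1.02T ratio comes from the release itself.

```python
import numpy as np

def route_tokens(token_scores: np.ndarray, k: int = 8):
    """Pick the top-k experts per token from router scores.

    In a mixture-of-experts layer, a small router scores every expert
    for each token; only the k highest-scoring experts run, so per-token
    compute scales with k rather than the total expert count. This is a
    generic sketch -- Xiaomi has not published MiMo's router details.
    """
    # token_scores: (num_tokens, num_experts); argsort ascends, so the
    # last k columns are the highest-scoring experts per token.
    top_k = np.argsort(token_scores, axis=-1)[:, -k:]
    # Softmax over only the selected scores gives the mixing weights.
    selected = np.take_along_axis(token_scores, top_k, axis=-1)
    weights = np.exp(selected) / np.exp(selected).sum(axis=-1, keepdims=True)
    return top_k, weights

# The fraction that matters for cost: ~42B active of 1.02T total.
active_fraction = 42e9 / 1.02e12
print(f"{active_fraction:.1%} of parameters active per token")  # 4.1%
```

The practical consequence is that per-token FLOPs and latency track the active slice, which is why the API pricing can sit near small-model levels despite the trillion-parameter total.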
The MIT license is the part developers noticed first. Unlike models released under commercial licenses that restrict fine-tuning or impose revenue-sharing, MiMo-V2.5-Pro can be incorporated into products, fine-tuned on proprietary data, and deployed commercially without per-seat fees or enterprise agreements. A startup building a customer service agent, a tooling company wrapping model capabilities into a developer product, or an enterprise integrating autonomous workflow automation into internal systems can now do so without a licensing conversation.
The harder question is what the model actually does outside Xiaomi's own evaluation harness. The demonstrations supporting its sustained tool-call claims were extreme by design. In one, the model built a complete SysY-to-Rust compiler from scratch and scored 233 out of 233 on the course's hidden test suite using 672 tool calls over 4.3 hours. In another, it produced an 8,192-line video editor over 1,868 tool calls in 11.5 hours of autonomous work. Those demos establish what is theoretically possible. Whether the model maintains coherence over that many steps on messier, less curated real-world tasks is the question independent testing hasn't fully answered.
Third-party benchmarks offer partial reassurance. AkitaOnRails, an independent developer who ran hands-on tests, placed MiMo-V2.5-Pro at Tier B with a score of 67 out of 100, notably below Xiaomi's own results; that gap illustrates the verification problem that runs through nearly every model release. MiMo-V2.5-Pro's base model scores 35.7 on SWE-Bench AgentLess, a software engineering benchmark requiring sustained multi-step tool use; Kimi K2 Base scores 28.2 on the same test. Independent benchmarking service Artificial Analysis ranks MiMo-V2.5-Pro at Intelligence Index 54, tied for second among all open-weight models with Kimi K2.6, third-party confirmation that Xiaomi is not simply cherry-picking from its own evaluation suite.
The architecture behind the efficiency numbers is worth understanding for teams evaluating the model seriously. MiMo-V2.5-Pro uses a hybrid attention mechanism that interleaves sliding window attention and global attention at a 6-to-1 ratio, with a 128-token sliding window. This cuts the memory required to store attention states by nearly seven times compared to standard full attention. Three multi-token prediction modules triple output speed during inference rollout, Xiaomi says. Training used 27 trillion tokens with FP8 mixed precision. On GraphWalks, a long-context reasoning benchmark, V2.5-Pro scores 0.56 on breadth-first search and 0.92 on parents-following at 512k context, dropping to 0.37 and 0.62 at one million tokens. V2 Pro collapses to zero at one million on both subtasks.
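The "nearly seven times" figure follows directly from the layer ratio. A back-of-envelope sketch, assuming the stated 6-to-1 interleave and 128-token window (not a reproduction of the actual KV-cache implementation):

```python
def kv_cache_positions(context_len: int, window: int = 128,
                       sliding_per_global: int = 6) -> tuple[int, int]:
    """Compare attention-state positions stored across one layer group.

    Assumes the pattern described for MiMo-V2.5-Pro: for every global
    attention layer, six sliding-window layers keep only the last
    `window` token positions. Illustrative arithmetic only.
    """
    group = sliding_per_global + 1
    # Full attention: every layer in the group caches the whole context.
    full = group * context_len
    # Hybrid: sliding layers cap at the window; one layer keeps everything.
    hybrid = sliding_per_global * min(window, context_len) + context_len
    return full, hybrid

full, hybrid = kv_cache_positions(512_000)
print(f"reduction at 512k context: {full / hybrid:.1f}x")  # 7.0x
```

At long contexts the six sliding layers' contribution becomes negligible next to the one global layer, so the ratio approaches 7-to-1, matching the reduction Xiaomi cites.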
The pricing math is where the economics get concrete. At roughly 70,000 tokens per trajectory, Xiaomi's own estimate, and $4 per million tokens combined, a 1,000-tool-call workflow on MiMo costs about 28 cents. A comparable workflow on GPT-5.4, priced at $17.50 per million tokens combined, costs roughly $1.23 at the same trajectory length. At scale the difference compounds: a team running 10,000 agentic tasks per day would pay around $2,800 on MiMo versus roughly $12,250 on a comparable frontier model.
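The arithmetic above can be checked in a few lines. This assumes a single blended per-million-token price, as the article does; real bills split input and output tokens, so treat it as an estimate:

```python
def workflow_cost(tokens_per_trajectory: int, price_per_million: float) -> float:
    """Cost of one agentic trajectory at a blended per-token price."""
    return tokens_per_trajectory / 1_000_000 * price_per_million

TRAJECTORY_TOKENS = 70_000  # Xiaomi's estimate for a 1,000-tool-call run

mimo = workflow_cost(TRAJECTORY_TOKENS, 4.00)       # $1 in + $3 out, blended
frontier = workflow_cost(TRAJECTORY_TOKENS, 17.50)  # GPT-5.4, blended

print(f"MiMo:     ${mimo:.2f}/workflow, ${mimo * 10_000:,.0f} at 10k tasks/day")
print(f"Frontier: ${frontier:.2f}/workflow, ${frontier * 10_000:,.0f} at 10k tasks/day")
```

The ratio is fixed by the price gap, about 4.4x, regardless of trajectory length; what trajectory length changes is the absolute dollar figures.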
Xiaomi's distribution reach is the variable that sets this apart from a typical open-source model release. The company's HyperOS ecosystem spans more than 823 million connectable devices, from smartphones to smart home hardware. Xiaomi has not announced plans to embed MiMo inference into device firmware, but the combination of a local-capable model, an open license, and that device footprint creates a deployment surface that no other open-weight model currently has. An agent that runs partly on-device to reduce per-call latency and partly on Xiaomi's API for complex reasoning is a plausible architecture the MIT license now permits anyone to build.
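The split-deployment pattern described above can be sketched as a simple routing decision. Everything here is speculative: Xiaomi has announced no on-device MiMo inference, and the function name, thresholds, and backend labels are invented for illustration of what the MIT license permits.

```python
def choose_backend(prompt: str, needs_tools: bool,
                   max_local_tokens: int = 2048) -> str:
    """Route a request between a local model and a hosted API.

    Hypothetical hybrid deployment: short, tool-free requests stay
    on-device for latency; long-context or agentic work goes to the
    hosted API. Thresholds are made up for the sketch.
    """
    # Crude token estimate: ~2 tokens per word of prompt.
    if needs_tools or len(prompt.split()) * 2 > max_local_tokens:
        return "api"    # e.g. Xiaomi's hosted endpoint
    return "local"      # e.g. a quantized on-device build of the open weights

print(choose_backend("Turn off the living room lights", needs_tools=False))
print(choose_backend("Refactor this repo and run the tests", needs_tools=True))
```

The open weights make the "local" branch legal to build and the API pricing makes the "api" branch cheap; the device footprint is what would make the pattern worth building at Xiaomi's scale.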
Whether Xiaomi intends to monetize through API subscriptions, device-level inference, or something else is unclear. The company has invested 200 billion yuan, roughly $29 billion, into foundational R&D for chips and operating systems, which suggests the model is strategic infrastructure rather than a standalone product.
The caveats here are real. All of the sustained tool-call claims come from Xiaomi's own evaluation harness. The 40 to 60 percent token efficiency advantage is Xiaomi's measurement, not a third-party audit run against Claude Opus 4.6 or GPT-5.4 in identical conditions. AkitaOnRails noted that real-world agentic performance varies considerably with tool support patterns and task structure, and their independent benchmark placed MiMo notably below Xiaomi's own results. What is true regardless is that open-weight models now compete seriously on the specific task that frontier-model providers have used to justify premium pricing: sustained, multi-step autonomous tool use at scale. Whether MiMo's specific numbers hold up under wider independent testing is the next question worth answering.