Moonshot AI, a Beijing-based lab, released an open-weights coding model called Kimi K2.7 Code on Friday. The headline number is a 30 percent drop in "reasoning tokens," the intermediate tokens a model generates while thinking through a problem before producing an answer, compared with its predecessor, K2.6. The figure is Moonshot's own claim on the model's Hugging Face card, not an independent measurement, so the more useful question is what would have to be true for that number to matter in real software work.
K2.7 Code is built for what the field has started calling long-horizon agentic coding: multi-step software work that spans many tool calls and files, not just single prompt-and-response snippets. The Hugging Face model card lists 1 trillion total parameters with 32 billion activated per token, a 256,000-token context window, and support for text, image, and video input. The lab's Product Hunt launch page packages the same facts for a developer audience: open weights, multimodal input, and a recommended agent framework called Kimi Code CLI. The architecture is a sparse mixture-of-experts setup of 384 experts, eight selected plus one shared per token, 61 layers, and a MoonViT 400-million-parameter vision encoder. Moonshot's org page labels the same model at about 1.1 trillion total parameters; the model card's 1T figure is the more specific number.
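To put the sparsity figures in proportion, here is a minimal back-of-the-envelope sketch in Python using only the numbers quoted from the model card. Two things are assumptions rather than facts from the card: that the shared expert is counted separately from the 384 routed experts, and that the "256K" context window means 256 × 1,024 tokens.

```python
# Figures as reported from the model card in the paragraph above.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token
ROUTED_EXPERTS = 384
SELECTED_EXPERTS = 8               # assumption: the shared expert is counted separately


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


param_share = share(ACTIVE_PARAMS, TOTAL_PARAMS)
expert_share = share(SELECTED_EXPERTS, ROUTED_EXPERTS)

# Assumption: "256K" is binary, i.e. 256 * 1024 tokens.
context_tokens = 256 * 1024

print(f"parameters active per token: {param_share:.1%}")
print(f"routed experts selected per token: {expert_share:.1%}")
print(f"context window under the binary reading: {context_tokens:,} tokens")
```

Under those assumptions, only about 3 percent of the weights and about 2 percent of the routed experts are exercised for any single token, and the binary reading of 256K works out to 262,144 tokens.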
The efficiency story sits on top of native INT4 quantization, the same method used in the earlier Kimi-K2-Thinking release, and the lab points to vLLM, SGLang, and KTransformers as supported deployment paths. Weights and code ship under a Modified MIT License, which keeps the model open for commercial use. None of that resolves the question of whether reasoning tokens actually fell by 30 percent in practice, and the model card does not publish a token-by-token methodology.
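Why the missing methodology matters can be shown with a small Python sketch. It is not Moonshot's method, which the card does not describe; the function name and the token counts are hypothetical and chosen only to show that the same logs can yield different "reduction" figures depending on how they are aggregated.

```python
from statistics import mean


def reasoning_token_reduction(baseline: list[int], candidate: list[int]) -> dict[str, float]:
    """Compare reasoning-token counts for the same tasks on two models.

    Returns the reduction computed two ways:
    - "aggregate": 1 - (total candidate tokens / total baseline tokens)
    - "per_task_mean": the average of each task's own reduction
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need one count per task for both models")
    if any(b <= 0 for b in baseline):
        raise ValueError("baseline counts must be positive")
    aggregate = 1 - sum(candidate) / sum(baseline)
    per_task = mean(1 - c / b for b, c in zip(baseline, candidate))
    return {"aggregate": aggregate, "per_task_mean": per_task}


# Made-up counts for three tasks; not measurements of K2.6 or K2.7 Code.
older_model = [10_000, 2_000, 500]
newer_model = [6_000, 2_000, 500]

result = reasoning_token_reduction(older_model, newer_model)
print(f"aggregate reduction: {result['aggregate']:.0%}")
print(f"mean per-task reduction: {result['per_task_mean']:.0%}")
```

In this made-up case one long task accounts for all of the savings, so the aggregate reduction is 32 percent while the average task improves by about 13 percent. An independent measurement would need to state which of these it reports, and on which tasks.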
The pricing tells a separate story. On Moonshot's platform.kimi.ai pricing page for K2.7 Code, input tokens cost $0.19 per million on cache hit and $0.95 per million on cache miss, while output tokens run $4.00 per million. The 262,144-token context window, auto context caching, and ToolCalls, JSON Mode, and Partial Mode support are spelled out on the same chat pricing index. For a team running long sessions against a large codebase, cache-hit pricing is the number that matters, and $0.19 per million input tokens is competitive with Anthropic's Claude and OpenAI's GPT-5.5 for the same workload: within shouting distance of their rates rather than a clear undercut.
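The effect of cache-hit pricing on a long session can be sketched with a few lines of Python. The three prices are the ones quoted above; the function name, the workload sizes, and the 90 percent cache-hit rate are hypothetical, and the sketch assumes every input token is billed at exactly one of the two input rates.

```python
# Per-million-token prices in USD, as quoted from the pricing page above.
PRICE_INPUT_CACHE_HIT = 0.19
PRICE_INPUT_CACHE_MISS = 0.95
PRICE_OUTPUT = 4.00


def session_cost(input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """Estimate the USD cost of a session.

    cache_hit_rate is the fraction of input tokens billed at the cache-hit price.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * PRICE_INPUT_CACHE_HIT
        + miss_tokens * PRICE_INPUT_CACHE_MISS
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Hypothetical workload: 50M input tokens and 2M output tokens.
cold = session_cost(50_000_000, 2_000_000, cache_hit_rate=0.0)
warm = session_cost(50_000_000, 2_000_000, cache_hit_rate=0.9)

print(f"no cache hits:  ${cold:.2f}")
print(f"90% cache hits: ${warm:.2f}")
```

For this made-up workload the bill falls from about $55.50 with no cache hits to about $21.30 at a 90 percent hit rate, which is why the cache-hit rate, more than the list price, drives the cost of agentic sessions that resend the same context repeatedly.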
The benchmark table on the model card shows K2.7 Code climbing over K2.6 on a mix of in-house and third-party tests. The third-party rows are the ones to read carefully: Program Bench moves from 48.3 to 53.6, MLS Bench Lite from 26.7 to 35.1, MCP Atlas from 69.4 to 76.0, and MCPMark-Verified from 72.8 to 81.1. The in-house rows, Kimi Code Bench v2 (50.9 to 62.0) and Kimi Claw 24/7 Bench (42.9 to 46.9), were designed by Moonshot and are not externally audited. The same card places K2.7 Code against GPT-5.5 measured under Codex xhigh and Claude Opus 4.8 measured under Claude Code xhigh, but those comparison rows are also Moonshot's measurements, run on each vendor's preferred harness, so the cross-model line is closer to marketing than to a neutral leaderboard.
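The size of each move is easier to compare when absolute and relative gains sit side by side. The Python sketch below uses only the scores quoted above; treating them as comparable point scales is an assumption, and the third-party and in-house labels follow this article's description, not anything verified independently.

```python
# (benchmark, K2.6 score, K2.7 Code score, origin) as quoted from the model card.
ROWS = [
    ("Program Bench", 48.3, 53.6, "third-party"),
    ("MLS Bench Lite", 26.7, 35.1, "third-party"),
    ("MCP Atlas", 69.4, 76.0, "third-party"),
    ("MCPMark-Verified", 72.8, 81.1, "third-party"),
    ("Kimi Code Bench v2", 50.9, 62.0, "in-house"),
    ("Kimi Claw 24/7 Bench", 42.9, 46.9, "in-house"),
]


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = new - old
    return round(absolute, 1), round(100 * absolute / old, 1)


for name, old, new, origin in ROWS:
    points, percent = gains(old, new)
    print(f"{name:<22} {origin:<12} +{points:>4} pts  +{percent:>4}%")
```

Read this way, the largest absolute jump is on the in-house Kimi Code Bench v2, while the largest relative jump is on MLS Bench Lite, where an 8.4-point gain starts from the lowest base in the table.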
What K2.7 Code does not yet have is the third voice every open-weights coding model eventually needs: independent reproduction of the reasoning-token claim, a named team that has run it on a real codebase, and a critical read that situates Moonshot's release cadence against Claude, DeepSeek's V3 family, Qwen's coding models, and Zhipu's GLM. According to the Hugging Face org page listing, the model had been downloaded about 1,690 times by the day after launch, a small but typical number for an open-weights release that has not yet been picked up by a major downstream project. The watch items, in order: an independent reasoning-token measurement, a real-world developer write-up that goes past the leaderboard, and a third-party benchmark that includes K2.7 Code on equal footing with its open-weights peers.