DeepSeek shipped V4 today. The weights are live on HuggingFace under an MIT license. The technical report is public. The benchmark tables are exactly what the rumors said they would be. What four days of speculation couldn't tell you: the model's commercial ceiling is not what the benchmarks show. According to SCMP, V4's availability is limited by Huawei Ascend 950PR chip supply, and prices won't drop until those super nodes ship at scale in the second half of 2026. That sentence has never appeared in a DeepSeek release before.
The flagship, DeepSeek-V4-Pro, runs 1.6 trillion total parameters with 49 billion activated per token, a mixture-of-experts design where most of the model stays dormant for any given query. The companion, DeepSeek-V4-Flash, runs 284 billion total parameters with 13 billion activated. Both support a context window of one million tokens (roughly 750,000 words, or about ten novels), and both were pretrained on more than 32 trillion tokens. V4-Pro is released as a preview, not a general-availability launch; the weights are downloadable, but DeepSeek has not declared the model production-ready.
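To put those ratios in perspective, here is a back-of-the-envelope sketch in Python. The one-byte-per-parameter figure (FP8 weights) is an assumption for illustration, not something DeepSeek's report specifies; the point is that per-token compute scales with the activated parameters while memory scales with the total.

```python
# Rough arithmetic on the parameter counts above. The bytes-per-parameter
# value is an assumption (FP8); real deployments vary with quantization.

def moe_summary(name, total_b, active_b, bytes_per_param=1.0):
    """total_b and active_b are parameter counts in billions."""
    active_frac = active_b / total_b
    weight_gb = total_b * bytes_per_param  # 1e9 params x bytes, expressed in GB
    print(f"{name}: {active_frac:.1%} of parameters active per token, "
          f"~{weight_gb:,.0f} GB of weights resident in memory")

moe_summary("V4-Pro",   total_b=1600, active_b=49)   # ~3.1% active, ~1,600 GB
moe_summary("V4-Flash", total_b=284,  active_b=13)   # ~4.6% active, ~284 GB
```

That asymmetry is the whole mixture-of-experts trade: V4-Pro computes roughly like a 49-billion-parameter dense model but has to be stored and served like a 1.6-trillion-parameter one.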
The benchmark that will get the most attention is coding. V4-Pro Max scores 93.5 on LiveCodeBench, a benchmark that tests code generation on problems from competitive programming contests, compared to 91.7 for Gemini-3.1-Pro and 88.8 for Claude Opus-4.6. On Codeforces, a competitive programming rating system, V4-Pro Max scores 3,206 versus 3,168 for GPT-5.4. These are self-reported numbers from DeepSeek's own technical documentation, and no independent lab has replicated them yet. On SWE-bench Verified, which measures real-world software engineering task completion, V4-Pro Max scores 80.6, matching Gemini-3.1-Pro and sitting just below Opus-4.6 at 80.8. That's a tie, not a win.
The efficiency story is more striking than the raw benchmarks. At a one-million-token context, V4-Pro needs only 27 percent of the per-token inference compute and 10 percent of the KV cache that DeepSeek's previous model, V3.2, needed at the same context length. The KV cache is the memory a model uses to track what it has read so far in a conversation; shrinking it by 90 percent at million-token scale matters for anyone trying to run this commercially.
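For a sense of scale, here is the textbook KV-cache sizing formula for a standard-attention transformer. The layer and head counts below are hypothetical placeholders, not V4's published configuration (the report excerpted here doesn't spell out how the reduction is achieved), but they show what a 90 percent cut buys at a million tokens.

```python
# Standard KV-cache size for vanilla attention: one key and one value
# vector per layer, per KV head, per token. Config values are hypothetical.

def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for keys and values; bytes_per_elem=2 assumes an FP16/BF16 cache
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

baseline = kv_cache_gb(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"baseline cache: ~{baseline:.0f} GB for one million-token sequence")
print(f"at 10 percent:  ~{baseline * 0.1:.0f} GB")  # the claimed V4-vs-V3.2 ratio
```

Under these assumptions that is roughly 246 GB per conversation versus roughly 25 GB: the difference between sharding a single user's cache across several accelerators and fitting it on one.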
The hardware question is where the story gets complicated in ways the benchmark tables can't resolve. As type0 reported four days ago, citing Reuters, DeepSeek's engineers spent months rewriting core code to run on Huawei's CANN computing framework instead of Nvidia's CUDA. V4's technical documentation mentions kernels adapted to both Nvidia and Huawei hardware. What it doesn't say is what DeepSeek actually trained V4 on. U.S. officials have accused DeepSeek of using Nvidia Blackwell chips, which are banned from export to China; DeepSeek has not addressed that accusation directly. The company's previous model, V3, was trained on 2,048 Nvidia H800 graphics processing units, according to SCMP, chips that were on the U.S. export control list at the time.
The training hardware gap matters because it sets the ceiling for what DeepSeek can build next. The inference hardware gap matters because it determines who can run this model and at what price. Reuters reported in early April that Alibaba, ByteDance, and Tencent had placed bulk orders totaling hundreds of thousands of Huawei chips ahead of the V4 launch. If those orders translate into production capacity in the second half of 2026, as DeepSeek projects, Chinese enterprise AI inference moves onto domestic silicon at scale.
Nvidia chief executive Jensen Huang addressed this directly last week on the Dwarkesh Podcast. "If future AI models are optimised in a very different way than the American tech stack, and as AI diffuses out into the rest of the world with Chinese standards and technology, China will become superior to the US," Huang said. That was a warning about the direction of travel, not a description of where things stand today.
One technical detail developers will hit immediately: V4 uses a new chat template format with no Jinja template included. Developers integrating V4 into existing pipelines will need custom encoding logic. DeepSeek has not yet posted a Jinja-compatible template, which means anyone running V4 against standard inference libraries is working around the spec, not with it.
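Until an official template lands, the workaround is a thin encoding shim. A minimal sketch follows; the role markers in it are illustrative placeholders, not V4's actual format, so substitute whatever DeepSeek's technical report specifies.

```python
# Hand-rolled chat encoding for a model that ships without a Jinja
# template. ROLE_MARKERS below are placeholders for illustration only;
# V4's real control tokens must come from DeepSeek's documentation.

ROLE_MARKERS = {
    "system":    ("<|system|>", "<|end|>"),
    "user":      ("<|user|>", "<|end|>"),
    "assistant": ("<|assistant|>", "<|end|>"),
}

def encode_chat(messages, add_generation_prompt=True):
    """Flatten a list of {role, content} dicts into one prompt string."""
    parts = []
    for msg in messages:
        start, end = ROLE_MARKERS[msg["role"]]
        parts.append(f"{start}{msg['content']}{end}")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        parts.append(ROLE_MARKERS["assistant"][0])
    return "".join(parts)

prompt = encode_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the V4 release."},
])
```

Once DeepSeek publishes a proper Jinja template, Hugging Face's standard tokenizer.apply_chat_template call replaces a shim like this, and the custom logic can be deleted.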
The model's release label is "preview." The weights are real; the technical report is public. What the preview label means in practice: DeepSeek is not guaranteeing that V4 behaves the way the benchmark tables suggest it will in your specific use case. The independent replication work hasn't happened yet. The API pricing hasn't been officially confirmed on DeepSeek's documentation page. The hardware disclosure gap hasn't been resolved. All of that is work for the next few weeks.
What has changed as of today: the weights exist, and the supply constraint is acknowledged in writing for the first time. DeepSeek didn't say its model was limited by what the company could build. It said the model was limited by how many Huawei chips exist. That's a different sentence entirely.