The Price War Is Over. The Real Race Just Started.
DeepSeek's V4 pricing and the deprecation of its legacy discount brand turn 'cheap' from a promo into the floor every frontier model has to clear.
DeepSeek's V4 pricing and the deprecation of its legacy discount brand turn 'cheap' from a promo into the floor every frontier model has to clear.
For two years, the question in applied AI was whether cheap inference could survive contact with frontier quality. The deal looked unstable: a Chinese lab undercut US frontier prices, the headlines ran for a week, then the usual caveats followed. Now the company has made its promotional V4 Pro discount permanent, and pinned the new numbers to its API pricing page. The promotional window is closed. The race to clear a new cost floor is on.
The headline numbers are concrete enough that a builder can drop them into a spreadsheet. V4 Flash lists at $0.14 per million input tokens on cache miss, $0.28 per million on output, and $0.0028 per million on cache-hit input. V4 Pro, the larger model, lists at $0.435 per million input, $0.87 per million output, and $0.003625 per million on cache hit, per the official pricing page. Both V4 variants offer a 1 million token context window and up to 384,000 tokens of output, with concurrency limits of 2,500 for Flash and 500 for Pro, and OpenAI- and Anthropic-compatible base URLs published for drop-in migration.
For comparison, The Next Web's reporting places OpenAI's GPT-5 at $2.50 per million input and $10 per million output, Anthropic's Claude Opus 4.7 at $5 and $25, and Google's Gemini 3.5 Flash at $0.15 and $0.60. That gap is not a temporary lead; it is the standing floor DeepSeek's own product page now publishes.
The structural signal is the deprecation schedule. DeepSeek's API documentation lists the legacy deepseek-chat and deepseek-reasoner model IDs for retirement, with the discount brand set to close on 2026-07-24. The thinking and non-thinking modes those IDs covered are being folded into V4 Flash and V4 Pro. The company is not running "cheap" as a parallel promo tier. It is consolidating the discount into the versioned product line. That is what "permanent" means in practice: a single pricing chart, a single set of model IDs, and a single deprecation clock.
The change matters because the cost floor is no longer the lever. The next competitive axis is quality, context length, and tool-call reliability. V4 ships 1 million tokens of context, large enough to fit most codebases, multi-document legal review packets, and long video transcripts in a single request. The 384,000-token output ceiling is unusual; most frontier APIs cap output at lower numbers. Concurrency at 2,500 for Flash is practical for batch workloads, not just demos. The OpenAI- and Anthropic-compatible base URLs let teams migrate with config changes rather than rewrites.
None of that closes the quality gap to frontier US labs on hard reasoning or code. The V4 Pro pricing is not a guarantee that V4 Pro matches GPT-5 on the tasks a buyer cares about, and the pricing sustainability is a real open question. DeepSeek's own docs reserve the right to adjust prices, so the floor is durable, not contractual. There is also the geopolitical and export-control exposure of routing a meaningful share of inference through a Chinese lab, which lands differently on a US federal contractor, a European bank under the AI Act, and a startup in Singapore. The criticism stays.
What changes for a builder with a real workload is the cost model. A retrieval-augmented generation pipeline that runs $200 a day on GPT-5 would run roughly $30 a day on V4 Pro at the same input and output volume — a comparison that illustrates the cost difference rather than stating a precise figure for any specific workload, and one that improves further before cache-hit discounts. Cache-hit input at $0.003625 per million tokens reshapes the math for repeated context: a system prompt, a tool schema, or a long document the model is asked about many times. The 1M context window means fewer retrieval tricks and longer direct prompts. The 384K output ceiling means a single request can produce a long document, a full code module, or a multi-step plan without a continuation loop.
Two things to watch. First, whether US frontier labs respond with their own permanent price reductions or with credit, batch, and cache-hit programs that close the effective gap. GPT-5 is not at $2.50 forever if the new floor becomes $0.14 on input. Second, whether DeepSeek's quality keeps pace with the price. The V4 Pro pricing is the headline; the next quarter's third-party benchmarks are the test. If V4 Pro holds quality near frontier on hard reasoning, the cost story is structural. If it slips, the floor stays, and the buyer treats V4 as a tier rather than a substitute.
The price war is over because the price war stopped being a war. There is a published floor, a versioned product line, and a deprecation schedule. What is just starting is the harder race: quality at the new cost, context length as a product feature, and tool reliability as a real differentiator. Builders who treat cheap as a temporary edge case are pricing their roadmap on a market that no longer exists.