DeepSeek-V4-Pro API Permanent Price Cut

On May 22, 2026, DeepSeek officially announced that the API price of its flagship model, DeepSeek-V4-Pro, will stay at 1/4 of the original price permanently once the limited-time 75% discount ends on May 31. In other words, the promotional price becomes the standard price.
This is not a short-term promotion; it is a genuine strategic shift in pricing.
1. How Much Does It Cost After the Price Cut?
Let’s look at the core numbers:
| Billing Item | Original Price (yuan/million tokens) | Permanent Price (yuan/million tokens) | Reduction |
|---|---|---|---|
| Input (cache hit) | 0.1 | 0.025 | 75% |
| Input (cache miss) | 12 | 3 | 75% |
| Output | 24 | 6 | 75% |
All three tiers are cut to 1/4 of the original price. The cache-hit input price has dropped to just 0.025 yuan/million tokens — practically negligible.
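To make the table concrete, here is a minimal sketch of how a single request's cost could be estimated from the permanent prices. It assumes simple linear per-token billing; the function name `request_cost_yuan` and the token counts in the example are hypothetical, chosen only for illustration.

```python
# Permanent prices from the table above, in yuan per million tokens.
PRICE_PER_MILLION = {
    "input_cache_hit": 0.025,
    "input_cache_miss": 3.0,
    "output": 6.0,
}


def request_cost_yuan(hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in yuan, assuming linear per-token billing."""
    total = (
        hit_tokens * PRICE_PER_MILLION["input_cache_hit"]
        + miss_tokens * PRICE_PER_MILLION["input_cache_miss"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# Hypothetical request: 80,000 cached input tokens, 20,000 uncached, 4,000 output.
cost = request_cost_yuan(80_000, 20_000, 4_000)
print(f"{cost:.4f} yuan")  # prints "0.0860 yuan"
```

In this example the 80,000 cached tokens contribute only 0.002 yuan, so nearly all of the cost comes from the uncached input and the output.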
A cross-model comparison makes it even clearer:
| Model | Input Price (per million tokens) | Output Price (per million tokens) |
|---|---|---|
| DeepSeek-V4-Pro | 3 yuan | 6 yuan |
| GPT-5.5 | ~120 yuan | ~240 yuan |
| Claude Opus 4 | ~105 yuan | ~210 yuan |
By these figures, DeepSeek-V4-Pro's input and output prices are only about 2.5% to 3% of those for GPT-5.5 and Claude Opus 4. They are not even in the same ballpark.
2. Why Can It Be This Cheap?
This level of price reduction isn’t a loss leader. It’s backed by clear technical foundations.
1. Proprietary Attention Architecture
DeepSeek has used the MLA (Multi-Head Latent Attention) architecture since V2, which drastically compresses the memory footprint of the attention mechanism. V4 further optimizes this, reducing single-inference memory usage by approximately 60% compared to models of similar scale.
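The idea behind latent attention is to cache one compressed vector per token per layer instead of full keys and values for every head. The back-of-envelope sketch below compares the two cache sizes. Every dimension in it is hypothetical and is not DeepSeek-V4's actual configuration. It also counts only the key-value cache and ignores any extra positional components, so it is not expected to reproduce the roughly 60% overall figure quoted above.

```python
def full_kv_cache_bytes_per_token(n_layers, n_heads, head_dim, bytes_per_value=2):
    """Standard multi-head attention: cache K and V for every head in every layer."""
    return n_layers * 2 * n_heads * head_dim * bytes_per_value


def latent_cache_bytes_per_token(n_layers, latent_dim, bytes_per_value=2):
    """Latent-style attention: cache one compressed vector per layer instead."""
    return n_layers * latent_dim * bytes_per_value


# Hypothetical model: 60 layers, 64 heads of size 128, latent size 512, 2-byte values.
full = full_kv_cache_bytes_per_token(n_layers=60, n_heads=64, head_dim=128)
latent = latent_cache_bytes_per_token(n_layers=60, latent_dim=512)

print(f"full K/V cache : {full:,} bytes per token")
print(f"latent cache   : {latent:,} bytes per token")
print(f"latent / full  : {latent / full:.4f}")
```

With these made-up dimensions the latent cache is 1/32 the size of the full cache. The real ratio depends entirely on the actual model configuration.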
2. Huawei Ascend Chip Optimization
The DeepSeek team has done deep operator-level adaptation for the Huawei Ascend 910B, maximizing communication bandwidth utilization and mixed-precision training stability. Chinese domestic chips cost significantly less than NVIDIA A100/H100 GPUs, while the actual inference efficiency gap continues to narrow.
3. Engram System: CPU as Warehouse, GPU as Workshop
V4’s Engram system stores 80% of static knowledge in CPU DRAM, leaving only core inference tasks for the GPU. This “hot-cold separation” architecture multiplies GPU memory utilization and directly reduces the hardware cost per inference.
3. What Does This Mean for Developers?
High Token Consumption Scenarios Are Finally Affordable
Code generation, long document analysis, and batch data processing share one thing: massive token consumption. For a medium-scale code completion task, a single call might consume 50,000-100,000 tokens. At the list prices above, running it on GPT-5.5 costs roughly 6 to 24 yuan per call, depending on the input/output mix; on DeepSeek-V4-Pro, the same call costs about 0.15 to 0.6 yuan, even with no cache hits.
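The per-call gap can be checked with a short calculation based on the comparison table earlier in this article. This is only a sketch: the GPT-5.5 and Claude Opus 4 prices are the article's approximate figures, the 60,000/15,000 token split is a hypothetical example, and cache hits are ignored.

```python
# Prices in yuan per million tokens, taken from this article's comparison table.
# The GPT-5.5 and Claude Opus 4 entries are the article's approximate figures.
PRICES = {
    "DeepSeek-V4-Pro": {"input": 3.0, "output": 6.0},
    "GPT-5.5": {"input": 120.0, "output": 240.0},
    "Claude Opus 4": {"input": 105.0, "output": 210.0},
}


def call_cost_yuan(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in yuan, ignoring cache hits and any other discounts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical code completion call: 60,000 input tokens and 15,000 output tokens.
for model in PRICES:
    print(f"{model}: {call_cost_yuan(model, 60_000, 15_000):.2f} yuan")
```

For this split the call comes to about 0.27 yuan on DeepSeek-V4-Pro, against about 10.8 yuan on GPT-5.5.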
This price gap directly affects two decisions:
- Teams that previously skipped AI assistance due to cost can now reconsider
- Teams already using other APIs face near-zero migration cost: the API is OpenAI SDK compatible, so switching means changing the base URL, API key, and model name
Small Teams and Individual Developers Benefit the Most
Big tech companies have the budget to run hundred-billion-parameter models. Small teams don’t. DeepSeek-V4-Pro brings top-tier model costs down to a level where everyone can afford it, which is a substantial win for independent developers, startups, and students.
4. The 70 Billion Yuan Funding and AGI Direction
Alongside the price cut announcement, DeepSeek disclosed its ongoing 70 billion yuan funding round.
Founder Liang Wenfeng’s stance is clear: AGI technology breakthroughs take priority over short-term commercialization. This means DeepSeek won’t significantly raise prices due to funding pressure in the near term — instead, it will continue using low pricing to expand its developer ecosystem.
This logic is similar to Meta’s decision to open-source LLaMA — build the ecosystem moat first, then talk about commercialization. The difference is that DeepSeek is pursuing a dual-track approach of “ultra-low-priced API + open-source weights,” which is even more developer-friendly.
5. How to Get Started? Up and Running in One Minute
If you haven’t tried DeepSeek-V4-Pro yet, integration is straightforward:
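API Method: The API is compatible with the OpenAI SDK. Just set the base_url and api_key, and use the DeepSeek model name:

    from openai import OpenAI

    # Point the standard OpenAI client at the DeepSeek endpoint.
    client = OpenAI(
        api_key="your-api-key",  # replace with your own DeepSeek API key
        base_url="https://api.deepseek.com",
    )

    # Send a single-turn chat request to the V4-Pro model.
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
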
Online Experience: Don’t want to deal with API setup? Use it directly online:
Final Thoughts
DeepSeek-V4-Pro’s permanent price cut is fundamentally redefining the price baseline for large model APIs.
When cache-hit input costs only 0.025 yuan/million tokens and output is just 6 yuan/million tokens, many AI application scenarios that previously "didn't make financial sense" suddenly become viable. This isn't marketing spin; it's a genuine cost reduction.
The new pricing takes effect after the promotion ends on May 31. If you’re working on any project involving heavy token consumption, now is the time to start testing DeepSeek-V4-Pro.