DeepSeek-V4-Pro API Permanent Price Cut

On May 22, 2026, DeepSeek officially announced that the API price of its flagship model, DeepSeek-V4-Pro, will be permanently reduced to 1/4 of the original price after the limited-time 75% discount ends on May 31.

This is not a short-term promotion — it’s a genuine strategic shift in pricing.

1. How Much Does It Cost After the Price Cut?

Let’s look at the core numbers:

| Billing Item | Original Price (yuan/million tokens) | Permanent Price (yuan/million tokens) | Reduction |
| --- | --- | --- | --- |
| Input (cache hit) | 0.1 | 0.025 | 75% |
| Input (cache miss) | 12 | 3 | 75% |
| Output | 24 | 6 | 75% |

All three tiers are cut to 1/4 of the original price. The cache-hit input price has dropped to just 0.025 yuan/million tokens — practically negligible.
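The arithmetic behind the table can be checked with a short sketch. The prices are copied from the table; the function names and the example token split (80k cached input, 20k uncached input, 5k output) are hypothetical and chosen only for illustration.

```python
# Prices in yuan per million tokens, copied from the table above.
ORIGINAL = {"input_cache_hit": 0.1, "input_cache_miss": 12.0, "output": 24.0}
PERMANENT = {"input_cache_hit": 0.025, "input_cache_miss": 3.0, "output": 6.0}


def reduction_percent(old: float, new: float) -> float:
    """Return the price reduction as a percentage of the old price."""
    return (old - new) / old * 100


def request_cost(prices: dict, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Return the cost in yuan of one request, given token counts per billing item."""
    total = (
        hit_tokens * prices["input_cache_hit"]
        + miss_tokens * prices["input_cache_miss"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    for item in ORIGINAL:
        print(item, f"{reduction_percent(ORIGINAL[item], PERMANENT[item]):.0f}%")
    # Hypothetical request: 80k cached input, 20k uncached input, 5k output tokens.
    print(f"original:  {request_cost(ORIGINAL, 80_000, 20_000, 5_000):.3f} yuan")
    print(f"permanent: {request_cost(PERMANENT, 80_000, 20_000, 5_000):.3f} yuan")
```

Under these assumptions the same request drops from about 0.368 yuan to about 0.092 yuan, which matches the 75% reduction on every billing item.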

A cross-model comparison makes it even clearer:

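| Model | Input Price (per million tokens) | Output Price (per million tokens) |
| --- | --- | --- |
| DeepSeek-V4-Pro | 3 yuan | 6 yuan |
| GPT-5.5 | ~120 yuan | ~240 yuan |
| Claude Opus 4 | ~105 yuan | ~210 yuan |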

DeepSeek-V4-Pro’s input and output prices are roughly 2.5% to 3% of what GPT-5.5 and Claude Opus 4 charge in the table above, a gap of about 35x to 40x. They are not even in the same ballpark.

2. Why Can It Be This Cheap?

This level of price reduction isn’t a loss leader. It’s backed by clear technical foundations.

1. Proprietary Attention Architecture

DeepSeek has used the MLA (Multi-Head Latent Attention) architecture since V2, which drastically compresses the memory footprint of the attention mechanism. V4 further optimizes this, reducing single-inference memory usage by approximately 60% compared to models of similar scale.

2. Huawei Ascend Chip Optimization

The DeepSeek team has done deep operator-level adaptation for the Huawei Ascend 910B, improving communication bandwidth utilization and mixed-precision training stability. These Chinese domestic chips cost significantly less than NVIDIA A100/H100 GPUs, while the gap in real-world inference efficiency continues to narrow.

3. Engram System: CPU as Warehouse, GPU as Workshop

V4’s Engram system stores 80% of static knowledge in CPU DRAM, leaving only core inference tasks for the GPU. This “hot-cold separation” architecture multiplies GPU memory utilization and directly reduces the hardware cost per inference.

3. What Does This Mean for Developers?

High Token Consumption Scenarios Are Finally Affordable

Code generation, long document analysis, batch data processing: these scenarios share one thing in common, massive token consumption. For a medium-scale code completion task, a single call might consume 50,000-100,000 tokens. At the prices listed above, running it on GPT-5.5 costs roughly 6 to 24 yuan per call, depending on the input/output mix; on DeepSeek-V4-Pro, it costs roughly 0.15 to 0.6 yuan.
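To make the per-call comparison concrete, here is a minimal sketch that prices one call on each model using the figures from the comparison table. The competitor prices are the approximate values quoted in this article, and the 80,000-input / 20,000-output split is a hypothetical example, not a measured workload.

```python
# Yuan per million tokens, taken from the comparison table in this article.
# DeepSeek input uses the cache-miss price; competitor prices are approximate.
PRICES = {
    "DeepSeek-V4-Pro": {"input": 3.0, "output": 6.0},
    "GPT-5.5": {"input": 120.0, "output": 240.0},
    "Claude Opus 4": {"input": 105.0, "output": 210.0},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in yuan of one call on the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical code-completion call: 80k input tokens, 20k output tokens.
    for model in PRICES:
        print(f"{model}: {call_cost(model, 80_000, 20_000):.2f} yuan")
```

With this split, the call costs about 0.36 yuan on DeepSeek-V4-Pro versus about 14.40 yuan on GPT-5.5. Cached input would lower the DeepSeek figure further.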

This price gap directly affects two decisions:

  • Teams that previously skipped AI assistance due to cost can now reconsider
  • Teams already using other APIs face near-zero migration cost (OpenAI SDK compatible — just change the endpoint)

Small Teams and Individual Developers Benefit the Most

Big tech companies have the budget to run hundred-billion-parameter models. Small teams don’t. DeepSeek-V4-Pro brings top-tier model costs down to a level everyone can afford, which is a substantial win for independent developers, startups, and students.

4. The 70 Billion Yuan Funding and AGI Direction

Alongside the price cut announcement, DeepSeek disclosed its ongoing 70 billion yuan funding round.

Founder Liang Wenfeng’s stance is clear: AGI technology breakthroughs take priority over short-term commercialization. This means DeepSeek won’t significantly raise prices due to funding pressure in the near term — instead, it will continue using low pricing to expand its developer ecosystem.

This logic is similar to Meta’s decision to open-source LLaMA — build the ecosystem moat first, then talk about commercialization. The difference is that DeepSeek is pursuing a dual-track approach of “ultra-low-priced API + open-source weights,” which is even more developer-friendly.

5. How to Get Started? Up and Running in One Minute

If you haven’t tried DeepSeek-V4-Pro yet, integration is straightforward:

API Method: Compatible with OpenAI SDK — just modify the base_url and api_key:

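from openai import OpenAI

# Point the OpenAI SDK at DeepSeek by overriding api_key and base_url.
client = OpenAI(
    api_key="your-api-key",  # replace with your DeepSeek API key
    base_url="https://api.deepseek.com",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)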
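Once a call succeeds, you may want to know what it cost. The sketch below estimates the cost from the `usage` object on the response, using the permanent prices quoted in this article. `prompt_tokens` and `completion_tokens` are standard fields in the OpenAI SDK; the `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` fields are an assumption about what the DeepSeek endpoint returns, so the code falls back to treating all input as cache misses when they are absent. The function name `cost_from_usage` is hypothetical.

```python
from types import SimpleNamespace

# Permanent prices in yuan per million tokens, as quoted in this article.
PRICE_PER_MILLION = {"cache_hit": 0.025, "cache_miss": 3.0, "output": 6.0}


def cost_from_usage(usage) -> float:
    """Estimate the cost in yuan of one response from its usage object."""
    # Assumed DeepSeek-specific fields; missing values are treated as absent.
    hit = getattr(usage, "prompt_cache_hit_tokens", None) or 0
    miss = getattr(usage, "prompt_cache_miss_tokens", None)
    if miss is None:
        # Fallback: bill every input token that is not a known cache hit as a miss.
        miss = usage.prompt_tokens - hit
    total = (
        hit * PRICE_PER_MILLION["cache_hit"]
        + miss * PRICE_PER_MILLION["cache_miss"]
        + usage.completion_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Stand-in for response.usage so the sketch runs without a network call.
    fake_usage = SimpleNamespace(
        prompt_tokens=1000,
        completion_tokens=500,
        prompt_cache_hit_tokens=400,
        prompt_cache_miss_tokens=600,
    )
    print(f"{cost_from_usage(fake_usage):.5f} yuan")
```

In real use you would pass `response.usage` from the snippet above instead of the stand-in object.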

Online Experience: Don’t want to deal with API setup? Use it directly online:

Start Using DeepSeek

Final Thoughts

DeepSeek-V4-Pro’s permanent price cut is fundamentally redefining the price baseline for large model APIs.

When cache-hit input costs only 0.025 yuan/million tokens and output is just 6 yuan/million tokens, many AI application scenarios that previously “didn’t make financial sense” suddenly become viable. This isn’t marketing spin; it’s a genuine cost reduction.

The new pricing takes effect after the promotion ends on May 31. If you’re working on any project involving heavy token consumption, now is the time to start testing DeepSeek-V4-Pro.
