DeepSeek V4 Version History: V3 → V3-0324 → V4 Timeline (2026)

Hey fellow AI infrastructure trackers — if you've been using model='deepseek-chat' in production and suddenly realized you have no idea which model you're actually calling anymore, you're not alone.

I'm Hanks. I test AI systems in real workflows, and the DeepSeek versioning story is one of the more confusing upgrade trails in the current ecosystem — mostly because DeepSeek keeps silently swapping models under the same API endpoint without making a lot of noise about it.

My core question going into this: what actually changed between each version, what broke, and what does a developer need to know before V4 lands on the endpoint they're already using?

Here's the full timeline, with verified data labeled at every step.


Full Version Timeline

Label key:

  • ✅ Verified — from official DeepSeek API changelog, Hugging Face model card, or technical report
  • ⚠️ Claimed — internal DeepSeek testing or secondary reporting only
  • 🔲 Unknown / Not disclosed
| Version | Release Date | Total Params | Active Params | Context | API Endpoint | Status |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-V2.5 | Dec 2024 | ~236B | ~21B | 128K | deepseek-chat (deprecated) | ✅ Deprecated |
| DeepSeek-V3 | Dec 26, 2024 | 671B | 37B | 64K→128K | deepseek-chat | ✅ Deprecated Apr 11, 2025 |
| DeepSeek-R1 | Jan 20, 2025 | 671B | 37B | 164K | deepseek-reasoner | ✅ Live |
| DeepSeek-V3-0324 | Mar 24, 2025 | 685B | ~37B | 128K | deepseek-chat | ✅ Current weights (deprecated in GitHub Models) |
| DeepSeek-V3.1 | Aug 2025 | 671B | 37B | 128K | deepseek-chat | ✅ Live |
| DeepSeek-V3.2 | Dec 2025 | 671B | 37B | 128K | deepseek-chat | ✅ Live (current default) |
| DeepSeek-V3.2-Speciale | Dec 2025 | 671B | 37B | 128K | Temporary endpoint | ✅ Expired Dec 15, 2025 |
| DeepSeek-V4 | TBD 2026 | ~1T (⚠️) | ~32B (⚠️) | 1M (⚠️) | TBD | 🔲 Pre-release |

The key thing most devs miss: deepseek-chat has been silently re-pointed at least four times since December 2024. If you locked model='deepseek-chat' in January 2025, you've been running three different models without changing a line of code. That's a feature for most use cases — and a footgun if you're running evals or trying to reproduce outputs.


V3 → V3-0324: What Actually Changed

DeepSeek called the March 24, 2025 release a "minor upgrade." Testing at Zilliz found it was anything but incremental: V3-0324 posted large gains in logic reasoning, programming, and mathematical problem-solving. DeepSeek's own framing undersold it.

Performance Deltas

These are all verified from the official DeepSeek API changelog and the V3-0324 Hugging Face model card:

| Benchmark | V3 | V3-0324 | Delta |
| --- | --- | --- | --- |
| MMLU-Pro | 75.9 | 81.2 | +5.3 |
| GPQA Diamond | 59.1 | 68.4 | +9.3 |
| AIME 2025 | 39.6 | 59.4 | +19.8 |
| LiveCodeBench | 39.2 | 49.2 | +10.0 |
| Aider Polyglot | ~40% | 55% | ~+15 pts |

V3-0324 uses the same base model as V3, but the post-training pipeline was improved by drawing lessons from the reinforcement learning technique used in DeepSeek-R1. This is the correct mental model: same base weights, significantly better reasoning from improved post-training.

The published parameter count rose from 671 billion to 685 billion — the larger figure counts the Multi-Token Prediction module weights included in the checkpoint — and the context window was upgraded to 128K tokens.

Two additional improvements worth flagging:

Front-end code generation: V3-0324 produces markedly better HTML, CSS, and game UI code compared to V3. Community testing showed it could generate 700 lines of error-free code in a single pass — a practical improvement for developers using it in agentic pipelines.

Function calling accuracy: Tau-bench function call performance: 53.5 (Airline) / 63.9 (Retail). V3 had no published Tau-bench baseline, making direct delta comparison impossible — but the absolute scores are meaningful for developers building tool-use workflows.
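Since function calling syntax is fully backward compatible (see the next section), a tool definition written against V3 keeps working unchanged on V3-0324. A minimal sketch — the `get_weather` tool is a hypothetical example, and `client` is assumed to be an OpenAI-compatible client pointed at DeepSeek's API:

```python
# Minimal sketch of the (unchanged) function calling request shape.
# The get_weather tool is a hypothetical example, not a DeepSeek API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# The same request works on V3, V3-0324, and V3.2:
# response = client.chat.completions.create(
#     model="deepseek-chat",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=[get_weather_tool],
# )
```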

API Compatibility

This is the most important section for developers who moved fast in early 2025.

What didn't change:

  • API endpoint: model='deepseek-chat' still works, zero changes required
  • Pricing: same rate at launch ($0.27/1M input, $1.10/1M output for cache misses)
  • Function calling syntax: fully backward compatible
  • JSON output mode: unchanged

What did change (silently):

The V3-0324 Hugging Face model card documents a temperature mapping mechanism: the API temperature of 1.0 maps to model temperature 0.3 internally. If you call V3-0324 via API at temperature 1.0, you're actually running at model temperature 0.3.

In practice: T_model = T_api × 0.3 for 0 ≤ T_api ≤ 1.

If you were running V3 at temperature 1.0 and expecting high-variance outputs, you should know that V3-0324 introduced this mapping — and your outputs became more deterministic without any code change on your end.
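The documented segment of the mapping is simple enough to capture in a few lines — a sketch covering only the published 0 ≤ T_api ≤ 1 range; behavior above 1.0 is not modeled here:

```python
def api_to_model_temperature(t_api: float) -> float:
    """Map an API temperature to V3-0324's internal model temperature,
    per the model card: T_model = T_api * 0.3 on [0, 1]. The mapping
    above T_api = 1.0 is not covered by this sketch."""
    if not 0.0 <= t_api <= 1.0:
        raise ValueError("documented mapping covers only 0 <= T_api <= 1")
    return t_api * 0.3

print(api_to_model_temperature(1.0))  # 0.3 — the default API setting
```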

Deprecation timeline:

  • V3 deprecated in GitHub Models: April 11, 2025 ✅
  • V3 removed from GitHub Models: April 30, 2025 ✅
  • deepseek-chat continued to serve V3-0324 on DeepSeek's own API until it was silently replaced by V3.1 in August 2025

V3-0324 → V4: The Major Upgrade

This is where the architectural story changes fundamentally. V3-0324 was a post-training improvement on the same base. V3.1 breaks this pattern entirely — it introduces fundamental changes that reshape how we think about hybrid reasoning models and hardware compatibility. V4 goes further still.

New Architecture Features

The three published innovations that define V4 (all pre-release, verified at paper level):

1. Manifold-Constrained Hyper-Connections (mHC) — published January 1, 2026

Training instability at trillion-parameter scale was a hard ceiling for V3's architecture. Traditional hyperconnections develop broken identity mapping and catastrophic signal amplification as network depth increases. mHC solves this by projecting connection matrices onto a manifold using the Sinkhorn-Knopp algorithm, maintaining stable gradient flow. Result: 4× wider residual stream adds only 6.7% training time overhead — the mechanism that makes 1T parameters trainable on V3-era hardware.

2. Engram Conditional Memory — published January 13, 2026 (arXiv:2601.07372)

V3's standard KV-cache degrades under very long contexts — attention becomes diffuse, and cross-file reasoning loses coherence past certain lengths. Engram replaces static knowledge retrieval with O(1) hash-based lookups, offloading syntax and library pattern recall to host DRAM while keeping active reasoning on GPU VRAM. Needle-in-Haystack result at 1M tokens: 97% accuracy vs 84.2% for standard attention architectures.
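As a conceptual toy — my illustration, not DeepSeek's implementation — the shift is from scanning the whole context with attention to a constant-time lookup into a static pattern store that can live in host RAM:

```python
# Toy illustration of the Engram idea (not DeepSeek's code): static
# syntax/library patterns sit in a hash table in host memory, so recall
# is an O(1) dict lookup instead of attention over the full context.
pattern_store = {
    "list.sort": "sorts a list in place; use sorted() for a new list",
    "dict.get":  "returns a default instead of raising KeyError",
}

def recall(key):
    # Constant-time average-case lookup, regardless of store size —
    # the GPU keeps only active reasoning state, not this table.
    return pattern_store.get(key)
```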

3. Dynamic Sparse Attention (DSA)

V3 used Multi-Head Latent Attention (MLA) — excellent at compressing KV-cache memory but not designed for million-token contexts. DSA adds intelligent sparsity patterns that focus compute on the most relevant context regions, enabling 1M token windows at roughly 50% lower attention compute cost.
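A toy sketch of the sparsity idea — illustrative only, since DSA's actual selection mechanism is learned rather than a fixed top-k — attend only to the k most relevant keys and give every other position exactly zero weight:

```python
import numpy as np

def topk_sparse_attention(q, K, k=4):
    """Toy top-k sparse attention: softmax over only the k highest
    scoring keys; all other positions get exactly zero weight."""
    scores = K @ q                          # relevance of each key to q
    keep = np.argsort(scores)[-k:]          # indices of the top-k keys
    masked = np.full_like(scores, -np.inf)
    masked[keep] = scores[keep]
    weights = np.exp(masked - masked[keep].max())  # exp(-inf) -> 0
    return weights / weights.sum()

rng = np.random.default_rng(0)
w = topk_sparse_attention(rng.standard_normal(8), rng.standard_normal((64, 8)))
print((w > 0).sum())  # 4 — only the selected keys carry weight
```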

The combined effect: V4 is not a scaled-up V3. It's a V3 base with three distinct architectural layers addressing training stability, memory efficiency, and context scale — the specific bottlenecks that make V3.2 cap out at 128K and struggle with full-repository code tasks.

V3.1/V3.2 as stepping stones

Before V4, DeepSeek shipped two intermediate models that brought specific capabilities:

DeepSeek-V3.1, released in August 2025, combines the strengths of V3 and R1 into a single hybrid model. It features hybrid thinking mode — the model can switch between chain-of-thought reasoning and direct answers just by changing the chat template. Extended training included 630B tokens for the 32K extension phase and 209B tokens for the 128K phase.

V3.2 introduced a new massive agent training data synthesis method covering 1,800+ environments and 85,000+ complex instructions. It was DeepSeek's first model to integrate thinking directly into tool-use, supporting tool calls in both thinking and non-thinking modes.

Understanding V3.1 and V3.2 matters for V4 planning: the hybrid thinking mode and tool-use-with-reasoning introduced in these versions are confirmed to carry forward into V4's architecture.

Breaking Changes

This section is where you pay close attention before V4 lands on the endpoint.

Confirmed breaking changes (based on V3.1/V3.2 precedent + architectural papers):

| Change | V3/V3-0324 Behavior | V4 Expected Behavior | Impact |
| --- | --- | --- | --- |
| Context window | 128K max | 1M tokens (⚠️ claimed) | Prompt truncation logic needs review |
| Thinking mode | Separate deepseek-reasoner endpoint | Unified thinking/non-thinking via chat template | Workflow changes if you use both endpoints |
| Temperature mapping | T_model = T_api × 0.3 | TBD — V3.1 extended this mapping | Re-evaluate if you rely on specific temperature behavior |
| Token consumption | Standard | ⚠️ Higher for complex reasoning | Cost estimates need recalculation |
| Tool-use in thinking mode | Not supported (V3) | Supported (V3.2+ pattern) | New capability, not a break |

Highest-risk breaking change: token consumption. Complex reasoning tasks in V3.1 may consume more tokens compared to the legacy R1 version. If you're running budget-capped workflows, this is the variable most likely to blow your estimates when V4 lands.

Endpoint behavior: DeepSeek's pattern is to transparently migrate deepseek-chat to new model versions. V4 will almost certainly appear under deepseek-chat after launch, with advance notice in the API changelog. Subscribe to that changelog now if you're not already.


How to Check Your Current Version

DeepSeek doesn't expose a model version field in the standard chat completion response. Here are the reliable methods:

Method 1: Direct API query (fastest)

import openai
client = openai.OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)
# Query what model is currently served under deepseek-chat
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What is your model version?"}],
    max_tokens=50
)
print(response.model)          # returns model identifier
print(response.choices[0].message.content)  # model's self-report

Note: the response.model field will reflect the endpoint alias, not always the underlying version. Cross-reference with the official changelog to confirm current assignment.

Method 2: Benchmark fingerprinting

Run a few benchmark questions with a known quality delta between versions. AIME 2025 problems work well as probes — V3 scores ~39.6% on the set, V3-0324 ~59.4%, V3.2 ~89.3%. Response quality on a handful of these questions gives you a rough but reliable version bracket.
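A lightweight way to operationalize fingerprinting — a sketch, where `client` is assumed to be the OpenAI-compatible client from Method 1 and the probe prompt is arbitrary — is to hash a low-temperature completion on each deploy. Outputs are not guaranteed deterministic, but a changed hash across days is a strong hint the underlying model was swapped:

```python
import hashlib

def fingerprint(client, probe="Compute 17 * 23 and explain in one line."):
    """Return (endpoint alias, short hash of a low-temperature
    completion). A changed hash over time suggests a silent swap."""
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": probe}],
        temperature=0,
        max_tokens=200,
    )
    text = resp.choices[0].message.content
    return resp.model, hashlib.sha256(text.encode()).hexdigest()[:12]
```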

Method 3: Check the changelog directly

The most reliable source is api-docs.deepseek.com/updates — DeepSeek documents every endpoint migration there. Bookmark it. Check it before major production deployments.


Migration Checklist V3 → V4

Use this before V4 goes live on the endpoint. Each item is actionable today against V3.2 — the same checks will apply when V4 ships.

Phase 1: Audit current usage

  • [ ] Identify all places in your codebase where model='deepseek-chat' or model='deepseek-reasoner' is hardcoded
  • [ ] Log current average token consumption per request type (baseline for cost comparison)
  • [ ] Document any temperature-sensitive workflows — V4 may change the internal temperature mapping
  • [ ] List all tool-use / function calling implementations — verify against V3.2 pattern compatibility

Phase 2: Test against V3.2 first

V3.2 is live now and represents the closest available preview of V4's API surface. Run your test suite against it before V4 drops.

# Switch to V3.2 for pre-V4 compatibility testing
response = client.chat.completions.create(
    model="deepseek-chat",  # currently points to V3.2
    messages=[{"role": "user", "content": "Your test prompt here"}],
    max_tokens=4096,
)

# V3.1/V3.2 thinking-mode toggle via chat template — this applies to
# local Hugging Face inference; `tokenizer` is the model's AutoTokenizer.
# The hosted API toggles thinking via the endpoint, not the template.

# Enable thinking:
tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)

# Disable thinking:
tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
  • [ ] Run regression suite against V3.2 — document any output format differences
  • [ ] Test thinking mode toggle if you use hybrid reasoning workflows
  • [ ] Verify function calling in both thinking and non-thinking modes (V3.2 pattern)
  • [ ] Confirm context window behavior — does your longest prompt still fit at 128K?

Phase 3: Prepare for V4 specifics

  • [ ] Update context handling logic to support up to 1M tokens — don't assume current truncation thresholds
  • [ ] Revise cost estimates: budget 20–30% higher token consumption for complex reasoning tasks
  • [ ] Set up API changelog monitoring — subscribe to api-docs.deepseek.com/updates for advance notice of endpoint migration
  • [ ] If running local models: verify memory capacity for quantized V4 (~336–400GB at Q4 — multi-GPU servers or large unified-memory machines, well beyond a dual RTX 4090 or single RTX 5090 setup)
  • [ ] Plan a canary deployment: route 5–10% of traffic to V4 endpoint first, monitor before full migration
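The canary item can be as simple as a weighted model picker. A sketch — `deepseek-chat-v4` is a hypothetical alias; DeepSeek has not announced a separate V4 model name:

```python
import random

def pick_model(canary_fraction=0.1,
               stable="deepseek-chat",
               canary="deepseek-chat-v4"):  # hypothetical V4 alias
    """Route roughly canary_fraction of requests to the new model;
    everything else stays on the known-good endpoint."""
    return canary if random.random() < canary_fraction else stable
```

Compare error rates, latency, and token consumption between the two slices before raising the fraction.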

Phase 4: Post-migration validation

  • [ ] Re-run the baseline benchmark fingerprint once the migration is confirmed
  • [ ] Compare token consumption against pre-V4 baseline
  • [ ] Validate tool-call accuracy on your specific function schemas
  • [ ] Check Needle-in-Haystack accuracy if you're using long-context workflows


The Practical Reality of DeepSeek Versioning

Here's what three months of tracking this taught me: DeepSeek ships fast, labels conservatively, and the API changelog is the only reliable source of truth. "Minor upgrade" on March 24, 2025 meant +19.8 AIME points. "Upgraded to V3.1" in August 2025 meant a completely new hybrid architecture.

V4's framing will be similar — probably understated in the official announcement, significantly more capable than the headline suggests, and live on deepseek-chat within days of the official release.

The checklist above is the thing that separates the teams that adapt in hours from the ones that spend a week debugging unexpected behavior. Run Phase 1 and Phase 2 now, while V3.2 is the target — you'll be ready when V4 drops.

At Macaron, we help you structure AI decisions into executable workflows — so version changes and model migrations don't derail tasks mid-run. Try it free at macaron.im and judge the results yourself.


FAQ

Q: Will V4 break my existing deepseek-chat calls? Based on DeepSeek's consistent pattern across every version transition since V2.5, the API surface will remain backward compatible. The endpoint alias stays; the underlying model swaps. The risks are behavioral — token consumption, temperature mapping, output format — not syntactic. Run the Phase 2 checklist above against V3.2 now to surface any behavioral breaks before V4 ships.

Q: Is DeepSeek-V3-0324 still available? Via DeepSeek's own API, deepseek-chat now points to V3.2 (as of December 2025). V3-0324 was deprecated in GitHub Models in April 2025 — developers were advised to transition to take advantage of enhanced features. V3-0324 weights remain available on Hugging Face for local use under MIT license.

Q: What's the difference between deepseek-chat and deepseek-reasoner after V4? V3.1 and V3.2 introduced unified thinking mode — a single model that switches between chain-of-thought and direct response via chat template. V4 is expected to continue this pattern. The deepseek-reasoner endpoint may be retired or re-pointed post-V4, but DeepSeek hasn't confirmed this. Watch the changelog.

Q: Do I need to re-tune prompts for V4? Likely minimal changes. The system prompt format and user/assistant turn structure haven't changed across any version transition. The area most likely to require prompt adjustment is reasoning-heavy tasks where V4's longer thinking traces produce more verbose intermediate outputs — you may want to adjust output parsing if you're extracting structured data from responses.

Q: What happened to V3.2-Speciale? V3.2-Speciale was served via a temporary endpoint and was available until December 15, 2025 — it achieved gold-medal results in IMO, CMO, ICPC World Finals, and IOI 2025, but required higher token usage, was API-only with no tool calls, and was intended for research use. It's no longer available. Its capabilities — relaxed length constraints and competition-level math performance — are expected to inform V4's training data.

Q: When exactly is V4 launching? The mid-February 2026 target window passed without a release. As of February 28, 2026, DeepSeek has not confirmed a new date. The February 11 silent expansion of the production API context window to 1M tokens is the most concrete signal of a staged rollout in progress. Community consensus: Q1–Q2 2026.

Q: Should I pin a specific model version to avoid silent upgrades? If output reproducibility matters for your use case — evals, benchmarks, automated testing — yes. Use deepseek-chat@2024-12-26 style versioned endpoint aliases if DeepSeek provides them, or route production traffic to a specific snapshot. For general-purpose workflows where "better is better," the silent upgrades have historically been improvements, not regressions.
