DeepSeek V4 Is Out: What Actually Shipped in 2026


It's been a long wait. DeepSeek V4 was delayed three times from its original mid-February target, and by last week a lot of people had stopped expecting it to land at all. Then today it dropped — without much warning, as usual.

Here's the short version of what happened, and what it means if you're not a developer.


What DeepSeek Officially Announced

DeepSeek unveiled the V4 Flash and V4 Pro series, touting top-tier performance in coding benchmarks and what they described as big advances in reasoning and agentic tasks. The announcement went up on Hugging Face and their social accounts simultaneously — Bloomberg confirmed the release within hours of the Hugging Face drop.

DeepSeek-V4-Pro comes with 1.6 trillion total parameters — though only 49 billion activate per token — while V4-Flash runs a smaller 284 billion parameters with 13 billion activated, focused on speed and cost. Both models share the same headline feature: a one million token context window, which is a major jump from what V3 offered.


It's a preview release, not a final version. Worth keeping that in mind.


Access: Chat App, Web, and API


DeepSeek confirmed the V4 Preview is available now on web, app, and API. The chat app and web product remain free for casual end-users, while the API is billed by token usage.


The API now supports deepseek-v4-pro and deepseek-v4-flash with 1M context windows, and the old model names deepseek-chat and deepseek-reasoner will be retired by July 24, 2026 — they currently route to deepseek-v4-flash.
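If you're migrating off the retiring model names, the switch is mostly a one-line change. Here's a minimal sketch of what a request body looks like with the new names — this assumes V4 keeps DeepSeek's OpenAI-compatible chat-completions format from V3, which the launch notes don't explicitly confirm; the endpoint URL and field names below are carried over from the existing API, not verified against V4.

```python
import json

# Assumed endpoint, carried over from the pre-V4 DeepSeek API.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, model: str = "deepseek-v4-flash") -> str:
    """Return the JSON body for a single-turn chat request.

    Swap in "deepseek-v4-pro" for the higher-capability model.
    The retiring names ("deepseek-chat", "deepseek-reasoner")
    currently route to deepseek-v4-flash anyway.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

print(build_request("Summarize this repo's README."))
```

The point of routing the old names to V4-Flash is that unmodified clients keep working until the July 24, 2026 cutoff — but pinning the new names now avoids a surprise later.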


What Shipped vs What Was Rumored

A lot of pre-release coverage got things wrong. Worth clearing up before you go looking for features that don't exist.

V4-Pro and V4-Flash (Not "V4 Lite")

Pre-release leaks consistently used the name "V4-Lite" for the smaller model. That name never appeared in the official release. What actually shipped is V4-Flash — a name that better signals its purpose: fast, affordable, not lite.

V4-Pro is the full-capability version. DeepSeek says its performance falls only marginally short of OpenAI's GPT-5.4 and Gemini 3.1-Pro, describing a developmental trajectory roughly 3 to 6 months behind state-of-the-art frontier models. That's a self-assessment, and worth reading with appropriate skepticism — but it's not a wildly implausible claim based on the benchmarks shown.


V4-Flash is the one most free-tier users will interact with. Its reasoning abilities are close to V4-Pro's, with faster response times and lower per-token cost.

1M Context, Confirmed

DeepSeek confirmed the 1 million token context window — a leap that allows entire codebases or long documents to be sent as a single prompt.

For regular users, this mostly means the model can handle very long conversations, large uploaded documents, or lengthy research threads without losing track of earlier details. You probably won't hit the limit doing everyday things.
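To put "1 million tokens" in rough practical terms: for English text, a common heuristic is about 4 characters per token, which would make the window roughly 4 MB of plain text. Here's a quick back-of-the-envelope check — the 4-chars-per-token figure is a generic estimate, not DeepSeek's actual tokenizer, so treat the numbers as ballpark only.

```python
# Rough check of whether a document fits in a 1M-token context window.
CONTEXT_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # generic English-text heuristic, not DeepSeek's tokenizer

def estimated_tokens(text: str) -> int:
    """Ballpark token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    return estimated_tokens(text) <= limit

doc = "word " * 200_000  # ~1 MB of text
print(estimated_tokens(doc), fits_in_context(doc))
```

By this estimate, a 1 MB document lands around 250,000 tokens — a quarter of the window. That's why "entire codebase in one prompt" is a realistic claim for small-to-medium projects.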

DSA Sparse Attention (Not "Engram")

This is the one that tripped up a lot of early coverage, including some of ours. Pre-release speculation heavily focused on "Engram," a memory architecture DeepSeek published research on in January. The Hunter Alpha model that briefly appeared on OpenRouter and was thought to be an early V4 test was later confirmed to be Xiaomi's MiMo-V2-Pro, not DeepSeek V4.

The architecture DeepSeek actually shipped uses a DSA (DeepSeek Sparse Attention) mechanism — as detailed in the official DeepSeek V4 model card — which compresses at the token level and significantly reduces the computational and memory demands of ultra-long-context processing. Engram isn't what powers the final model. DSA is. Different thing entirely.



What This Means for Regular Users

Developers already know what to do with this. For everyone else, here's the practical version.

What You Get in the Free Chat App

If you're using DeepSeek through the website or the app, you'll have access to V4 — most likely V4-Flash by default, since that's what the old deepseek-chat model now routes to. You get the 1M context window, the new thinking mode (which lets the model reason through problems step by step before answering), and what feels noticeably like a more capable model than V3.

The thinking mode is worth trying on anything that actually requires reasoning — the API exposes a reasoning_effort parameter that lets users adjust thinking intensity for complex tasks like code generation and document processing. In the consumer app, you'll likely see a simpler toggle.
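For API users, here's what setting that knob might look like in a request body. The parameter name `reasoning_effort` comes from the launch notes, but the accepted values below ("low" / "medium" / "high") are an assumption borrowed from how other providers expose similar controls — DeepSeek's actual value set may differ, so check the API docs before relying on it.

```python
import json

def build_reasoning_request(prompt: str, effort: str = "high") -> str:
    """Return a JSON request body with an explicit thinking-intensity setting.

    The "low"/"medium"/"high" scale here is assumed, not confirmed for V4.
    """
    body = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }
    return json.dumps(body)

print(build_reasoning_request("Walk through this proof step by step."))
```

The trade-off is the usual one: higher effort means more reasoning tokens, which means slower and more expensive responses — worth it for code generation or document analysis, overkill for casual chat.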


Where It Still Falls Short

No model does everything. On long-context retrieval benchmarks, V4-Pro scores below Claude Opus 4.6, and on broader knowledge-work metrics the closed frontier models still lead. For everyday chat and writing tasks this won't matter. For research-heavy or document-intensive use, it's worth knowing.

There's also the ongoing privacy question, which hasn't changed. DeepSeek's privacy policy identifies a Chinese company as the data controller and states that personal data — including prompts, uploads, and chat history — may be collected and processed in China. If that matters to you — and it reasonably might — it's worth factoring in before you use it for sensitive work.


Limitations and Things to Watch

Availability Outside China

A major question following the V4 release is which chips were used to train the model. Huawei confirmed its Ascend AI processors can support V4, but it's unclear to what extent they were used compared to Nvidia's. For users outside China, the practical question is simpler: can you access it?

Based on current reports, the web and app access appears to be globally available — signup works with any email. That said, some countries that previously restricted DeepSeek-R1 may apply the same restrictions here. Worth checking your local app store if the mobile version isn't showing up.

Unverified Benchmark Claims

DeepSeek's self-reported benchmarks are impressive. They also come from DeepSeek. Independent evaluation takes time, and the model only dropped hours ago. The claims around coding performance and math reasoning will get tested properly in the coming days — a healthy reason not to make big decisions based purely on the launch announcement.


FAQ

Is DeepSeek V4 free to use?

Yes, for the consumer chat app and web version. DeepSeek is available at no cost for general users, while the API version charges by token usage for developers and businesses.

What's the difference between V4-Pro and V4-Flash?

V4-Pro is the higher-capability model aimed at complex tasks — coding, reasoning, long documents. V4-Flash trades some capability for speed and lower cost. For casual everyday use, you likely won't notice a meaningful difference. For heavy workloads, Pro is the one to reach for if you have API access.

Does DeepSeek V4 remember me across chats?

No — not in the way some AI companions are starting to. The 1M context window means it can hold a very long single conversation in memory, but once you close the session and start a new one, it doesn't carry anything over. It doesn't learn who you are over time. Each conversation starts fresh.

This is actually a growing differentiator in the AI space. Some newer apps are building persistent memory that carries across sessions — the model getting to know you over days and weeks, not just within a single chat. DeepSeek V4, for all its technical capability, doesn't do that yet.

Is the mobile app available outside China?

DeepSeek's app has been available globally, though availability can vary by region and platform. If you're having trouble finding it in your local App Store or Google Play, the web version at deepseek.com is a reliable fallback.


It's been a long lead-up for a model that's launching as a preview rather than a final release. Maybe I'm overthinking it, but that framing matters — preview means the spec can still shift, benchmarks are self-reported, and the real test is what independent evaluators find over the next few weeks.

Worth trying if you're curious. Worth waiting a bit if you're making decisions that depend on it.

