
Anthropic officially announced the release of Claude Sonnet 4.6, and yes—this time, there are real improvements.
Here's the official announcement video:
https://x.com/claudeai/status/2023817132581208353
If you want the short version:
Sonnet 4.6 is clearly better than Sonnet 4.5, especially for coding, writing, analysis, research, and spreadsheet-heavy workflows in Cowork.
But it does not surpass Opus 4.5. It's faster and more capable with tools, not fundamentally smarter.
The good news: pricing remains unchanged.
• Input: $3 / million tokens
• Output: $15 / million tokens
Same as Sonnet 4.5.
Anthropic shared internal benchmark comparisons showing that Sonnet 4.6 outperforms Sonnet 4.5 across nearly all evaluated tasks.
The gains are not marginal—they're noticeable.
In particular, Anthropic emphasized improvements in "computer use" tasks, such as:
• Complex Excel workflows
• Web interaction and UI navigation
According to their data, these capabilities improved by roughly 18% compared to Sonnet 4.5, reaching what Anthropic describes as human-level performance in specific tool-driven scenarios.
Alongside the model release, Anthropic introduced a new Excel plugin powered by Sonnet 4.6.
You can now:
• Work directly inside Excel
• Ask questions about tables and formulas
• Let Sonnet 4.6 assist with spreadsheet logic and transformations
This isn't flashy—but it's genuinely useful for knowledge workers.
Sonnet 4.6 is already live across:
• Claude web (claude.ai)
• Cowork
• Claude API
After refreshing the Claude web UI, Sonnet 4.6 appears as the default model.
That said, Claude Code required an update. Older versions were still pinned to Sonnet 4.5. After upgrading to Claude Code v2.1.45, Sonnet 4.6 became available.
Claude Code's creator, Boris, said while promoting the release that Sonnet 4.6 is close to Opus-level intelligence at a lower cost.
This needs clarification.
What this really means is: Sonnet 4.6 is closer to Opus 4.5 than Sonnet 4.5 was. It is not smarter than Opus 4.5. That distinction matters.
On OpenRouter, Sonnet 4.6 pricing matches Sonnet 4.5 exactly:
• 1M context window
• $3 / million input tokens
• $15 / million output tokens
The Sonnet line has held this pricing consistently since Sonnet 3.7.
Interestingly, the top comment under Anthropic's announcement pointed out that Sonnet 4.6 still fails the classic "car wash" reasoning problem.
After testing it myself, I saw the same issue. The model suggests walking to the car wash while leaving the car at home. That's… not great.
Other users echoed similar sentiments:
• No qualitative leap in reasoning
• Programming ability still below Opus 4.5
• Most improvements are in Cowork-style execution, not intelligence
Some even joked that Claude releases are starting to feel like incremental iPhone upgrades—predictable, polished, but not exciting.
When Cursor added Sonnet 4.6, they explicitly noted the same thing: it can run longer tasks more reliably, but its intelligence level has not fundamentally changed.
Most major AI labs release at least two tiers per generation:
• A standard model
• A more powerful, reinforced model
For example, Google's Gemini Flash models often outperform the previous Pro generation.
So when Sonnet 4.6 failed to catch up to Opus 4.5, it fell short of many users' expectations. Even Opus 4.6, released recently, comes with a trade-off: higher intelligence, noticeably slower execution.
For most people, model quality comes down to three factors:
• Price
• Speed
• Intelligence
Very few models do well on all three.
Right now, one notable exception is MiniMax M2.5. At less than 1/20th the cost of Opus 4.6, it delivers roughly 80% of Opus-level capability—which makes it extremely hard to ignore.
In my own workflow:
• Planning and high-level reasoning → Claude
• Execution and iteration → MiniMax M2.5
It's cheaper, faster, and more than capable for most tasks.
In a much quieter corner, xAI released Grok 4.2—and almost no one noticed.
Grok still struggles with reasoning and general usability. Its only real advantages:
• NSFW content support
• Direct access to X (Twitter) data
And that's about it.
For General Users
Web / App
• On claude.ai, Sonnet 4.6 is now the default
• iOS and Android apps support it for both Free and Pro users
Best Use Cases
• Complex programming and debugging
• Large codebase review
• Long-form writing, resumes, reports
• Multi-step planning and analysis
• Enterprise document Q&A (near flagship-level performance)
Prompting Tips
Be explicit about: Role, Task goal, Output format.
Example: "You are a senior frontend engineer. Refactor the following React code to improve readability and performance. Return the revised code and explain the changes."
For complex tasks, explicitly ask for step-by-step reasoning before the final answer.
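The role / task / output-format structure above can be sketched as a small helper. This is purely illustrative; the function name and template wording are my own, not part of any Claude API.

```python
# A minimal sketch of the role / task / format prompt structure described above.
# The helper and template wording are illustrative, not an official API.

def build_prompt(role: str, task: str, output_format: str) -> str:
    """Assemble an explicit prompt from the three parts the tips recommend."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Output format: {output_format}\n"
        "Think through the problem step by step before giving the final answer."
    )

prompt = build_prompt(
    role="a senior frontend engineer",
    task="refactor the following React code for readability and performance",
    output_format="the revised code, followed by a short explanation of each change",
)
print(prompt)
```

Keeping the three parts in a fixed order makes prompts easier to review and reuse across tasks.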
For Developers (API Usage)
Where It's Available
• Messages API (Claude Developer Platform)
• Microsoft Foundry
Typical API Setup (Pseudo-Code)
• Model: claude-sonnet-4.6
• Parameters: max_tokens, thinking / effort (controls reasoning budget), messages (system / user / assistant)
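The setup above can be sketched as a request body. Field names follow Anthropic's Messages API conventions; the model ID comes from this announcement, while the exact shape of the thinking/effort parameter for Sonnet 4.6 may differ from what's shown, so treat this as a template rather than a verified call.

```python
import json

# Sketch of a Messages API request body using the parameters listed above.
# "model" and "max_tokens" follow the Messages API; the system prompt and
# user message are placeholder content.

request_body = {
    "model": "claude-sonnet-4.6",
    "max_tokens": 2048,
    "system": "You are a concise technical assistant.",
    "messages": [
        {"role": "user", "content": "Summarize the risks in this PRD."}
    ],
}

print(json.dumps(request_body, indent=2))
```

In practice you would send this body via the official SDK or an HTTPS POST with your API key in the request headers.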
Large Context (Up to 1M Tokens, Beta)
Ideal for:
• Full PRDs + design docs + code in one pass
• Multi-paper or multi-report synthesis
• Large-scale internal knowledge analysis
New Capabilities for Apps
• Adaptive thinking and context compression
• More stable tool selection and error recovery
• Better support for agent-style workflows (browse → calculate → write)
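The browse → calculate → write pattern above can be illustrated with a toy chain. The tool functions here are stand-ins I made up for illustration; a real agent would let the model choose tools and recover from errors, as the capabilities list describes.

```python
# A toy sketch of the browse -> calculate -> write agent pattern.
# All three "tools" are hypothetical stand-ins, not real Claude tools.

def browse(url: str) -> str:
    """Stand-in for a web-browsing tool: returns fetched text."""
    return f"Revenue figures scraped from {url}: 120, 135, 150"

def calculate(numbers: list[int]) -> float:
    """Stand-in for a calculator tool: average the figures."""
    return sum(numbers) / len(numbers)

def write(summary: str) -> str:
    """Stand-in for a writing tool: produce the final report line."""
    return f"Report: {summary}"

# The agent chains the tools, passing each step's output to the next.
page = browse("https://example.com/q3")
figures = [int(n) for n in page.rsplit(":", 1)[1].split(",")]
average = calculate(figures)
report = write(f"average quarterly revenue is {average:.0f}")
print(report)  # → Report: average quarterly revenue is 135
```

The point is the control flow, not the tools themselves: each step's output feeds the next, which is exactly the kind of multi-step execution these releases keep improving.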
Claude Sonnet 4.6 is a solid, professional upgrade. It's faster. It handles tools better. It's more reliable in long workflows. But it's not a breakthrough.
As competition intensifies, expectations rise. Users want models that are:
• Affordable
• Fast
• Genuinely intelligent
• Reliable at execution, not just conversation
Hopefully, by 2026, we'll see a real leap—not just another incremental step.