
Anthropic officially announced the release of Claude Sonnet 4.6, and yes—this time, there are real improvements.
Here's the official announcement video:
https://x.com/claudeai/status/2023817132581208353
If you want the short version:
Sonnet 4.6 is clearly better than Sonnet 4.5, especially for coding, writing, analysis, research, and spreadsheet-heavy workflows in Cowork.
But it does not surpass Opus 4.5. It's faster and more capable with tools, not fundamentally smarter.
The good news: pricing remains unchanged.
• Input: $3 / million tokens
• Output: $15 / million tokens
Same as Sonnet 4.5.
Anthropic shared internal benchmark comparisons showing that Sonnet 4.6 outperforms Sonnet 4.5 across nearly all evaluated tasks.
The gains are not marginal—they're noticeable.
In particular, Anthropic emphasized improvements in "computer use" tasks, such as:
• Complex Excel workflows
• Web interaction and UI navigation
According to their data, these capabilities improved by roughly 18% compared to Sonnet 4.5, reaching what Anthropic describes as human-level performance in specific tool-driven scenarios.
Alongside the model release, Anthropic introduced a new Excel plugin powered by Sonnet 4.6.
You can now:
• Work directly inside Excel
• Ask questions about tables and formulas
• Let Sonnet 4.6 assist with spreadsheet logic and transformations
This isn't flashy—but it's genuinely useful for knowledge workers.
Sonnet 4.6 is already live across:
• Claude web (claude.ai)
• Cowork
• Claude API
After refreshing the Claude web UI, Sonnet 4.6 appears as the default model.
That said, Claude Code required an update. Older versions were still pinned to Sonnet 4.5. After upgrading to Claude Code v2.1.45, Sonnet 4.6 became available.
Claude Code's creator, Boris, said while promoting the release that Sonnet 4.6 is close to Opus-level intelligence at a lower cost.
This needs clarification.
What this really means is: Sonnet 4.6 is closer to Opus 4.5 than Sonnet 4.5 was. It is not smarter than Opus 4.5. That distinction matters.
On OpenRouter, Sonnet 4.6 pricing matches Sonnet 4.5 exactly:
• 1M context window
• $3 / million input tokens
• $15 / million output tokens
The Sonnet line has held this pricing consistently since Sonnet 3.7.
Interestingly, the top comment under Anthropic's announcement pointed out that Sonnet 4.6 still fails the classic "car wash" reasoning problem.
After testing it myself, I saw the same issue. The model suggests walking to the car wash while leaving the car at home. That's… not great.
Other users echoed similar sentiments:
• No qualitative leap in reasoning
• Programming ability still below Opus 4.5
• Most improvements are in Cowork-style execution, not intelligence
Some even joked that Claude releases are starting to feel like incremental iPhone upgrades—predictable, polished, but not exciting.
When Cursor added Sonnet 4.6, they explicitly noted the same thing: it can run longer tasks more reliably, but its intelligence level has not fundamentally changed.
Most major AI labs release at least two tiers per generation:
• A standard model
• A more powerful, reinforced model
For example, Google's Gemini Flash models often outperform the previous Pro generation.
So when Sonnet 4.6 failed to catch up to Opus 4.5, it fell short of many users' expectations. Even Opus 4.6, released recently, comes with a trade-off: higher intelligence, noticeably slower execution.
For most people, model quality comes down to three factors:
• Price
• Speed
• Intelligence
Very few models do well on all three.
Right now, one notable exception is MiniMax M2.5. At less than 1/20th the cost of Opus 4.6, it delivers roughly 80% of Opus-level capability—which makes it extremely hard to ignore.
In my own workflow:
• Planning and high-level reasoning → Claude
• Execution and iteration → MiniMax M2.5
It's cheaper, faster, and more than capable for most tasks.
In a much quieter corner, xAI released Grok 4.2—and almost no one noticed.
Grok still struggles with reasoning and general usability. Its only real advantages:
• NSFW content support
• Direct access to X (Twitter) data
And that's about it.
For General Users
Web / App
• On claude.ai, Sonnet 4.6 is now the default
• iOS and Android apps support it for both Free and Pro users
Best Use Cases
• Complex programming and debugging
• Large codebase review
• Long-form writing, resumes, reports
• Multi-step planning and analysis
• Enterprise document Q&A (near flagship-level performance)
Prompting Tips
Be explicit about: Role, Task goal, Output format.
Example: "You are a senior frontend engineer. Refactor the following React code to improve readability and performance. Return the revised code and explain the changes."
For complex tasks, explicitly ask for step-by-step reasoning before the final answer.
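The role / task / output-format structure above can be sketched as a small helper. This is purely illustrative; the function name and template wording are my own, not part of any Claude API.

```python
# A minimal sketch of the role / task / format prompt structure described above.
# The helper and template wording are illustrative, not an official API.

def build_prompt(role: str, task: str, output_format: str) -> str:
    """Assemble an explicit prompt from the three parts the tips recommend."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Output format: {output_format}\n"
        "Think through the problem step by step before giving the final answer."
    )

prompt = build_prompt(
    role="a senior frontend engineer",
    task="refactor the following React code for readability and performance",
    output_format="the revised code, followed by a short explanation of each change",
)
print(prompt)
```

Keeping the three parts in a fixed order makes prompts easier to review and reuse across tasks.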
For Developers (API Usage)
Where It's Available
• Messages API (Claude Developer Platform)
• Microsoft Foundry
Typical API Setup (Pseudo-Code)
• Model: claude-sonnet-4.6
• Parameters: max_tokens, thinking / effort (controls reasoning budget), messages (system / user / assistant)
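The setup above can be sketched as a request body. Field names follow Anthropic's Messages API conventions; the model ID comes from this announcement, while the exact shape of the thinking/effort parameter for Sonnet 4.6 may differ from what's shown, so treat this as a template rather than a verified call.

```python
import json

# Sketch of a Messages API request body using the parameters listed above.
# "model" and "max_tokens" follow the Messages API; the system prompt and
# user message are placeholder content.

request_body = {
    "model": "claude-sonnet-4.6",
    "max_tokens": 2048,
    "system": "You are a concise technical assistant.",
    "messages": [
        {"role": "user", "content": "Summarize the risks in this PRD."}
    ],
}

print(json.dumps(request_body, indent=2))
```

In practice you would send this body via the official SDK or an HTTPS POST with your API key in the request headers.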
Large Context (Up to 1M Tokens, Beta)
Ideal for:
• Full PRDs + design docs + code in one pass
• Multi-paper or multi-report synthesis
• Large-scale internal knowledge analysis
New Capabilities for Apps
• Adaptive thinking and context compression
• More stable tool selection and error recovery
• Better support for agent-style workflows (browse → calculate → write)
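The browse → calculate → write pattern above can be illustrated with a toy chain. The tool functions here are stand-ins I made up for illustration; a real agent would let the model choose tools and recover from errors, as the capabilities list describes.

```python
# A toy sketch of the browse -> calculate -> write agent pattern.
# All three "tools" are hypothetical stand-ins, not real Claude tools.

def browse(url: str) -> str:
    """Stand-in for a web-browsing tool: returns fetched text."""
    return f"Revenue figures scraped from {url}: 120, 135, 150"

def calculate(numbers: list[int]) -> float:
    """Stand-in for a calculator tool: average the figures."""
    return sum(numbers) / len(numbers)

def write(summary: str) -> str:
    """Stand-in for a writing tool: produce the final report line."""
    return f"Report: {summary}"

# The agent chains the tools, passing each step's output to the next.
page = browse("https://example.com/q3")
figures = [int(n) for n in page.rsplit(":", 1)[1].split(",")]
average = calculate(figures)
report = write(f"average quarterly revenue is {average:.0f}")
print(report)  # → Report: average quarterly revenue is 135
```

The point is the control flow, not the tools themselves: each step's output feeds the next, which is exactly the kind of multi-step execution these releases keep improving.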
Claude Sonnet 4.6 is a solid, professional upgrade. It's faster. It handles tools better. It's more reliable in long workflows. But it's not a breakthrough.
As competition intensifies, expectations rise. Users want models that are:
• Affordable
• Fast
• Genuinely intelligent
• Reliable at execution, not just conversation
Hopefully, by 2026, we'll see a real leap—not just another incremental step.