
Hey fellow budget-watchers — if you're the person on your team who has to explain why the AI bill jumped last month, this one's for you.
I've been running cost comparisons across frontier models every time there's a major release. When Gemini 3.1 Pro dropped on February 19, 2026, the pricing story was actually the most interesting part — and not for the reason you'd expect. I'm Hanks, and my whole thing is testing tools inside real workflows, not sales pages. Let me walk you through exactly what Gemini 3.1 Pro pricing looks like, where it actually saves you money, and where the math isn't as clean as the headline suggests.

The core fact first: Gemini 3.1 Pro launched at identical pricing to Gemini 3 Pro. If you were already running Gemini 3 Pro in production, updating your model ID costs you literally nothing.
Here's the full pricing structure sourced directly from the Gemini API pricing page and confirmed by the Gemini 3 developer guide:
This applies to prompts under 200,000 tokens — which covers the majority of real-world API calls.
One important note for teams tracking token budget: thinking tokens are billed as output at the standard $12/M rate. When you set thinking_level to High — which activates Deep Think Mini mode — your output token count can increase meaningfully depending on problem complexity. That's not a hidden cost; it's the model doing more work. But build it into your estimates before you go to production.
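To see how that lands on a bill, here's a back-of-envelope sketch using the standard-context rates quoted in this article ($2/M input, $12/M output). The token counts are illustrative, not measured:

```python
# Estimate one request's cost, counting thinking tokens as output
# (the article's standard-context rates: $2/M input, $12/M output).
INPUT_RATE = 2.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 12.00 / 1_000_000  # USD per output token, thinking included

def request_cost(input_tokens: int, response_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of one standard-context request, in USD."""
    billable_output = response_tokens + thinking_tokens
    return input_tokens * INPUT_RATE + billable_output * OUTPUT_RATE

# Same prompt with and without heavy reasoning: the delta is pure thinking-token spend.
base = request_cost(10_000, 2_000)                         # minimal thinking
high = request_cost(10_000, 2_000, thinking_tokens=8_000)  # hypothetical High-mode run
```

With these illustrative numbers, the 8K thinking tokens more than triple the request cost, which is exactly why you want to watch output token volumes in testing.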
Cross the 200K input token threshold and all tokens — input and output — move to the long-context tier. This is the part that catches teams off guard.
The threshold applies to your total input token count, not just the portion that exceeds 200K. Send 210K input tokens, and your entire request — including all output — is billed at the higher rate.
For most RAG pipelines and standard document workflows, you'll stay comfortably under 200K. If you're loading full codebases or multi-hour transcripts — which the 1M context window makes possible — factor the 2x jump into your monthly estimates.
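The threshold rule is easy to encode, and worth encoding, because the jump is discontinuous. A minimal sketch using the tier prices from this article ($2/$12 standard, $4/$18 long-context):

```python
# The 200K-token billing rule: once total INPUT exceeds 200K tokens,
# the ENTIRE request (input and output) moves to the long-context tier.
LONG_CONTEXT_THRESHOLD = 200_000

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 4.00, 18.00   # long-context tier, USD per million
    else:
        in_rate, out_rate = 2.00, 12.00   # standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 190K input stays on the standard tier; 210K flips the whole request.
print(request_cost_usd(190_000, 5_000))  # → 0.44
print(request_cost_usd(210_000, 5_000))  # → 0.93
```

Note the second request costs more than twice the first despite only ~10% more input — that's the cliff to budget around.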
Gemini 3.1 Pro supports the Gemini Batch API, which cuts every token price in half in exchange for asynchronous processing (typically within 24 hours).
This is a no-brainer for any workload that isn't real-time. Data enrichment, document classification, overnight report generation, large-scale content processing — if the user isn't waiting on the response, there's no reason not to use Batch.
Combine Batch with context caching and you can push effective input costs toward $0.10–0.20 per million tokens on repeated contexts. That changes the math significantly at production scale.
Here's the full three-way pricing comparison using verified data as of February 2026.

Claude Opus 4.6, released February 4, 2026, is Anthropic's most capable model at $5 per million input tokens and $25 per million output tokens. That's 2.5x the input price and roughly 2x the output price of Gemini 3.1 Pro.
At 50M output tokens/month — a realistic production workload — that delta is $650/month, or $7,800/year, and it scales linearly from there. At enterprise volumes, that number changes the conversation.
The honest counter: Claude Opus 4.6 still leads on expert task preference benchmarks (GDPval-AA Elo: 1606 vs 3.1 Pro's 1317) and on Terminal-Bench 2.0 computer use tasks. If those specific capabilities are core to your product, the premium may be justified. If you're running general document analysis, coding review, or agentic workflows where benchmark differences are marginal, Gemini 3.1 Pro's cost advantage is hard to ignore.
Claude also has batch pricing (50% off) and prompt caching that can bring effective input costs down to $0.50/M. The gap narrows under heavy optimization — but doesn't close.
This is the more interesting comparison for most teams. Claude Sonnet 4.6, priced at $3/$15 per million tokens, is Anthropic's current default model — and as VentureBeat noted in their February coverage, it matches or approaches Opus 4.6 performance on most practical benchmarks.
SWE-Bench Verified is nearly identical. Gemini 3.1 Pro runs 33% cheaper on input and 20% cheaper on output. For development teams evaluating model cost at scale, this comparison matters more than the Opus head-to-head.
The context window is where Gemini pulls further ahead: Sonnet 4.6's 1M context is beta-only for Tier 4 organizations, while Gemini 3.1 Pro's 1M context is the standard default.
GPT-5.2, released December 2025, is priced at $1.75/$14.00 per million tokens — actually cheaper on input than Gemini 3.1 Pro, but more expensive on output.
For output-heavy workflows (long code generation, extended reports), Gemini 3.1 Pro wins by $2/M. For input-heavy workflows (large document analysis), GPT-5.2 is slightly cheaper. The 1M vs 400K context window difference is significant if you're actually pushing large-context use cases — Gemini has a real structural advantage there.
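The break-even between the two falls out of the rates directly: Gemini charges $0.25/M more on input, GPT-5.2 charges $2/M more on output, so Gemini wins whenever output exceeds one-eighth of input — roughly 11% of total tokens. A quick sketch of that crossover:

```python
# Break-even between Gemini 3.1 Pro ($2/$12) and GPT-5.2 ($1.75/$14):
# Gemini is cheaper whenever 2*i + 12*o < 1.75*i + 14*o, i.e. o > i/8.
def cheaper_model(input_m: float, output_m: float) -> str:
    """Compare monthly cost for token volumes given in millions."""
    gemini = 2.00 * input_m + 12.00 * output_m
    gpt = 1.75 * input_m + 14.00 * output_m
    return "gemini-3.1-pro" if gemini < gpt else "gpt-5.2"

print(cheaper_model(input_m=9.0, output_m=1.0))  # → gpt-5.2 (output ~10% of total)
print(cheaper_model(input_m=8.0, output_m=1.5))  # → gemini-3.1-pro
```

In practice, almost any workflow that generates substantive responses clears that 11% bar, which is why Gemini wins most realistic mixes here.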
GPT-5.2 Pro at $21/$168 is a completely different tier — appropriate for tasks where accuracy has high financial consequences (legal review, compliance analysis), not for general production workloads.
Here's the full three-way comparison at a glance:
Let me put actual numbers on these pricing tiers, because per-million-token pricing only makes sense at scale.
Assumptions: 70% output-heavy use case, standard context window, no batch optimization.
Light Use — 1M tokens/month (300K input tokens + 700K output tokens)
At light use volumes, the differences are negligible. You're choosing based on capability and workflow fit, not cost.
Production — 50M tokens/month (15M input tokens + 35M output tokens)
Now the gaps matter. Gemini 3.1 Pro runs about $450/month at this volume — saving $120/month vs Sonnet 4.6, $500/month vs Opus 4.6, and roughly $66/month vs GPT-5.2, while offering the larger context window as a default feature. At 200M tokens/month — where enterprise workloads start — these differences compound significantly.
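If you want to rerun this scenario against your own volumes, the whole comparison is a few lines. This uses the per-million rates quoted in this article, with no batch or caching discounts applied:

```python
# Production scenario from the article: 15M input + 35M output tokens/month,
# at each model's standard per-million rates (no batch, no caching).
PRICES = {  # model: (input $/M, output $/M)
    "gemini-3.1-pro": (2.00, 12.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
    "gpt-5.2": (1.75, 14.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly spend in USD for token volumes given in millions."""
    in_rate, out_rate = PRICES[model]
    return in_rate * input_m + out_rate * output_m

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 15, 35):,.2f}/month")
# gemini-3.1-pro: $450.00/month
# claude-sonnet-4.6: $570.00/month
# claude-opus-4.6: $950.00/month
# gpt-5.2: $516.25/month
```

Swap in your own input/output split — the 70/30 output-heavy assumption above is what makes Gemini's $12 output rate dominate the ranking.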
Context caching is where Gemini's pricing story gets genuinely compelling. If your workflow involves a large, repeated context — a system prompt, a reference document, a code template — you pay the full $2/M input rate once for the cache write, then only $0.20/M for every subsequent cache read. That's a 90% reduction on the portion of your input that's cached.
Practical example: an API workflow with a 50K-token system prompt, called 10,000 times/month.
Without caching: 50K tokens × 10,000 calls = 500M tokens × $2/M = $1,000/month
With caching: one 50K-token cache write at $2/M (about $0.10, one time), then roughly 500M tokens of cache reads × $0.20/M ≈ $100/month, plus hourly cache storage.
Net result: 90% reduction on that input segment. If your system prompt and reference documents represent 60–70% of your total input tokens, your effective input cost can approach $0.20–0.40/M on a blended basis — well below the headline $2/M.
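The system-prompt example above works out like this (a sketch that ignores hourly cache storage for simplicity, so treat the cached figure as a floor):

```python
# 50K-token system prompt, 10,000 calls/month: cache write billed once at the
# standard $2/M input rate, every subsequent hit billed as a $0.20/M cache read.
# Hourly cache storage is ignored here for simplicity.
PROMPT_TOKENS = 50_000
CALLS = 10_000

uncached = PROMPT_TOKENS * CALLS * 2.00 / 1_000_000       # $1,000.00/month
write = PROMPT_TOKENS * 2.00 / 1_000_000                  # $0.10, one time
reads = PROMPT_TOKENS * (CALLS - 1) * 0.20 / 1_000_000    # $99.99/month
cached = write + reads

print(f"without caching: ${uncached:,.2f}")  # → $1,000.00
print(f"with caching:    ${cached:,.2f}")    # → $100.09
savings = 1 - cached / uncached              # ≈ 0.90
```

That's the 90% figure from the example — and storage fees only erode it meaningfully if the cache sits idle between calls.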
The Google AI developer documentation confirms context caching is supported on all Gemini 3 models including 3.1 Pro.
At $2/$12 per million tokens, Gemini 3.1 Pro hits the intersection of three things that rarely align: frontier performance, large context window, and competitive pricing. For most teams building production AI systems, this combination is genuinely useful.
The performance case is solid. 77.1% on ARC-AGI-2 (up from 31.1% on Gemini 3 Pro), 80.6% on SWE-Bench Verified, 94.3% on GPQA Diamond — these aren't marginal improvements. The model outperforms Claude Opus 4.6 on most of those benchmarks at less than half the price.
The preview caveat matters though. Google notes that all Gemini 3 models are currently in preview, and pricing may adjust as the model moves to general availability. One source tracking Gemini pricing expects stable rates to settle around $1.50/$10 in Q2 2026 if the preview-to-production pattern holds — which would make the value proposition even stronger.
Where it's not the clear winner: Claude Opus 4.6 still leads on human expert task preferences and computer use benchmarks. GPT-5.2 is slightly cheaper on input-heavy workloads. If you have specific use cases where those characteristics dominate your performance criteria, run the evals on your actual task distribution rather than trusting headline benchmark numbers.
For most general-purpose production workloads? Gemini 3.1 Pro is the strongest case for value in frontier AI right now.
Does Gemini 3.1 Pro cost more than Gemini 3 Pro? No. Gemini 3.1 Pro launched at identical pricing: $2/$12 per million tokens for standard context, $4/$18 for long-context requests over 200K tokens. For existing Gemini 3 Pro users, it's a capability upgrade at zero additional cost.
Is there a free tier for Gemini 3.1 Pro? No. The Gemini 3 developer guide explicitly confirms there is no free tier for gemini-3.1-pro-preview in the Gemini API. You can try it free in Google AI Studio (with rate limits), but API calls require a paid account.
Does the Batch API apply to Gemini 3.1 Pro? Yes — confirmed in the official documentation. Batch API gives a flat 50% discount on all token types, making the effective rate $1/$6 per million tokens for standard-context async workloads.
How does thinking token billing work? Thinking tokens — the internal reasoning the model generates before producing a final response — are billed as output tokens at the standard $12/M rate. Setting thinking_level to High triggers the most reasoning depth and therefore the highest potential output token count. Monitor your actual output token volumes during testing before committing to High mode in production.
Will pricing change when Gemini 3.1 Pro leaves preview? Unknown officially — Google hasn't announced a timeline for general availability. Historically, Google has maintained or reduced prices when moving from preview to production. Budget conservatively on preview pricing for now.
How does context caching pricing work? Cache writes are billed once at the standard input rate ($2/M tokens), cache reads cost $0.20/M tokens, and storage costs $4.50/M tokens per hour. The key mechanic: you pay the write price once, then the read price on every subsequent hit. For repeated large contexts — system prompts, reference documents, code templates — the savings compound quickly.
Here's the actual decision framework: if you're currently running Gemini 3 Pro, update the model ID today. Same price, better model. There's no decision to make.
If you're evaluating Gemini 3.1 Pro against Claude or GPT for a new build: run your target task distribution through all three. At standard pricing, Gemini 3.1 Pro is the most cost-efficient frontier model with a full 1M context window by default. With Batch API and context caching, you can push effective costs down further than any competitor in this capability tier.
The preview status is the only real risk. Build in pricing flexibility for Q2 2026 when GA pricing is confirmed.
At Macaron, you can run your actual tasks through different models and turn the outputs into structured, trackable workflows — so your cost comparison is based on what each model actually delivers, not just benchmark tables. Try it free at macaron.im and run your own cost-performance test.