
Hey fellow budget-watchers — if you're the person on your team who has to explain why the AI bill jumped last month, this one's for you.
I've been running cost comparisons across frontier models every time there's a major release. When Gemini 3.1 Pro dropped on February 19, 2026, the pricing story was actually the most interesting part — and not for the reason you'd expect. I'm Hanks, and my whole thing is testing tools inside real workflows, not sales pages. Let me walk you through exactly what Gemini 3.1 Pro pricing looks like, where it actually saves you money, and where the math isn't as clean as the headline suggests.

The core fact first: Gemini 3.1 Pro launched at identical pricing to Gemini 3 Pro. If you were already running Gemini 3 Pro in production, updating your model ID costs you literally nothing.
Here's the full pricing structure sourced directly from the Gemini API pricing page and confirmed by the Gemini 3 developer guide:
This applies to prompts under 200,000 tokens — which covers the majority of real-world API calls.
One important note for teams tracking token budget: thinking tokens are billed as output at the standard $12/M rate. When you set thinking_level to High — which activates Deep Think Mini mode — your output token count can increase meaningfully depending on problem complexity. That's not a hidden cost; it's the model doing more work. But build it into your estimates before you go to production.
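To see how that lands on a bill, here's a back-of-envelope sketch using the standard-context rates quoted in this article ($2/M input, $12/M output). The token counts are illustrative, not measured:

```python
# Estimate one request's cost, counting thinking tokens as output
# (the article's standard-context rates: $2/M input, $12/M output).
INPUT_RATE = 2.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 12.00 / 1_000_000  # USD per output token, thinking included

def request_cost(input_tokens: int, response_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of one standard-context request, in USD."""
    billable_output = response_tokens + thinking_tokens
    return input_tokens * INPUT_RATE + billable_output * OUTPUT_RATE

# Same prompt with and without heavy reasoning: the delta is pure thinking-token spend.
base = request_cost(10_000, 2_000)                         # minimal thinking
high = request_cost(10_000, 2_000, thinking_tokens=8_000)  # hypothetical High-mode run
```

With these illustrative numbers, the 8K thinking tokens more than triple the request cost, which is exactly why you want to watch output token volumes in testing.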
Cross the 200K input token threshold and all tokens — input and output — move to the long-context tier. This is the part that catches teams off guard.
The threshold applies to your total input token count, not just the portion that exceeds 200K. Send 210K input tokens, and your entire request — including all output — is billed at the higher rate.
For most RAG pipelines and standard document workflows, you'll stay comfortably under 200K. If you're loading full codebases or multi-hour transcripts — which the 1M context window makes possible — factor the 2x jump into your monthly estimates.
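The threshold rule is easy to encode, and worth encoding, because the jump is discontinuous. A minimal sketch using the tier prices from this article ($2/$12 standard, $4/$18 long-context):

```python
# The 200K-token billing rule: once total INPUT exceeds 200K tokens,
# the ENTIRE request (input and output) moves to the long-context tier.
LONG_CONTEXT_THRESHOLD = 200_000

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 4.00, 18.00   # long-context tier, USD per million
    else:
        in_rate, out_rate = 2.00, 12.00   # standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 190K input stays on the standard tier; 210K flips the whole request.
print(request_cost_usd(190_000, 5_000))  # → 0.44
print(request_cost_usd(210_000, 5_000))  # → 0.93
```

Note the second request costs more than twice the first despite only ~10% more input — that's the cliff to budget around.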
Gemini 3.1 Pro supports the Gemini Batch API, which cuts every token price in half in exchange for asynchronous processing (typically within 24 hours).
This is a no-brainer for any workload that isn't real-time. Data enrichment, document classification, overnight report generation, large-scale content processing — if the user isn't waiting on the response, there's no reason not to use Batch.
Combine Batch with context caching and you can push effective input costs toward $0.10–0.20 per million tokens on repeated contexts. That changes the math significantly at production scale.
Here's the full three-way pricing comparison using verified data as of February 2026.

Claude Opus 4.6, released February 4, 2026, is Anthropic's most capable model at $5 per million input tokens and $25 per million output tokens. That's 2.5x the input price and roughly 2x the output price of Gemini 3.1 Pro.
At 50M output tokens/month — a realistic production workload — that delta is $650/month, or $7,800/year, and it scales linearly from there. At enterprise volumes, that number changes the conversation.
The honest counter: Claude Opus 4.6 still leads on expert task preference benchmarks (GDPval-AA Elo: 1606 vs 3.1 Pro's 1317) and on Terminal-Bench 2.0 computer use tasks. If those specific capabilities are core to your product, the premium may be justified. If you're running general document analysis, coding review, or agentic workflows where benchmark differences are marginal, Gemini 3.1 Pro's cost advantage is hard to ignore.
Claude also has batch pricing (50% off) and prompt caching that can bring effective input costs down to $0.50/M. The gap narrows under heavy optimization — but doesn't close.
This is the more interesting comparison for most teams. Claude Sonnet 4.6, priced at $3/$15 per million tokens, is Anthropic's current default model — and as VentureBeat noted in their February coverage, it matches or approaches Opus 4.6 performance on most practical benchmarks.
SWE-Bench Verified is nearly identical. Gemini 3.1 Pro runs 33% cheaper on input and 20% cheaper on output. For development teams evaluating model cost at scale, this comparison matters more than the Opus head-to-head.
The context window is where Gemini pulls further ahead: Sonnet 4.6's 1M context is beta-only for Tier 4 organizations, while Gemini 3.1 Pro's 1M context is the standard default.
GPT-5.2, released December 2025, is priced at $1.75/$14.00 per million tokens — actually cheaper on input than Gemini 3.1 Pro, but more expensive on output.
For output-heavy workflows (long code generation, extended reports), Gemini 3.1 Pro wins by $2/M. For input-heavy workflows (large document analysis), GPT-5.2 is slightly cheaper. The 1M vs 400K context window difference is significant if you're actually pushing large-context use cases — Gemini has a real structural advantage there.
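The break-even between the two falls out of the rates directly: Gemini charges $0.25/M more on input, GPT-5.2 charges $2/M more on output, so Gemini wins whenever output exceeds one-eighth of input — roughly 11% of total tokens. A quick sketch of that crossover:

```python
# Break-even between Gemini 3.1 Pro ($2/$12) and GPT-5.2 ($1.75/$14):
# Gemini is cheaper whenever 2*i + 12*o < 1.75*i + 14*o, i.e. o > i/8.
def cheaper_model(input_m: float, output_m: float) -> str:
    """Compare monthly cost for token volumes given in millions."""
    gemini = 2.00 * input_m + 12.00 * output_m
    gpt = 1.75 * input_m + 14.00 * output_m
    return "gemini-3.1-pro" if gemini < gpt else "gpt-5.2"

print(cheaper_model(input_m=9.0, output_m=1.0))  # → gpt-5.2 (output ~10% of total)
print(cheaper_model(input_m=8.0, output_m=1.5))  # → gemini-3.1-pro
```

In practice, almost any workflow that generates substantive responses clears that 11% bar, which is why Gemini wins most realistic mixes here.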
GPT-5.2 Pro at $21/$168 is a completely different tier — appropriate for tasks where accuracy has high financial consequences (legal review, compliance analysis), not for general production workloads.
Here's the full three-way comparison at a glance:
Let me put actual numbers on these pricing tiers, because per-million-token pricing only makes sense at scale.
Assumptions: 70% output-heavy use case, standard context window, no batch optimization.
Light Use — 1M tokens/month (300K input tokens + 700K output tokens)
At light use volumes, the differences are negligible. You're choosing based on capability and workflow fit, not cost.
Production — 50M tokens/month (15M input tokens + 35M output tokens)
Now the gaps matter. Gemini 3.1 Pro runs about $450/month at this volume — saving $120/month vs Sonnet 4.6, $500/month vs Opus 4.6, and roughly $66/month vs GPT-5.2, while offering the larger context window as a default feature. At 200M tokens/month — where enterprise workloads start — these differences compound significantly.
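If you want to rerun this scenario against your own volumes, the whole comparison is a few lines. This uses the per-million rates quoted in this article, with no batch or caching discounts applied:

```python
# Production scenario from the article: 15M input + 35M output tokens/month,
# at each model's standard per-million rates (no batch, no caching).
PRICES = {  # model: (input $/M, output $/M)
    "gemini-3.1-pro": (2.00, 12.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
    "gpt-5.2": (1.75, 14.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly spend in USD for token volumes given in millions."""
    in_rate, out_rate = PRICES[model]
    return in_rate * input_m + out_rate * output_m

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 15, 35):,.2f}/month")
# gemini-3.1-pro: $450.00/month
# claude-sonnet-4.6: $570.00/month
# claude-opus-4.6: $950.00/month
# gpt-5.2: $516.25/month
```

Swap in your own input/output split — the 70/30 output-heavy assumption above is what makes Gemini's $12 output rate dominate the ranking.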
Context caching is where Gemini's pricing story gets genuinely compelling. If your workflow involves a large, repeated context — a system prompt, a reference document, a code template — you pay the full $2/M input rate once for the cache write, then only $0.20/M for every subsequent cache read. That's a 90% reduction on the portion of your input that's cached.
Practical example: an API workflow with a 50K-token system prompt, called 10,000 times/month.
Without caching: 50K tokens × 10,000 calls = 500M tokens × $2/M = $1,000/month
With caching: one 50K-token cache write at $2/M (about $0.10, one time), then roughly 500M tokens of cache reads × $0.20/M ≈ $100/month, plus hourly cache storage.
Net result: 90% reduction on that input segment. If your system prompt and reference documents represent 60–70% of your total input tokens, your effective input cost can approach $0.20–0.40/M on a blended basis — well below the headline $2/M.
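The system-prompt example above works out like this (a sketch that ignores hourly cache storage for simplicity, so treat the cached figure as a floor):

```python
# 50K-token system prompt, 10,000 calls/month: cache write billed once at the
# standard $2/M input rate, every subsequent hit billed as a $0.20/M cache read.
# Hourly cache storage is ignored here for simplicity.
PROMPT_TOKENS = 50_000
CALLS = 10_000

uncached = PROMPT_TOKENS * CALLS * 2.00 / 1_000_000       # $1,000.00/month
write = PROMPT_TOKENS * 2.00 / 1_000_000                  # $0.10, one time
reads = PROMPT_TOKENS * (CALLS - 1) * 0.20 / 1_000_000    # $99.99/month
cached = write + reads

print(f"without caching: ${uncached:,.2f}")  # → $1,000.00
print(f"with caching:    ${cached:,.2f}")    # → $100.09
savings = 1 - cached / uncached              # ≈ 0.90
```

That's the 90% figure from the example — and storage fees only erode it meaningfully if the cache sits idle between calls.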
The Google AI developer documentation confirms context caching is supported on all Gemini 3 models including 3.1 Pro.
At $2/$12 per million tokens, Gemini 3.1 Pro hits the intersection of three things that rarely align: frontier performance, large context window, and competitive pricing. For most teams building production AI systems, this combination is genuinely useful.
The performance case is solid. 77.1% on ARC-AGI-2 (up from 31.1% on Gemini 3 Pro), 80.6% on SWE-Bench Verified, 94.3% on GPQA Diamond — these aren't marginal improvements. The model outperforms Claude Opus 4.6 on most of those benchmarks at less than half the price.
The preview caveat matters though. Google notes that all Gemini 3 models are currently in preview, and pricing may adjust as the model moves to general availability. One source tracking Gemini pricing expects stable rates to settle around $1.50/$10 in Q2 2026 if the preview-to-production pattern holds — which would make the value proposition even stronger.
Where it's not the clear winner: Claude Opus 4.6 still leads on human expert task preferences and computer use benchmarks. GPT-5.2 is slightly cheaper on input-heavy workloads. If you have specific use cases where those characteristics dominate your performance criteria, run the evals on your actual task distribution rather than trusting headline benchmark numbers.
For most general-purpose production workloads? Gemini 3.1 Pro is the strongest case for value in frontier AI right now.
Does Gemini 3.1 Pro cost more than Gemini 3 Pro? No. Gemini 3.1 Pro launched at identical pricing: $2/$12 per million tokens for standard context, $4/$18 for long-context requests over 200K tokens. For existing Gemini 3 Pro users, it's a capability upgrade at zero additional cost.
Is there a free tier for Gemini 3.1 Pro? No. The Gemini 3 developer guide explicitly confirms there is no free tier for gemini-3.1-pro-preview in the Gemini API. You can try it free in Google AI Studio (with rate limits), but API calls require a paid account.
Does the Batch API apply to Gemini 3.1 Pro? Yes — confirmed in the official documentation. Batch API gives a flat 50% discount on all token types, making the effective rate $1/$6 per million tokens for standard-context async workloads.
How does thinking token billing work? Thinking tokens — the internal reasoning the model generates before producing a final response — are billed as output tokens at the standard $12/M rate. Setting thinking_level to High triggers the most reasoning depth and therefore the highest potential output token count. Monitor your actual output token volumes during testing before committing to High mode in production.
Will pricing change when Gemini 3.1 Pro leaves preview? Unknown officially — Google hasn't announced a timeline for general availability. Historically, Google has maintained or reduced prices when moving from preview to production. Budget conservatively on preview pricing for now.
How does context caching pricing work? Cache writes are billed once at the standard input rate ($2/M tokens), cache reads cost $0.20/M tokens, and storage costs $4.50/M tokens per hour. The key mechanic: you pay the write price once, then the read price on every subsequent hit. For repeated large contexts — system prompts, reference documents, code templates — the savings compound quickly.
Here's the actual decision framework: if you're currently running Gemini 3 Pro, update the model ID today. Same price, better model. There's no decision to make.
If you're evaluating Gemini 3.1 Pro against Claude or GPT for a new build: run your target task distribution through all three. At standard pricing, Gemini 3.1 Pro is the most cost-efficient frontier model with a full 1M context window by default. With Batch API and context caching, you can push effective costs down further than any competitor in this capability tier.
The preview status is the only real risk. Build in pricing flexibility for Q2 2026 when GA pricing is confirmed.
At Macaron, you can run your actual tasks through different models and turn the outputs into structured, trackable workflows — so your cost comparison is based on what each model actually delivers, not just benchmark tables. Try it free at macaron.im and run your own cost-performance test.