Gemini 3.1 Pro in Google AI Studio: A Beginner's Guide to Getting Started

Quick reality check before we start: if you've been staring at Gemini 3.1 Pro headlines since February 19, 2026 and thinking "okay but how do I actually use this thing without an API key or a PhD in prompt engineering" — this one's for you. I'm Hanks. I've been running new models through real workflows for three-plus years, and my first move with any new release is always the same: open AI Studio, break some prompts, and figure out what actually changed before touching the API. Here's exactly what I did with Gemini 3.1 Pro.


What Is Google AI Studio?

Google AI Studio is a browser-based workspace where you can talk to Gemini models, upload files, tweak model settings, and — when you're ready — export the whole thing as working Python, Node.js, or REST code. No local setup. No billing required to start.

Think of it as the scratchpad before the codebase. You figure out what prompt structure works, what context the model needs, and which thinking level makes sense for your task. Then you click "Get Code" and the heavy lifting is already done.

Why It's the Fastest Way to Try Gemini 3.1 Pro

The honest reason: because you can go from "I heard this model is interesting" to "I'm actually talking to it" in about ninety seconds. There's no npm install, no virtual environment, no key rotation. Sign in with a Google account and you're in.

For Gemini 3.1 Pro specifically, AI Studio is currently the fastest access point — it went into preview on February 19, 2026, and AI Studio was one of the first surfaces to support it. The Gemini CLI, Vertex AI, Android Studio, and Antigravity all have access too, but Studio requires the least setup. If something is going to fail (and it will, at least once), I'd rather it fail here than in a production pipeline.


Getting Started — Step by Step

Signing In and Finding Gemini 3.1 Pro

Go to aistudio.google.com. Sign in with a Google account. That's it for the auth side.

Once you're in, look for the model selector at the top of the prompt window. It will show whatever model was last selected — often a Gemini 2.5 Flash or similar. Click it. A dropdown appears. Scroll down and select Gemini 3.1 Pro Preview (gemini-3.1-pro-preview).

One edge case worth flagging: if you see gemini-3.1-pro-preview-customtools, that's a separate endpoint specifically tuned for pipelines where you're mixing bash commands with your own custom functions. The model on that endpoint is trained to prioritize your tools rather than falling back to bash. For learning purposes, start with the main gemini-3.1-pro-preview endpoint.
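If you want that decision as a rule of thumb in code form, here's a minimal sketch. The pick_endpoint helper and its heuristic are my own illustration of the description above, not an official API:

```python
# Hypothetical helper: the decision rule below is a heuristic based on
# how the two preview endpoints are described, not an official Google API.
def pick_endpoint(has_custom_tools: bool, mixes_bash: bool) -> str:
    """Choose between the two Gemini 3.1 Pro preview endpoints."""
    if has_custom_tools and mixes_bash:
        # This endpoint is tuned to prefer your registered tools over bash.
        return "gemini-3.1-pro-preview-customtools"
    return "gemini-3.1-pro-preview"

print(pick_endpoint(has_custom_tools=True, mixes_bash=True))
print(pick_endpoint(has_custom_tools=False, mixes_bash=False))
```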

Running Your First Prompt

Nothing special here. Type a prompt and hit Run. For a first run, I'd suggest something with actual reasoning pressure, not just a text-generation task. Something like:

I have a Postgres database with ~50M rows in a user_events table. The table has user_id, event_type, and created_at. I want to find users who performed event_type='signup' but never did event_type='purchase' within 30 days of signup. Write the most efficient query and explain the index strategy.

On High thinking (the default), Gemini 3.1 Pro will visibly reason through the problem before responding. You'll see a "Thinking..." indicator while it works. This is the Deep Think Mini behavior — a lighter version of the reasoning system that pushed the model to 77.1% on ARC-AGI-2, more than double what Gemini 3 Pro scored on the same benchmark.

The response will come back with both the SQL and a clear explanation of why a composite index on (user_id, event_type, created_at) is the right call. This took about 12 seconds on High thinking. Compare that to what you'd get from a simpler model — same query, less coherent index rationale, probably a slower query too.
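If you want to sanity-check that anti-join pattern without standing up a 50M-row Postgres instance, here's a toy version in SQLite (Python's stdlib sqlite3). The schema and sample rows are made up for illustration; the query shape is the standard LEFT JOIN anti-join, which is one plausible form of the answer, not necessarily the exact SQL the model returns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_events (user_id INTEGER, event_type TEXT, created_at TEXT);
INSERT INTO user_events VALUES
  (1, 'signup',   '2026-01-01'),
  (1, 'purchase', '2026-01-10'),  -- purchased within 30 days of signup
  (2, 'signup',   '2026-01-05'),  -- never purchased
  (3, 'signup',   '2026-01-01'),
  (3, 'purchase', '2026-03-01');  -- purchased, but after the 30-day window
CREATE INDEX idx_events ON user_events (user_id, event_type, created_at);
""")

# Anti-join: signups with no matching purchase in the following 30 days.
rows = conn.execute("""
SELECT s.user_id
FROM user_events AS s
LEFT JOIN user_events AS p
  ON p.user_id = s.user_id
 AND p.event_type = 'purchase'
 AND p.created_at BETWEEN s.created_at AND date(s.created_at, '+30 days')
WHERE s.event_type = 'signup'
  AND p.user_id IS NULL
ORDER BY s.user_id
""").fetchall()
print([r[0] for r in rows])  # → [2, 3]
```

Users 2 and 3 come back: user 2 never purchased, and user 3 purchased outside the 30-day window.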

Uploading Files, Images, or Video

Gemini 3.1 Pro handles text, PDFs, images, audio, and video. In AI Studio, there's a paperclip icon next to the prompt bar. Click it. Upload. That's the whole interaction.

What's actually useful here: the model's 1 million token context window means you can upload a lengthy technical document, a full codebase export, or a complex PDF and ask questions across the entire thing without hitting a ceiling. For document-heavy workflows — legal review, research synthesis, codebase Q&A — this changes what's possible in a single session.
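Before uploading something huge, a rough fit check helps. The common rule of thumb of roughly four characters per token is an approximation, not the tokenizer's real count, so treat this as a sketch:

```python
# Rough pre-upload fit check. The ~4 chars/token heuristic is an
# approximation; the real tokenizer count will differ somewhat.
def fits_in_context(text: str, context_tokens: int = 1_000_000) -> bool:
    estimated_tokens = len(text) / 4
    return estimated_tokens <= context_tokens

print(fits_in_context("x" * 3_000_000))  # True  (~750K estimated tokens)
print(fits_in_context("x" * 5_000_000))  # False (~1.25M estimated tokens)
```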

One practical tip: the model's handling of dense tables and charts can be inconsistent. If you're uploading something with complex nested tables, present key data in a clear, flat format when possible. The official Gemini 3 developer guide specifically flags this behavior.
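One way to do that flattening mechanically, assuming your data starts life as nested JSON rather than a scanned PDF table: convert it to flat key-value lines before pasting it into the prompt. The flatten helper below is my own illustration, not part of any SDK:

```python
# Illustrative helper: collapse nested data into flat "key: value" lines
# so the model sees simple rows instead of a complex nested table.
def flatten(data: dict, prefix: str = "") -> list[str]:
    lines = []
    for key, value in data.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(flatten(value, path + "."))
        else:
            lines.append(f"{path}: {value}")
    return lines

report = {"q1": {"revenue": 120, "cost": 80}, "q2": {"revenue": 150, "cost": 90}}
print("\n".join(flatten(report)))
# q1.revenue: 120
# q1.cost: 80
# q2.revenue: 150
# q2.cost: 90
```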


Thinking Levels — Low, Medium, High

This is the part that confuses most people coming from other models. When you're in AI Studio, you'll see a Thinking Level slider or dropdown in the settings panel. Gemini 3.1 Pro is the first model in the Gemini 3 series to offer three options: Low, Medium, and High. (Gemini 3 Pro only had two.)

When to Use Each Setting

Here's the practical read, not the marketing version:

Low is for high-throughput, low-complexity tasks: text summarization, simple code completion, classification, Q&A where the answer is in the context. The model generates roughly 300 thinking tokens internally before responding. Fast. Cheap. Correct most of the time for simple things.

Medium is the setting I'd default to for most engineering work. Roughly 1,000–3,000 thinking tokens per request, 3–8 second response times. Code review, bug fixes, data analysis, writing tasks with real structure requirements. The interesting thing here is that MEDIUM on 3.1 Pro is described as roughly equivalent in reasoning depth to HIGH on the older Gemini 3 Pro — at lower cost and lower latency. It's not a compromise; it's a new level that didn't exist before.

High activates what Google is calling Deep Think Mini — the full reasoning chain, 5,000–20,000+ thinking tokens, response times that can exceed 60 seconds on genuinely hard problems. This is where you send complex multi-file debugging, algorithm design problems, or anything where getting it right on the first pass is more important than speed.

The selector in AI Studio applies your chosen level to every request in that session. To test all three back-to-back, open three tabs.

How It Affects Speed and Token Cost

Thinking tokens are billed as output tokens — $12 per million for contexts under 200K tokens. That matters because the High setting can generate 8,000+ thinking tokens before the visible response even starts.

Thinking Level | Avg. Thinking Tokens | Approx. Cost per Request | Best For
-------------- | -------------------- | ------------------------ | --------
Low            | ~300                 | ~$0.004                  | Autocomplete, summarization, classification
Medium         | ~2,000               | ~$0.024                  | Code review, analysis, most dev tasks
High           | ~8,000               | ~$0.096+                 | Complex debugging, research, architecture

For production usage, the 80/20 strategy that's been circulating in the developer community makes sense: route ~60% of requests to Low, ~30% to Medium, and only ~10% to High. On high-volume workloads, this can cut monthly thinking token costs by 70–75% compared to running everything at High.
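Here's that routing arithmetic as a sketch. The complexity labels, routing rules, and per-level token averages are illustrative (taken from the table above), not an API feature; with these toy numbers the saving works out to roughly 80%, in the same ballpark as the 70–75% figure, and real savings depend on your actual thinking-token distribution:

```python
# Cost-aware routing sketch, using the ~$12/M output-token rate and the
# average thinking-token counts quoted above. Illustrative only.
COST_PER_TOKEN = 12 / 1_000_000  # $ per thinking/output token (<200K context)
AVG_THINKING_TOKENS = {"low": 300, "medium": 2_000, "high": 8_000}

def route(task_complexity: str) -> str:
    """Map a coarse complexity label to a thinking level (my own heuristic)."""
    return {"simple": "low", "moderate": "medium", "hard": "high"}[task_complexity]

def monthly_thinking_cost(request_mix: dict[str, int]) -> float:
    """Estimated thinking-token spend for a {level: request_count} mix."""
    return sum(AVG_THINKING_TOKENS[lvl] * n * COST_PER_TOKEN
               for lvl, n in request_mix.items())

# 100K requests/month routed 60/30/10 vs. everything at High:
mixed = monthly_thinking_cost({"low": 60_000, "medium": 30_000, "high": 10_000})
all_high = monthly_thinking_cost({"high": 100_000})
print(f"mixed: ${mixed:,.0f}  all-high: ${all_high:,.0f}")  # mixed: $1,896  all-high: $9,600
```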

One thing to know: the default in the API (not AI Studio) is High. If you're just experimenting in Studio, that's fine. If you move to the API, add thinking_level="medium" explicitly unless you have a specific reason to run High for everything.


Moving from AI Studio to the API

One-Click Code Export (Python / Node.js / REST)

This is one of those features that should get more attention than it does. Once you have a prompt in AI Studio that's doing what you want — right structure, right thinking level, useful output — click the Get Code button (usually top-right in the prompt panel).

A modal opens showing working code in your choice of Python, Node.js, or REST. The thinking level you tested with is already set. The model ID is already correct. You copy it, drop it in your project, add your API key, and it runs.

# Example: Gemini 3.1 Pro API call with medium thinking (Python)
# Uses the current google-genai SDK; the legacy google.generativeai
# package does not support thinking_level.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Your prompt here",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="medium"),
        max_output_tokens=8192,
    ),
)
print(response.text)

Note the max_output_tokens parameter. The model supports up to 65,536 output tokens, but the default is only 8,192. If you're doing long-form generation and getting truncated responses, explicitly setting this higher is the fix.


Limits on the Free AI Studio Tier

Here's where things require a bit of nuance. AI Studio itself is free to use — you can open it, experiment, and run prompts at no cost. But the free experience for Gemini 3.1 Pro specifically is more constrained than it sounds.

The key facts as of February 2026:

Gemini 3.1 Pro is a paid-tier-only model in the API. There is no free API tier for gemini-3.1-pro-preview. You can experiment with it in AI Studio without charge, but the moment you start making API calls, you need billing enabled.

Free tier models are Gemini 2.5 series. If you want a free API tier for development work, gemini-2.5-pro (5 RPM, 100 RPD) and gemini-2.5-flash (15 RPM, 1,000 RPD) are the current free options. Useful for prototyping — just know you'll need to swap the model ID when you go to production with 3.1 Pro. Note: Gemini 2.0 Flash is being retired March 3, 2026 — if any of your existing code references it, update now.

Tier structure for paid access:

Tier          | Requirement                     | Rate Limits (approx.)
------------- | ------------------------------- | ---------------------
Free          | No billing                      | AI Studio only; no API tier for 3.1 Pro
Tier 1 (Paid) | Enable billing                  | 150–300 RPM, 1M TPM, 1,500 RPD
Tier 2        | $250 cumulative spend + 30 days | 500–1,500 RPM, 2M TPM
Tier 3        | $1,000 spend or sales contact   | Custom limits

For individual developers and small teams just getting started: enable billing, move to Tier 1, and the limits are workable. Tier 1 also enables context caching (75% discount on cached input reads), which matters a lot if you're repeatedly querying the same large codebase or document set.


At Macaron, we see the same pattern constantly: people experiment in AI Studio, nail a prompt that works, then lose context when they try to connect that output to the next step in their workflow. If you want to test whether your stable prompts can actually drive a repeatable task — not just a one-off conversation — try running a real task end-to-end at Macaron and judge for yourself whether the output holds up.


Frequently Asked Questions

Do I need a credit card to try Gemini 3.1 Pro in AI Studio? No. AI Studio experimentation is free. You only need billing enabled when you start making Gemini API calls programmatically.

What's the model ID I should use in my code? gemini-3.1-pro-preview for the standard model. Use gemini-3.1-pro-preview-customtools if your agent mixes bash commands with your own custom function definitions and the model keeps defaulting to bash.

Can I use my old thinking_budget code? Yes — it's backward compatible. But Google recommends migrating to thinking_level for more predictable performance. Don't use both parameters in the same request; that returns a 400 error.

Why is my response getting cut off? The default max_output_tokens is 8,192. If you're generating long responses, explicitly set it higher — up to 65,536. You can do this in AI Studio via the settings panel or in your API call as shown in the code example above.

Is Gemini 3.1 Pro generally available? As of February 26, 2026, it's in preview. Google has said GA is coming "soon." Preview pricing and behavior may change before the stable release — check the official pricing page before committing to long-term cost projections.

What's the context window? 1 million tokens input, up to 65,536 tokens output. If your total input exceeds 200,000 tokens, the per-token rate doubles — the whole request reprices at the long-context rate, not just the overflow portion.
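To make that repricing rule concrete, here's the arithmetic as a toy function. The base rate below is a placeholder, not the real input price; only the doubling rule reflects the behavior described above:

```python
# Toy model of long-context repricing: past 200K input tokens, the WHOLE
# request is billed at double the base rate, not just the overflow.
# base_rate_per_m is a placeholder value, not real pricing.
def input_cost(tokens: int, base_rate_per_m: float = 2.0) -> float:
    rate = base_rate_per_m * (2 if tokens > 200_000 else 1)
    return tokens / 1_000_000 * rate

# Crossing the threshold reprices every token, so cost jumps discontinuously:
print(input_cost(199_000))  # 0.398
print(input_cost(201_000))  # 0.804
```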

Hey, I’m Hanks — a workflow tinkerer and AI tool obsessive with over a decade of hands-on experience in automation, SaaS, and content creation. I spend my days testing tools so you don’t have to, breaking down complex processes into simple, actionable steps, and digging into the numbers behind “what actually works.”
