OpenClaw + Ollama: Run a Fully Local AI Agent (No API Key, No Cloud) in 2026

Hey fellow AI tinkerers — if you've ever watched your token bill climb at the end of the month and thought "there has to be a better way," I built the setup for you. I'm Hanks, and I've been stress-testing AI automation tools inside real workflows for years. Running OpenClaw + Ollama together is one of those setups that sounds complicated until you actually do it — and then you wonder why you didn't do it sooner.

No API key. No cloud. No surprise invoices. Here's exactly how to get it running in 2026.


Why Run OpenClaw with Ollama?

Privacy: Your Data Never Leaves Your Machine

This is the one that actually matters for most people reading this. When your OpenClaw agent handles files, reads documents, scrapes pages, or interacts with sensitive tools — every token sent to a cloud API is data that left your machine. With Ollama running locally, the entire stack — prompts, context, tool calls, outputs — stays on your hardware. No third-party logs. No data residency concerns.

OpenClaw's local Ollama integration means all datasets, documents, and intermediate outputs stay on-device with nothing transmitted to external services, and the system works without internet access once the model is pulled. For anyone handling anything sensitive — financial documents, client data, internal reports — that's not a nice-to-have. It's a requirement.

Cost: Zero Token Fees After Setup

Ollama is free and runs locally, so all model costs in OpenClaw are set to $0. The only real cost is hardware and electricity — both of which scale predictably, unlike per-token API pricing that can spike without warning when your agent goes deep on a task.

For heavy agentic workloads — the kind where a single task might chain dozens of tool calls and context-heavy reasoning turns — local inference is almost always cheaper at volume than frontier API pricing.

Trade-offs vs. Cloud Models (Latency, Capability)

Let me be direct here: local models are not frontier models. There's a real capability gap.

| Factor | Ollama (Local) | Cloud API |
|---|---|---|
| Cost per token | $0 | Varies by model/provider |
| Privacy | Complete — data stays on-device | Data leaves your machine |
| Response speed | Depends on hardware | Generally fast, consistent |
| Model capability | Good, not frontier-level | State-of-the-art available |
| Tool calling | Strong with recommended models | Excellent across top models |
| Offline operation | Yes | No |
| Context window | 8K–131K depending on model | Up to 1M+ (Claude, Gemini) |

The honest take: for agent tasks involving file operations, code editing, summarization, and multi-step workflows, a well-chosen local model handles 80%+ of the job well. For complex reasoning, nuanced judgment, or tasks that require the absolute best output quality, cloud models still win. The smart move is running Ollama as your primary and keeping a cloud fallback configured for edge cases.


Prerequisites

Hardware Minimums (RAM, GPU VRAM Guide)

OpenClaw needs a large context window to complete its tasks; a context length of at least 64K tokens is recommended. That constraint matters for hardware choices, because not every model can sustain 64K context at a usable speed without sufficient VRAM.

| Model Size | Min RAM (CPU-only) | Recommended GPU VRAM | Expected Speed |
|---|---|---|---|
| 3B–4B params | 8 GB | None required | Slow but workable |
| 7B–8B params | 16 GB | 8 GB VRAM | Reasonable for chat/simple tasks |
| 14B params | 32 GB | 16 GB VRAM | Good for agent tasks |
| 20B params | 48 GB | 24 GB VRAM | Solid agentic performance |
| 32B params | 64 GB | 32–48 GB VRAM | Strong capability |
| 70B+ params | 80 GB+ | 48–80 GB VRAM | Best local quality |

CPU-only inference works — Ollama supports it — but plan for responses that are 5–10× slower than GPU inference. For real agent workflows with tool calls, slow inference compounds fast. If you're on CPU only, stick to sub-7B models or accept the wait.
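To see how that compounding plays out, here's a back-of-the-envelope estimate. The throughput figures are assumptions for illustration; real speeds vary widely by model, quantization, and hardware:

```shell
# Rough wall-clock estimate for one agent task that chains many tool calls.
# All four numbers below are assumed figures, not benchmarks.
TOOL_CALLS=20        # tool-call turns in a single agent task
TOKENS_PER_TURN=300  # output tokens generated per turn
GPU_TPS=40           # assumed tokens/sec on a mid-range GPU
CPU_TPS=5            # assumed tokens/sec on CPU-only inference

gpu_secs=$(( TOOL_CALLS * TOKENS_PER_TURN / GPU_TPS ))
cpu_secs=$(( TOOL_CALLS * TOKENS_PER_TURN / CPU_TPS ))
echo "GPU: ~${gpu_secs}s for the whole chain"   # ~150s
echo "CPU: ~${cpu_secs}s for the whole chain"   # ~1200s, i.e. 20 minutes
```

Same task, same model: the difference between a 2.5-minute run and a 20-minute run. That's the compounding effect in practice.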

Ollama Installation (macOS / Linux / Windows)

If you haven't installed Ollama yet, the official install is straightforward:

macOS / Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download the installer from ollama.ai — no command-line install needed.

Verify the install:

ollama --version

Start the server if it's not already running:

ollama serve

By default Ollama runs at http://127.0.0.1:11434. That's what OpenClaw points at.
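You can confirm the server is actually answering on that port before touching any OpenClaw config. This sketch hits Ollama's /api/tags endpoint (the same one OpenClaw's auto-discovery queries) and reports up or down:

```shell
# Sanity check: is the Ollama server answering on the default endpoint?
OLLAMA_URL="http://127.0.0.1:11434"
if curl -s --max-time 3 "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  status="up"
else
  status="down"
fi
echo "ollama server: $status"
```

If it reports down, run ollama serve and re-check before going any further.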

OpenClaw Installation Check

If you haven't installed OpenClaw yet:

# macOS / Linux:
curl -fsSL https://openclaw.ai/install.sh | bash
# Windows (PowerShell):
iwr -useb https://openclaw.ai/install.ps1 | iex
# Or via npm / pnpm:
npm install -g openclaw@latest

Confirm it's working:

openclaw -v
openclaw doctor

openclaw doctor will surface any config issues before you dig into the Ollama setup. Run it now so you're not troubleshooting two things at once later.


Choosing the Right Model for OpenClaw Tasks

Recommended Models (February 2026)

The Ollama blog recommends the following models for use with OpenClaw, all of which meet the 64K-token minimum context length:

| Model | Pull Command | Best For | Context Window |
|---|---|---|---|
| qwen3-coder | ollama pull qwen3-coder | Code tasks, file editing | 131K |
| gpt-oss:20b | ollama pull gpt-oss:20b | Balanced general agent work | 131K |
| gpt-oss:120b | ollama pull gpt-oss:120b | High-capability tasks (needs big VRAM) | 131K |
| glm-4.7 | ollama pull glm-4.7 | Strong general-purpose | 128K |
| glm-4.7-flash | ollama pull glm-4.7-flash | Speed + quality balance | 128K |
| llama3.3 | ollama pull llama3.3 | Solid fallback option | 128K |
| deepseek-r1:32b | ollama pull deepseek-r1:32b | Reasoning-heavy tasks | 64K |

February 2026 community consensus (from the active OpenClaw discussions thread): qwen3-coder and glm-4.7-flash are the current sweet spots for agentic tool-calling tasks on mid-range hardware. gpt-oss:20b is the most-referenced model for "just works" setups.

Model Size vs. Task Complexity

This is where most people miscalibrate. Tiny models (1.5B–4B) are fine for conversation but fall apart on agentic tasks — they lose track of tool call results mid-chain, fail to follow structured output formats, and hallucinate tool names. For real OpenClaw agent use:

  • Simple tasks (summarize this file, reply to this message): 7B models work fine
  • Agent tasks (multi-step workflows, code editing, tool chaining): 14B minimum, 20B+ recommended
  • Complex reasoning (debugging, architecture decisions, nuanced analysis): 32B+ for best local results

Pulling Your Model with ollama pull

# Recommended starting point — balanced performance:
ollama pull gpt-oss:20b
# Speed-focused — great if VRAM is limited:
ollama pull glm-4.7-flash
# Coding-focused:
ollama pull qwen3-coder
# Verify what you have installed:
ollama list

First pull takes time — gpt-oss:20b is roughly 12–14 GB depending on quantization. Plan for that.


Configuring OpenClaw to Use Ollama

This is where the actual work happens. There are two paths: implicit auto-discovery (simpler) and explicit manual config (more control). I'll cover both.

Method 1: Auto-Discovery (Simplest — Start Here)

When you set OLLAMA_API_KEY and do not define an explicit models.providers.ollama entry, OpenClaw auto-discovers models from the local Ollama instance at http://127.0.0.1:11434 — querying /api/tags and /api/show, keeping only models that report tool capability, and setting all costs to $0.

# Set this in your shell (or add to ~/.bashrc / ~/.zshrc):
export OLLAMA_API_KEY="ollama-local"

# Or set it via the CLI:
openclaw config set models.providers.ollama.apiKey "ollama-local"

Any value works — Ollama doesn't validate the key. "ollama-local" is the convention you'll see everywhere, so use that.

After setting the env var, confirm OpenClaw sees your models:

ollama list          # what Ollama has
openclaw models list # what OpenClaw discovered

Then set your primary model in ~/.openclaw/openclaw.json:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gpt-oss:20b",
        "fallbacks": ["ollama/glm-4.7-flash", "ollama/llama3.3"]
      }
    }
  }
}

Restart the gateway after any config change:

# systemd (Linux):
systemctl --user restart openclaw
# launchd (macOS):
launchctl kickstart -k gui/$(id -u)/openclaw
# Or use the CLI:
openclaw gateway restart

Method 2: Explicit Config (Remote Ollama / Specific Models)

Use explicit config when Ollama runs on a different machine, or when you want to force specific context windows that Ollama doesn't report correctly:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "apiKey": "ollama-local",
        "api": "ollama",
        "models": [
          {
            "id": "gpt-oss:20b",
            "name": "GPT-OSS 20B",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 131072,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gpt-oss:20b",
        "fallbacks": ["ollama/glm-4.7-flash"]
      }
    }
  }
}

Note on API mode: OpenClaw uses the native Ollama API (/api/chat) by default, which fully supports streaming and tool calling simultaneously. If you need the OpenAI-compatible endpoint (e.g., behind a proxy), set api: "openai-completions" explicitly — but note that endpoint may not support streaming and tool calling at the same time, and you may need to add params: { streaming: false } to your model config.

Stick with "api": "ollama" unless you have a specific reason to use the OpenAI-compatible path.
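If you do need the OpenAI-compatible path, here's a minimal sketch of what that fragment might look like. It reuses the field names from the explicit config above; the api value and the streaming param come from the note above, but treat the exact shape as an assumption to verify against your OpenClaw version:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-oss:20b",
            "contextWindow": 131072,
            "maxTokens": 8192,
            "params": { "streaming": false }
          }
        ]
      }
    }
  }
}
```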

Setting the Model Name Correctly

The model ID in OpenClaw config must exactly match the Ollama model name — including the tag:

# If ollama list shows:
gpt-oss:20b
# Your config must use:
"primary": "ollama/gpt-oss:20b"   ✅
# Not:
"primary": "ollama/gpt-oss"       ❌ (missing tag)
"primary": "gpt-oss:20b"          ❌ (missing provider prefix)
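One way to avoid typos entirely is to generate the provider-prefixed IDs straight from ollama list. A small sketch, run here against a hypothetical sample of the command's output (this assumes the usual NAME ID SIZE MODIFIED column layout, with the model name in the first column):

```shell
# Hypothetical sample of `ollama list` output, piped through the same
# awk one-liner you would run against the real command:
sample='NAME            ID              SIZE     MODIFIED
gpt-oss:20b     f2b8351c629c    13 GB    2 days ago'
printf '%s\n' "$sample" | awk 'NR>1 {print "ollama/" $1}'

# Against the real command:
#   ollama list | awk 'NR>1 {print "ollama/" $1}'
```

Each printed line (e.g. ollama/gpt-oss:20b) is exactly the string your config's primary or fallbacks field expects.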

Verifying the Connection with a Test Prompt

After saving config and restarting the gateway, send a test through whatever channel you have connected:

/agent What model are you running on?

Or via the ollama launch openclaw shortcut if you want a quick validation without setting up a messaging channel:

ollama launch openclaw --config

This command configures OpenClaw without immediately starting the service, and the gateway auto-reloads if it's already running.


Performance Tips for Local Models

Context Window Settings

For auto-discovered models, OpenClaw uses the context window reported by Ollama when available, otherwise defaulting to 8192. You can override contextWindow and maxTokens in explicit provider config.

The 8192 default is too low for real agent tasks. Override it explicitly for any model you're using seriously:

{
  "contextWindow": 131072,
  "maxTokens": 8192
}

maxTokens controls the output budget per turn. contextWindow controls how much history + context the model can see. For agent tasks, a larger contextWindow matters more than a larger maxTokens.
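The budget arithmetic makes the point concrete. The system-prompt size below is an assumed figure for illustration:

```shell
# Token-budget arithmetic: what's left for history + tool results once the
# output reservation and system prompt are subtracted from the window.
CONTEXT_WINDOW=131072
MAX_TOKENS=8192      # reserved for the model's output each turn
SYSTEM_PROMPT=3000   # assumed size of the agent's system prompt

available=$(( CONTEXT_WINDOW - MAX_TOKENS - SYSTEM_PROMPT ))
default_avail=$(( 8192 - MAX_TOKENS - SYSTEM_PROMPT ))
echo "131K window leaves ${available} tokens for history and tool results"
echo "8K default leaves ${default_avail} tokens"   # negative: no room at all
```

With the 8192 default, the output reservation alone consumes the whole window, which is exactly why agents lose context mid-task until you override it.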

Also configure compaction to manage context intelligently across long sessions:

{
  "agents": {
    "defaults": {
      "compaction": {
        "mode": "safeguard"
      }
    }
  }
}

safeguard mode compacts the context before it overflows rather than failing mid-task. For local models with smaller effective context, this is worth enabling.

Adjusting Temperature and System Prompts

Local models are often more sensitive to system prompt quality than frontier models — a vague system prompt produces noticeably worse agentic behavior. Keep system prompts explicit about what the agent should and shouldn't do.

For tool-calling tasks, lower temperature helps:

{
  "models": {
    "providers": {
      "ollama": {
        "params": {
          "temperature": 0.2
        }
      }
    }
  }
}

Higher temperature (0.7–1.0) is fine for creative or conversational tasks but introduces errors in structured tool-call output — the model starts inventing argument names or misformatting JSON. For agentic work, stay between 0.1 and 0.4.


Troubleshooting: Common Ollama + OpenClaw Errors

| Symptom | Cause | Fix |
|---|---|---|
| openclaw models list shows no Ollama models | OLLAMA_API_KEY not set, or explicit config defined | Set export OLLAMA_API_KEY="ollama-local" and confirm no models.providers.ollama block in config |
| Connection refused on http://127.0.0.1:11434 | Ollama server not running | Run ollama serve |
| Model listed but not used | Wrong model name in config (missing tag or provider prefix) | Run ollama list, copy exact name including tag into config |
| Tool calls returning raw JSON in chat | Model lacks tool-call capability | Switch to a model that reports tools capability via ollama pull |
| Agent loses context mid-task | Default 8K context window too small | Override contextWindow in explicit model config |
| Streaming works but tool calls fail | Using OpenAI-compat API mode | Switch to "api": "ollama" (native), or add "params": { "streaming": false } |
| Gateway crashes frequently | Known stability issue with rapid tool-call chains | Add a cron job: */30 * * * * openclaw gateway restart |
| Slow inference on multi-step tasks | CPU-only inference with large model | Reduce model size, or offload to GPU-equipped machine via baseUrl |

Run openclaw doctor and ollama ps together when something breaks — most connectivity issues are visible in those two outputs.
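One more check worth scripting: whether a model actually reports the tools capability that auto-discovery filters on. This sketch runs the grep against a hypothetical trimmed-down /api/show response so the logic is visible; the commented curl line shows what the real query would look like:

```shell
# The real query would be something like:
#   curl -s http://127.0.0.1:11434/api/show -d '{"model":"gpt-oss:20b"}'
# Hypothetical trimmed response, for illustration:
resp='{"capabilities":["completion","tools"]}'

if printf '%s' "$resp" | grep -q '"tools"'; then
  echo "model reports tool support: OpenClaw will keep it"
else
  echo "model lacks tool support: OpenClaw will skip it"
fi
```

If your model fails this check, that explains a missing entry in openclaw models list faster than any gateway log will.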


Running OpenClaw Locally? See Exactly What You're Saving

The local setup eliminates token costs entirely — but the calculation gets interesting when you compare it against specific cloud API pricing across different task types and workload volumes. If you want to see the full breakdown of what OpenClaw + Ollama saves vs. running frontier models through the API at scale, our Cost Guide puts real numbers behind it.

And if you want the agent capability without managing any of this infrastructure — no daemon, no gateway config, no Ollama server to keep running — Macaron is a personal AI agent that works across your devices without a self-hosted stack. Try it free and run it against your own tasks.


Frequently Asked Questions

Q: Do I need an API key to use OpenClaw with Ollama? No. Ollama doesn't require a real API key. Set OLLAMA_API_KEY="ollama-local" — any string works. This just signals to OpenClaw that Ollama is enabled and should be auto-discovered.

Q: Can I use both Ollama and a cloud model in the same OpenClaw setup? Yes, and this is the recommended pattern. Set Ollama as your primary model and add cloud models (Anthropic, OpenAI, etc.) as fallbacks. OpenClaw will route to fallbacks automatically if the local model fails or hits context limits. See model failover docs for config details.

Q: Which Ollama model is best for OpenClaw in early 2026? As of February 2026, gpt-oss:20b is the most widely tested and referenced model in the OpenClaw community for general agent tasks. qwen3-coder is the top pick for code-heavy workflows. glm-4.7-flash wins on speed-to-quality ratio for constrained hardware.

Q: Does Ollama work with OpenClaw on Windows? Yes. Ollama has a Windows installer. OpenClaw on Windows requires WSL2 — run both inside WSL2 for the most reliable experience. The gateway daemon installs as a systemd service inside WSL2.

Q: What's the ollama launch openclaw command I keep seeing? This is a shortcut published in Ollama's official February 2026 blog post — it launches OpenClaw directly with Ollama, connecting local or cloud models in one step. Use ollama launch openclaw --config to configure without immediately starting the service.

Q: My model doesn't appear in openclaw models list even though it's in ollama list. Why? Auto-discovery only keeps models that report tools capability via Ollama's /api/show endpoint. If a model doesn't declare tool support, OpenClaw skips it. Fix: use explicit config to manually declare the model, or switch to a model that supports tool calling (most of the recommended models above do).

Q: Should I worry about prompt injection with local models? Yes, actually — local models are sometimes more vulnerable to prompt injection in web content and file inputs than frontier models, because they lack the RLHF hardening that cloud models receive. The community discussion thread has a working advanced prompt injection defense gist worth reviewing.

Hey, I’m Hanks — a workflow tinkerer and AI tool obsessive with over a decade of hands-on experience in automation, SaaS, and content creation. I spend my days testing tools so you don’t have to, breaking down complex processes into simple, actionable steps, and digging into the numbers behind “what actually works.”

Apply to become Macaron's first friends