
Hey fellow AI tinkerers — if you've ever watched your token bill climb at the end of the month and thought "there has to be a better way," this guide is for you. I'm Hanks, and I've been stress-testing AI automation tools inside real workflows for years. Running OpenClaw + Ollama together is one of those setups that sounds complicated until you actually do it — and then you wonder why you didn't do it sooner.
No API key. No cloud. No surprise invoices. Here's exactly how to get it running in 2026.

This is the one that actually matters for most people reading this. When your OpenClaw agent handles files, reads documents, scrapes pages, or interacts with sensitive tools — every token sent to a cloud API is data that left your machine. With Ollama running locally, the entire stack — prompts, context, tool calls, outputs — stays on your hardware. No third-party logs. No data residency concerns.
OpenClaw's local Ollama integration means all datasets, documents, and intermediate outputs stay on-device with nothing transmitted to external services, and the system works without internet access once the model is pulled. For anyone handling anything sensitive — financial documents, client data, internal reports — that's not a nice-to-have. It's a requirement.
Ollama is free and runs locally, so all model costs in OpenClaw are set to $0. The only real cost is hardware and electricity — both of which scale predictably, unlike per-token API pricing that can spike without warning when your agent goes deep on a task.
For heavy agentic workloads — the kind where a single task might chain dozens of tool calls and context-heavy reasoning turns — local inference is almost always cheaper at volume than frontier API pricing.
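To make "cheaper at volume" concrete, here's a back-of-envelope comparison. Every number below is an assumption for illustration (token volume, per-token price, GPU wattage, electricity rate), not a quote:

```shell
# All figures are assumptions for illustration, not real quotes.
tokens_millions=500        # assumed monthly input volume for a heavy agent
cloud_price_per_million=3  # assumed USD per million input tokens
gpu_watts=350              # assumed GPU draw under load
hours=240                  # 8 h/day * 30 days
cents_per_kwh=15           # assumed electricity rate

cloud_usd=$(( tokens_millions * cloud_price_per_million ))
kwh=$(( gpu_watts * hours / 1000 ))
local_usd=$(( kwh * cents_per_kwh / 100 ))

echo "cloud: ~${cloud_usd} USD/month"
echo "local: ~${kwh} kWh -> ~${local_usd} USD/month in electricity"
```

Your real numbers will differ, but the shape of the curve won't: local cost stays flat as token volume grows, while cloud cost is linear in tokens.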
Let me be direct here: local models are not frontier models. There's a real capability gap.
The honest take: for agent tasks involving file operations, code editing, summarization, and multi-step workflows, a well-chosen local model handles 80%+ of the job well. For complex reasoning, nuanced judgment, or tasks that require the absolute best output quality, cloud models still win. The smart move is running Ollama as your primary and keeping a cloud fallback configured for edge cases.

OpenClaw needs a large context window to complete multi-step tasks; a context length of at least 64K tokens is recommended. That constraint matters for hardware choices: not every model can sustain 64K context at a usable speed without sufficient VRAM.
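Note that Ollama's default context window is well below 64K, so you usually have to raise it yourself. One way is a derived model built from a Modelfile (the `FROM`/`PARAMETER num_ctx` syntax is standard Ollama Modelfile syntax; the derived model name `gpt-oss-64k` below is our own):

```
FROM gpt-oss:20b
PARAMETER num_ctx 65536
```

Save that as `Modelfile`, run `ollama create gpt-oss-64k -f Modelfile`, and point OpenClaw at `ollama/gpt-oss-64k`. Raising `num_ctx` increases memory use, so check that the model still fits on your GPU with `ollama ps`.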
CPU-only inference works — Ollama supports it — but plan for responses that are 5–10× slower than GPU inference. For real agent workflows with tool calls, slow inference compounds fast. If you're on CPU only, stick to sub-7B models or accept the wait.
If you haven't installed Ollama yet, the official install is straightforward:
macOS / Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download the installer from ollama.ai — no command-line install needed.
Verify the install:
ollama --version
Start the server if it's not already running:
ollama serve
By default Ollama runs at http://127.0.0.1:11434. That's what OpenClaw points at.
If you haven't installed OpenClaw yet:
# macOS / Linux:
curl -fsSL https://openclaw.ai/install.sh | bash
# Windows (PowerShell):
iwr -useb https://openclaw.ai/install.ps1 | iex
# Or via npm / pnpm:
npm install -g openclaw@latest
Confirm it's working:
openclaw -v
openclaw doctor
openclaw doctor will surface any config issues before you dig into the Ollama setup. Run it now so you're not troubleshooting two things at once later.

Ollama's official blog recommends a short list of models for use with OpenClaw, all of which need at least a 64K-token context length.
February 2026 community consensus (from the active OpenClaw discussions thread): qwen3-coder and glm-4.7-flash are the current sweet spots for agentic tool-calling tasks on mid-range hardware. gpt-oss:20b is the most-referenced model for "just works" setups.
This is where most people miscalibrate. Tiny models (1.5B–4B) are fine for conversation but fall apart on agentic tasks — they lose track of tool call results mid-chain, fail to follow structured output formats, and hallucinate tool names. For real OpenClaw agent use:
# Recommended starting point — balanced performance:
ollama pull gpt-oss:20b
# Speed-focused — great if VRAM is limited:
ollama pull glm-4.7-flash
# Coding-focused:
ollama pull qwen3-coder
# Verify what you have installed:
ollama list
First pull takes time — gpt-oss:20b is roughly 12–14 GB depending on quantization. Plan for that.
This is where the actual work happens. There are two paths: implicit auto-discovery (simpler) and explicit manual config (more control). I'll cover both.
When you set OLLAMA_API_KEY and do not define an explicit models.providers.ollama entry, OpenClaw auto-discovers models from the local Ollama instance at http://127.0.0.1:11434 — querying /api/tags and /api/show, keeping only models that report tool capability, and setting all costs to $0.
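You can sanity-check that filter by hand. The sketch below simulates it with canned JSON (the response shape follows Ollama's API, where /api/show reports a "capabilities" list; in a live check you'd curl http://127.0.0.1:11434/api/show instead of using these stand-ins):

```shell
# Canned stand-ins for what /api/show would return per model.
# Tool-capable models include "tools" in their capabilities list.
caps_gpt='{"capabilities":["completion","tools"]}'
caps_tiny='{"capabilities":["completion"]}'

keep_or_skip() {
  # $1 = model name, $2 = its /api/show capabilities JSON
  if echo "$2" | grep -q '"tools"'; then
    echo "keep: $1"
  else
    echo "skip: $1"
  fi
}

keep_or_skip "gpt-oss:20b" "$caps_gpt"        # keep: gpt-oss:20b
keep_or_skip "tinyllama:latest" "$caps_tiny"  # skip: tinyllama:latest
```

If a model you expect is missing from discovery, this is the first thing to check: the model simply may not declare tool support.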
# Set this in your shell (or add to ~/.bashrc / ~/.zshrc):
export OLLAMA_API_KEY="ollama-local"
# Or set it via the CLI:
openclaw config set models.providers.ollama.apiKey "ollama-local"
Any value works — Ollama doesn't validate the key. "ollama-local" is the convention you'll see everywhere, so use that.
After setting the env var, confirm OpenClaw sees your models:
ollama list # what Ollama has
openclaw models list # what OpenClaw discovered
Then set your primary model in ~/.openclaw/openclaw.json:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gpt-oss:20b",
        "fallbacks": ["ollama/glm-4.7-flash", "ollama/llama3.3"]
      }
    }
  }
}
Restart the gateway after any config change:
# systemd (Linux):
systemctl --user restart openclaw
# launchd (macOS):
launchctl kickstart -k gui/$(id -u)/openclaw
# Or use the CLI:
openclaw gateway restart
Use explicit config when Ollama runs on a different machine, or when you want to force specific context windows that Ollama doesn't report correctly:
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "apiKey": "ollama-local",
        "api": "ollama",
        "models": [
          {
            "id": "gpt-oss:20b",
            "name": "GPT-OSS 20B",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 131072,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gpt-oss:20b",
        "fallbacks": ["ollama/glm-4.7-flash"]
      }
    }
  }
}
Note on API mode: OpenClaw uses the native Ollama API (/api/chat) by default, which fully supports streaming and tool calling simultaneously. If you need the OpenAI-compatible endpoint (e.g., behind a proxy), set api: "openai-completions" explicitly — but note that endpoint may not support streaming and tool calling at the same time, and you may need to add params: { streaming: false } to your model config.
Stick with "api": "ollama" unless you have a specific reason to use the OpenAI-compatible path.
The model ID in OpenClaw config must exactly match the Ollama model name — including the tag:
# If ollama list shows:
gpt-oss:20b
# Your config must use:
"primary": "ollama/gpt-oss:20b" ✅
# Not:
"primary": "ollama/gpt-oss" ❌ (missing tag)
"primary": "gpt-oss:20b" ❌ (missing provider prefix)
After saving config and restarting the gateway, send a test through whatever channel you have connected:
/agent What model are you running on?
Or via the ollama launch openclaw shortcut if you want a quick validation without setting up a messaging channel:
ollama launch openclaw --config
This command configures OpenClaw without immediately starting the service, and the gateway auto-reloads if it's already running.
For auto-discovered models, OpenClaw uses the context window reported by Ollama when available, otherwise defaulting to 8192. You can override contextWindow and maxTokens in explicit provider config.
The 8192 default is too low for real agent tasks. Override it explicitly for any model you're using seriously:
{
  "contextWindow": 131072,
  "maxTokens": 8192
}
maxTokens controls the output budget per turn. contextWindow controls how much history + context the model can see. For agent tasks, a larger contextWindow matters more than a larger maxTokens.
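A quick way to see the trade-off: with the values above, the input budget is whatever the context window leaves after reserving the per-turn output.

```shell
context_window=131072   # total tokens the model can see
max_tokens=8192         # reserved for the model's reply each turn
input_budget=$(( context_window - max_tokens ))
echo "$input_budget"    # prints 122880: room for history, files, tool results
```

That 122,880-token budget is what long tool-call chains actually consume, which is why shrinking contextWindow hurts agents much faster than shrinking maxTokens.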
Also configure compaction to manage context intelligently across long sessions:
{
  "agents": {
    "defaults": {
      "compaction": {
        "mode": "safeguard"
      }
    }
  }
}
safeguard mode compacts the context before it overflows rather than failing mid-task. For local models with smaller effective context, this is worth enabling.
Local models are often more sensitive to system prompt quality than frontier models — a vague system prompt produces noticeably worse agentic behavior. Keep system prompts explicit about what the agent should and shouldn't do.
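As a rough illustration of "explicit" (this wording is ours, not an OpenClaw default), a system prompt for a local agent might spell out permissions and failure behavior directly:

```
You are a file-management agent. You may: read files, write files,
and call the provided tools. You may not: delete files, access the
network, or invent tool names. If a tool call fails, report the
error verbatim instead of retrying with guessed arguments.
```

Smaller models follow short allow/deny lists like this far more reliably than they infer intent from a vague "be helpful" prompt.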
For tool-calling tasks, lower temperature helps:
{
  "models": {
    "providers": {
      "ollama": {
        "params": {
          "temperature": 0.2
        }
      }
    }
  }
}
Higher temperature (0.7–1.0) is fine for creative or conversational tasks but introduces errors in structured tool-call output — the model starts inventing argument names or misformatting JSON. For agentic work, stay between 0.1 and 0.4.
If you hit streaming errors on the OpenAI-compatible path, add "params": { "streaming": false } to the model entry. Run openclaw doctor and ollama ps together when something breaks — most connectivity issues are visible in those two outputs.
The local setup eliminates token costs entirely — but the calculation gets interesting when you compare it against specific cloud API pricing across different task types and workload volumes. If you want to see the full breakdown of what OpenClaw + Ollama saves vs. running frontier models through the API at scale, our Cost Guide puts real numbers behind it.
And if you want the agent capability without managing any of this infrastructure — no daemon, no gateway config, no Ollama server to keep running — Macaron is a personal AI agent that works across your devices without a self-hosted stack. Try it free and run it against your own tasks.
Q: Do I need an API key to use OpenClaw with Ollama? No. Ollama doesn't require a real API key. Set OLLAMA_API_KEY="ollama-local" — any string works. This just signals to OpenClaw that Ollama is enabled and should be auto-discovered.
Q: Can I use both Ollama and a cloud model in the same OpenClaw setup? Yes, and this is the recommended pattern. Set Ollama as your primary model and add cloud models (Anthropic, OpenAI, etc.) as fallbacks. OpenClaw will route to fallbacks automatically if the local model fails or hits context limits. See model failover docs for config details.
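A minimal sketch of that pattern (the cloud model ID below is a placeholder; use whatever IDs openclaw models list shows for the providers you've configured):

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gpt-oss:20b",
        "fallbacks": ["anthropic/claude-example-model"]
      }
    }
  }
}
```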
Q: Which Ollama model is best for OpenClaw in early 2026? As of February 2026, gpt-oss:20b is the most widely tested and referenced model in the OpenClaw community for general agent tasks. qwen3-coder is the top pick for code-heavy workflows. glm-4.7-flash wins on speed-to-quality ratio for constrained hardware.
Q: Does Ollama work with OpenClaw on Windows? Yes. Ollama has a Windows installer. OpenClaw on Windows requires WSL2 — run both inside WSL2 for the most reliable experience. The gateway daemon installs as a systemd service inside WSL2.
Q: What's the ollama launch openclaw command I keep seeing? This is a shortcut published in Ollama's official February 2026 blog post — it launches OpenClaw directly with Ollama, connecting local or cloud models in one step. Use ollama launch openclaw --config to configure without immediately starting the service.
Q: My model doesn't appear in openclaw models list even though it's in ollama list. Why? Auto-discovery only keeps models that report tools capability via Ollama's /api/show endpoint. If a model doesn't declare tool support, OpenClaw skips it. Fix: use explicit config to manually declare the model, or switch to a model that supports tool calling (most of the recommended models above do).
Q: Should I worry about prompt injection with local models? Yes, actually — local models are sometimes more vulnerable to prompt injection in web content and file inputs than frontier models, because they lack the RLHF hardening that cloud models receive. The community discussion thread has a working advanced prompt injection defense gist worth reviewing.