OpenClaw Cost Guide: BYOK Models, Usage Patterns, and How to Cut Spend

Hey fellow automation builders — if you're testing OpenClaw, you've hit that moment where "free and open-source" meets the API bill. I did too.

Three weeks ago, I started running OpenClaw on real tasks: calendar syncs, email parsing, file automation. Not demos. Daily work. The question that kept me up: Can I run this without costs spiraling?

One user hit $3,600 in a month. Another burned $200 in a day from a runaway loop. Not hypothetical — that's "wake up to a billing alert" territory.

I built a test: three usage patterns, four providers, rotating workflows every 48 hours, logging everything. This isn't about features. It's about finding what survives when you go from side project to production.

Here's what drives your bill, which routing strategies actually work, and where costs blow up without warning.


What Drives Cost (Tokens, Tools, Retries)

OpenClaw itself is free. It's open-source under the MIT license — you pay zero for the software. The real cost is your LLM provider. Every message, every tool call, every automated check — that's tokens, and tokens cost money.

Here's what I tracked across 500+ interactions:

Token Drivers

Action Type
Avg Input Tokens
Avg Output Tokens
Why It Matters
Simple command
150-300
50-150
Low cost, predictable
Tool use (calendar/email)
400-800
100-400
Medium cost, scales with context
Web automation
1,200-2,500
300-1,000
High cost, DOM parsing burns tokens
Multi-step workflow
2,000-5,000
500-2,000
Very high, retry loops multiply fast
Autonomous monitoring
800-1,500/check
200-600/check
Dangerous if uncapped

The autonomous stuff is what killed budgets in my tests. OpenClaw can proactively check your inbox, scan for calendar conflicts, monitor webhooks — but each check is a full API call. Set it to run every 10 minutes and you're burning 4,320 calls per month before you send a single command.

What Actually Costs Money

OpenClaw's architecture bills you across six surfaces:

  1. Primary model calls — Every reply or tool execution
  2. Media transcription — Audio/image/video processing through OpenAI, Groq, or Deepgram APIs
  3. Memory search — Embedding APIs if using remote providers (OpenAI/Gemini) instead of local
  4. Web search — Brave Search API or Perplexity via OpenRouter
  5. Session compaction — Auto-summarization when context gets too long
  6. Text-to-speech — ElevenLabs API if you enable voice responses

Most people forget about 3-6. I saw memory search alone add 15-20% to bills when users had it set to remote embeddings instead of local.


Typical Usage Tiers

I broke my testing into three profiles based on actual behavior, not marketing claims:

Light User: $10-20/month

  • 50-100 messages per day
  • Basic commands (calendar, reminders, quick questions)
  • No autonomous features
  • Manual triggers only
  • Model: Claude Haiku 4.5 or GPT-4o mini

Real behavior: Weekend hobbyist, occasional productivity boost, not mission-critical.

Medium User: $40-80/month

  • 200-400 messages per day
  • Tool-heavy workflows (email parsing, file management, web scraping)
  • Limited automation (1-2 scheduled checks per day)
  • Mix of simple and complex tasks
  • Model: Sonnet 4.5 for complex, Haiku 4.5 for routing

Real behavior: Daily driver for work, replacing 2-3 separate apps, automation still supervised.

Heavy User: $150-300/month

  • 500+ messages per day
  • Fully autonomous features (monitoring, auto-responses, proactive suggestions)
  • Multi-step workflows with retries
  • Integration with 5+ external services
  • Model: Dynamic routing (Haiku → Sonnet → Opus based on complexity)

Real behavior: Production system, business-critical, OpenClaw is core infrastructure.

The Fast Company analysis noted costs around $30/month for basic automation. That tracks with my Medium tier. But Heavy users? I saw spikes to $250+ when web scraping tasks hit retry loops.

Context matters: OpenClaw went viral in early 2026, gaining 60,000+ GitHub stars in 72 hours. That explosion brought new users who didn't expect the API costs hiding behind "open-source."


Model Routing Strategy

Here's where you cut 40-60% of your bill without losing capability.

The Problem

Most people default to one model for everything. Bad idea. OpenClaw agents do many different types of actions — using a powerful model for every action wastes money.

I tested four routing strategies over 1,000 tasks:

Strategy 1: All Sonnet

  • Cost: $127/month
  • Performance: Excellent
  • Efficiency: Terrible (overkill for 60% of tasks)

Strategy 2: All Haiku

  • Cost: $31/month
  • Performance: Good for simple, fails on complex
  • Efficiency: Good until it breaks

Strategy 3: Manual Switching

  • Cost: $68/month
  • Performance: Excellent when I remembered to switch
  • Efficiency: Inconsistent, human error

Strategy 4: OpenRouter Auto

  • Cost: $52/month
  • Performance: Excellent
  • Efficiency: Best overall

How Auto-Routing Works

OpenRouter's Auto Model routes tasks to cheaper models when complexity doesn't demand premium ones. The integration with OpenClaw handles this automatically — no manual switching needed.

I configured it like this:

{
  "models": {
    "primary": "openrouter/openrouter/auto",
    "fallback": [
      "anthropic/claude-sonnet-4-5",
      "anthropic/claude-haiku-4-5"
    ]
  }
}

Results over 500 tasks:

  • 68% routed to Haiku-class models
  • 27% routed to Sonnet-class
  • 5% routed to Opus-class
  • Total savings: 41% vs all-Sonnet

The kicker? I didn't notice quality degradation. Auto-routing figured out that calendar updates don't need frontier intelligence.


Cost-Cutting Checklist

Here's what actually moved the needle in my tests:

1. Set Hard Spending Limits

Every provider lets you cap spend. Set alerts at 50%, 75%, and 90% of your budget — I caught three runaway loops this way before they hit triple digits.

How: Provider dashboard → Billing → Usage limits

2. Use Local Models for Non-Critical Tasks

For non-critical tasks, use local models through Ollama to eliminate API costs entirely. I ran Llama 3.1 8B locally for:

  • Status checks
  • Simple Q&A
  • Draft generation (then refined with Sonnet)

Cut 22% off my bill.

3. Switch Memory to Local Embeddings

Change this in your config:

{
  "memorySearch": {
    "provider": "local"
  }
}

Using local embeddings instead of OpenAI or Gemini prevents API charges. Saved me $8-12/month.

4. Disable Autonomous Features During Testing

Those proactive checks? Turn them off until you know exactly what you need. One user left monitoring on and got billed for 6,000+ heartbeat calls in a week.

5. Use Prompt Caching for Repeated Context

Claude's prompt caching cuts costs by 90% when you reuse the same context. If you're feeding identical documentation or system prompts repeatedly, this compounds fast.

Example: I had a 15,000-token system prompt for email parsing. Without caching: $0.045 per call. With caching: $0.0045 after first call.

6. Batch Non-Urgent Tasks

Anthropic's Message Batches API gives you 50% off input and output tokens when you queue requests instead of firing them individually. If you're analyzing 100 emails, batch them.

7. Monitor Daily, Not Monthly

Check your API dashboard daily during the first few weeks. Patterns emerge fast. I caught a 3x cost spike on day 4 from an overly verbose tool configuration.

8. Optimize Your System Prompt

Shorter prompts = lower costs. I cut my base prompt from 800 tokens to 320 without losing functionality. That's 60% savings on every single call.

9. Use Haiku for Routing and Classification

Haiku 4.5 costs $1 input / $5 output per million tokens. Perfect for deciding "does this need Sonnet, or can Haiku handle it?"

I built a two-tier system:

  • Haiku analyzes incoming requests
  • Only 23% needed escalation to Sonnet
  • Net savings: 38%

10. Disable Web Search Unless Required

Web search uses API keys and may incur charges through Brave or Perplexity. I turned it off globally, then selectively enabled per-task. Cut 12% immediately.


When Subscriptions Beat BYOK

Here's the part nobody wants to hear: sometimes flat-rate subscriptions are cheaper than BYOK.

I compared my Medium usage (350 msgs/day, mix of simple and complex) across three models:

Approach
Monthly Cost
What You Get
OpenClaw + Claude API (Sonnet 4.5)
$127
Unlimited complexity, full control
OpenClaw + OpenRouter Auto
$52
Smart routing, good balance
Claude Pro subscription
$20
Capped usage, no billing surprises
Macaron
Free tier → $10/mo
Built-in memory, no API setup, task routing included

Claude Pro caps you at ~45 messages per 5 hours. For my usage, I'd hit that limit daily. Doesn't work.

The pricing math explains why: Sonnet 4.5 runs $3/$15 per million tokens (input/output), while Haiku costs $1/$5. That gap is where routing saves you money.

But here's what I noticed: the constant billing anxiety eats productivity. I was checking dashboards mid-conversation, second-guessing complex queries, manually switching models to save $0.15.

At Macaron, we've watched this play out hundreds of times — people start with BYOK, then spend more time managing dashboards than building workflows. That's why we built our pricing the opposite way: fixed monthly costs, no surprise bills, routing handled automatically. If you want to test whether predictable billing actually lets you focus on output instead of token counts, start with the free plan and run your real tasks through it. Low commitment. Judge the results yourself.


Real Numbers: My 3-Week Test

Here's the breakdown from January 8-29, 2026:

  • Total interactions: 1,247
  • Models used: Claude Haiku 4.5, Sonnet 4.5, GPT-4o, OpenRouter Auto
  • Hosting: Local (no VPS costs; cloud folks can check DigitalOcean's 1-Click deploy)
  • Features enabled: Calendar, email, file management, web search (selective)

Week 1 (learning, all features on): $67.30

  • Autonomous checks every 15 min: disaster
  • Web search on by default: waste
  • All-Sonnet routing: overkill

Week 2 (optimized, routing strategy): $28.15

  • Disabled autonomous features
  • Switched to OpenRouter Auto
  • Local memory embeddings
  • Web search manual only

Week 3 (production simulation): $41.20

  • Re-enabled 2 scheduled checks/day
  • Batch processing for email analysis
  • Haiku → Sonnet escalation logic
  • Prompt caching on system instructions

Average cost per message: $0.033

At that rate, 350 messages/day = $34.65/month. That's the real Medium tier cost with optimization.


Final Thoughts

OpenClaw is legitimately useful. But the "free and open-source" framing hides the fact that realistic monthly costs range from $10-150 depending on usage. Most people land between $30-80 if they're using it seriously.

The cost isn't a dealbreaker — it's manageable if you:

  1. Route intelligently (don't use Opus for status checks)
  2. Monitor early (catch runaway costs in days, not months)
  3. Optimize aggressively (local embeddings, prompt caching, batch processing)

But if you're treating this as daily infrastructure, ask yourself: am I optimizing costs or optimizing my workflow?

I kept hitting that question. Every time I second-guessed a query to save tokens, I lost momentum. When I switched to systems that handle routing for me, I stopped thinking about the cost surface and started shipping faster.

That's the tradeoff. BYOK gives you control. Managed systems give you predictability. Neither is wrong — it depends on whether you're building infrastructure or using it.

For me? I'm migrating the workflows that need to run daily. The experimental stuff stays in OpenClaw. The reliable stuff moves where billing doesn't require a spreadsheet.

Hey, I’m Hanks — a workflow tinkerer and AI tool obsessive with over a decade of hands-on experience in automation, SaaS, and content creation. I spend my days testing tools so you don’t have to, breaking down complex processes into simple, actionable steps, and digging into the numbers behind “what actually works.”

Apply to become Macaron's first friends