Managing Token Usage & Costs in OpenClaw
Burning through millions of tokens unexpectedly? Learn how to diagnose runaway token consumption, configure context limits, and avoid bill shock with Gemini and other models.
⚠️ The Problem
Users report consuming 1-3 million tokens within minutes of normal use, hitting API quotas instantly, receiving rate-limit errors such as `You exceeded your current quota for generate_content_paid_tier_inp`, or seeing sessions grow to multi-megabyte sizes that cause exponential token growth on every message.
🔍 Why This Happens
Several factors cause runaway token consumption:

- **Gemini 2.5 Pro's token-hungry architecture**: This model is documented to consume 1.9M+ input tokens in just a few dozen API calls, even for simple tasks. This is a known model behavior, not an OpenClaw bug.
- **Thinking/reasoning mode enabled**: When `thinking: { type: "enabled" }` is set, models generate extensive internal reasoning chains that dramatically increase token usage per request.
- **Context compounding bug**: Large tool outputs (especially `gateway config.schema` returning 396KB+ of JSON) get permanently stored in `session.jsonl` files. Once this happens, every subsequent message drags that entire blob forward, causing exponential growth. Sessions with ~35 messages have grown to 2.9MB.
- **Discord summarization pulling too many messages**: The `readMessages` tool can fetch up to Discord's API limit (50-100 messages) per call, each adding to context.
- **No context token limits configured**: Without explicit limits, the agent will use the full context window of your model, reloading everything on each request.
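The compounding effect is easy to model: if every request re-sends the full accumulated history, total input tokens grow roughly quadratically with message count, and a stuck blob is re-billed on every turn. A toy calculation (figures chosen to mirror the 35-message / ~400KB example above, assuming ~4 characters per token; not measured data):

```python
# Toy model of context compounding: request i re-sends all prior
# messages plus any blob stuck in the session file, so total input
# tokens grow roughly quadratically with message count.

def total_input_tokens(messages: int, tokens_per_message: int,
                       blob_tokens: int = 0) -> int:
    """Sum the context size sent across all requests."""
    total = 0
    context = blob_tokens  # e.g. a 396KB tool output ~ 100k tokens
    for _ in range(messages):
        context += tokens_per_message
        total += context
    return total

# 35 short messages of ~200 tokens each, no stuck blob:
print(total_input_tokens(35, 200))                       # 126000
# Same chat with a ~100k-token tool output stuck in session.jsonl:
print(total_input_tokens(35, 200, blob_tokens=100_000))  # 3626000
```

That is roughly 29x more input tokens for the identical conversation, which is why a single oversized tool output can blow through quotas on its own.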
✅ The Fix
Step 1: Check Current Token Usage
First, determine whether you're actually hitting limits, and where:
For Google/Gemini:
```bash
# Check real-time usage stats
open "https://aistudio.google.com/usage?timeRange=last-28-days&tab=rate-limit"
```

For Anthropic:

```bash
# Check your Claude usage
open "https://console.anthropic.com/settings/usage"
```

Step 2: Switch to a More Efficient Model
Gemini 2.5 Pro is extremely token-hungry; switching to Flash can reduce consumption by roughly 10x.
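To see what that difference means in dollars, here is a back-of-the-envelope estimator. The per-token prices are placeholders, not real quotes; substitute the current numbers from Google's pricing page before drawing conclusions:

```python
# Back-of-the-envelope input-cost estimator. PRICES ARE PLACEHOLDERS:
# look up the current per-million-token rates before trusting output.
PRICE_PER_M_INPUT = {
    "gemini-2.5-pro": 1.25,    # assumed USD per 1M input tokens
    "gemini-2.5-flash": 0.15,  # assumed USD per 1M input tokens
}

def input_cost(model: str, input_tokens: int) -> float:
    """USD cost of sending `input_tokens` of input to `model`."""
    return input_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# 1.9M input tokens, as reported for a few dozen Pro calls:
for model in PRICE_PER_M_INPUT:
    print(f"{model}: ${input_cost(model, 1_900_000):.2f}")
```

Even with placeholder prices, the point stands: input cost scales linearly with tokens, so an order-of-magnitude price gap between models compounds fast on bloated contexts.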
```json5
// In ~/.config/openclaw/config.json5
{
  "models": {
    "provider": "google",
    "model": "google/gemini-2.5-flash-preview" // Much cheaper than Pro
  }
}
```

Or via CLI:

```bash
openclaw model set google/gemini-2.5-flash-preview
```

Step 3: Disable Thinking Mode
Thinking/reasoning mode can explode token usage by 10-50x. Ensure it's disabled:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-2.5-flash-preview": {
          "params": {
            "thinking": { "type": "disabled" } // Critical!
          }
        },
        "google/gemini-2.5-pro-preview": {
          "params": {
            "thinking": { "type": "disabled" }
          }
        }
      }
    }
  }
}
```

Step 4: Set Context Token Limits
Prevent over-fetching by limiting the context window.
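Conceptually, a `contextTokens` cap means the agent keeps only as much recent history as fits the budget. The sketch below illustrates the idea; the drop-oldest strategy and the ~4-characters-per-token heuristic are assumptions for illustration, not OpenClaw's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], context_tokens: int) -> list[str]:
    """Keep the most recent messages that fit inside the token budget."""
    kept: list[str] = []
    budget = context_tokens
    for msg in reversed(messages):  # newest first
        cost = estimate_tokens(msg)
        if cost > budget:
            break  # oldest messages beyond the budget are dropped
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))

history = ["x" * 400] * 100                 # 100 messages, ~100 tokens each
print(len(trim_history(history, 50_000)))   # 100 -- all fit (10k tokens)
print(len(trim_history(history, 5_000)))    # 50  -- only the newest half fits
```

The practical upshot: with a cap in place, a runaway blob can no longer force every request to carry the full history forward.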
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "contextTokens": 50000 // Limit to 50k tokens
    }
  }
}
```

Step 5: Fix Bloated Sessions (Context Compounding Bug)
If your sessions have already grown massive, you need to clean them up.
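Before resetting, it can help to pinpoint which records are inflating a session file. This sketch only assumes the `.jsonl` convention of one JSON object per line; the actual record fields in an OpenClaw session may differ:

```python
import json
from pathlib import Path

def oversized_entries(session_path: str, threshold_bytes: int = 50_000):
    """Yield (line_number, size_in_bytes) for suspiciously large entries."""
    lines = Path(session_path).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        if size >= threshold_bytes:
            try:
                json.loads(line)  # confirm it's a JSON record, not corruption
            except json.JSONDecodeError:
                continue
            yield lineno, size

# Usage (path is an example placeholder):
# for lineno, size in oversized_entries(
#         str(Path.home() / ".openclaw/agents/main/sessions/SESSION_ID.jsonl")):
#     print(f"line {lineno}: {size / 1024:.0f} KB")
```

Any single entry in the tens or hundreds of kilobytes (like the 396KB `gateway config.schema` output described above) is a prime suspect for context compounding.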
```bash
# List your sessions and their sizes
ls -lah ~/.openclaw/agents/main/sessions/

# Check for bloated sessions (anything over 500KB is suspicious)
du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h
```

If you find multi-megabyte sessions:

```bash
# Backup the bloated session
mv ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl.bak

# Start fresh
openclaw session new
```

Step 6: Use Subagents for Heavy Tasks
For token-intensive operations, spawn subagents with isolated context:
```
/spawn Summarize Discord messages from the last 24 hours
/spawn Configure Samba with these specs: [detailed specs]
```

Benefits:
- Isolated context (only loads AGENTS.md + TOOLS.md, not full chat history)
- Can use cheaper models for subtasks
- Results are announced back to your main chat
Step 7: Request Higher Quotas (Google)
If you legitimately need higher limits:
```bash
# Check your current tier
open "https://aistudio.google.com/app/apikey"

# Request a rate limit increase (no guarantee, but they review)
open "https://forms.gle/ETzX94k8jf7iSotH9"
```

Tier levels:
- Tier 1 (default): ~1M tokens/min limit
- Tier 2 (requires >$250 spend): Higher limits
- Tier 3 (enterprise): Highest limits
📋 Quick Commands
| Command | Description |
|---|---|
| `openclaw model set google/gemini-2.5-flash-preview` | Switch to the more efficient Gemini Flash model |
| `openclaw model list` | Check current model configuration |
| `openclaw session new` | Start a fresh session with clean context |
| `ls -lah ~/.openclaw/agents/main/sessions/` | List session files and their sizes to find bloated ones |
| `du -h ~/.openclaw/agents/main/sessions/*.jsonl \| sort -h` | Sort sessions by size to identify the largest |
| `/status` | Check current session status and token usage |
| `/reasoning off` | Disable reasoning/thinking mode in the current session |