Managing Token Usage & Costs in OpenClaw
Burning through millions of tokens unexpectedly? Learn how to diagnose runaway token consumption, configure context limits, and avoid bill shock with Gemini and other models.
⚠️ The Problem
Users report consuming 1-3 million tokens within minutes of normal use, hitting API quotas instantly, receiving rate-limit errors such as `You exceeded your current quota for generate_content_paid_tier_inp`, or seeing sessions grow to multi-megabyte sizes that cause exponential token growth on every message.
🔍 Why This Happens
Several factors cause runaway token consumption:

- **Gemini 2.5 Pro's token-hungry architecture**: This model is documented to consume 1.9M+ input tokens in just a few dozen API calls, even for simple tasks. This is a known model behavior, not an OpenClaw bug.
- **Thinking/reasoning mode enabled**: When `thinking: { type: "enabled" }` is set, models generate extensive internal reasoning chains that dramatically increase token usage per request.
- **Context compounding bug**: Large tool outputs (especially `gateway config.schema` returning 396KB+ of JSON) get permanently stored in `session.jsonl` files. Once this happens, every subsequent message drags that entire blob forward, causing exponential growth. Sessions with ~35 messages have grown to 2.9MB.
- **Discord summarization pulling too many messages**: The `readMessages` tool can fetch up to Discord's API limit (50-100 messages) per call, each adding to context.
- **No context token limits configured**: Without explicit limits, the agent will use the full context window of your model, reloading everything on each request.
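The compounding effect is easy to model: if every request re-sends the full accumulated history, total input tokens grow roughly quadratically with message count, and a stuck blob is re-billed on every turn. A toy calculation (figures chosen to mirror the 35-message / ~400KB example above, assuming ~4 characters per token; not measured data):

```python
# Toy model of context compounding: request i re-sends all prior
# messages plus any blob stuck in the session file, so total input
# tokens grow roughly quadratically with message count.

def total_input_tokens(messages: int, tokens_per_message: int,
                       blob_tokens: int = 0) -> int:
    """Sum the context size sent across all requests."""
    total = 0
    context = blob_tokens  # e.g. a 396KB tool output ~ 100k tokens
    for _ in range(messages):
        context += tokens_per_message
        total += context
    return total

# 35 short messages of ~200 tokens each, no stuck blob:
print(total_input_tokens(35, 200))                       # 126000
# Same chat with a ~100k-token tool output stuck in session.jsonl:
print(total_input_tokens(35, 200, blob_tokens=100_000))  # 3626000
```

That is roughly 29x more input tokens for the identical conversation, which is why a single oversized tool output can blow through quotas on its own.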
✅ The Fix
Step 1: Check Current Token Usage
First, determine whether you're actually hitting limits, and where:
For Google/Gemini:
```bash
# Check real-time usage stats
open "https://aistudio.google.com/usage?timeRange=last-28-days&tab=rate-limit"
```

For Anthropic:

```bash
# Check your Claude usage
open "https://console.anthropic.com/settings/usage"
```

Step 2: Switch to a More Efficient Model
Gemini 2.5 Pro is extremely token-hungry; switching to Flash can reduce consumption by roughly 10x.
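To see what that difference means in dollars, here is a back-of-the-envelope estimator. The per-token prices are placeholders, not real quotes; substitute the current numbers from Google's pricing page before drawing conclusions:

```python
# Back-of-the-envelope input-cost estimator. PRICES ARE PLACEHOLDERS:
# look up the current per-million-token rates before trusting output.
PRICE_PER_M_INPUT = {
    "gemini-2.5-pro": 1.25,    # assumed USD per 1M input tokens
    "gemini-2.5-flash": 0.15,  # assumed USD per 1M input tokens
}

def input_cost(model: str, input_tokens: int) -> float:
    """USD cost of sending `input_tokens` of input to `model`."""
    return input_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# 1.9M input tokens, as reported for a few dozen Pro calls:
for model in PRICE_PER_M_INPUT:
    print(f"{model}: ${input_cost(model, 1_900_000):.2f}")
```

Even with placeholder prices, the point stands: input cost scales linearly with tokens, so an order-of-magnitude price gap between models compounds fast on bloated contexts.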
```json5
// In ~/.config/openclaw/config.json5
{
  "models": {
    "provider": "google",
    "model": "google/gemini-2.5-flash-preview" // Much cheaper than Pro
  }
}
```

Or via CLI:

```bash
openclaw model set google/gemini-2.5-flash-preview
```

Step 3: Disable Thinking Mode
Thinking/reasoning mode can explode token usage by 10-50x. Ensure it's disabled:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-2.5-flash-preview": {
          "params": {
            "thinking": { "type": "disabled" } // Critical!
          }
        },
        "google/gemini-2.5-pro-preview": {
          "params": {
            "thinking": { "type": "disabled" }
          }
        }
      }
    }
  }
}
```

Step 4: Set Context Token Limits
Prevent over-fetching by limiting the context window.
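Conceptually, a `contextTokens` cap means the agent keeps only as much recent history as fits the budget. The sketch below illustrates the idea; the drop-oldest strategy and the ~4-characters-per-token heuristic are assumptions for illustration, not OpenClaw's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], context_tokens: int) -> list[str]:
    """Keep the most recent messages that fit inside the token budget."""
    kept: list[str] = []
    budget = context_tokens
    for msg in reversed(messages):  # newest first
        cost = estimate_tokens(msg)
        if cost > budget:
            break  # oldest messages beyond the budget are dropped
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))

history = ["x" * 400] * 100                 # 100 messages, ~100 tokens each
print(len(trim_history(history, 50_000)))   # 100 -- all fit (10k tokens)
print(len(trim_history(history, 5_000)))    # 50  -- only the newest half fits
```

The practical upshot: with a cap in place, a runaway blob can no longer force every request to carry the full history forward.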
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "contextTokens": 50000 // Limit to 50k tokens
    }
  }
}
```

Step 5: Fix Bloated Sessions (Context Compounding Bug)
If your sessions have already grown massive, you need to clean them up.
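Before resetting, it can help to pinpoint which records are inflating a session file. This sketch only assumes the `.jsonl` convention of one JSON object per line; the actual record fields in an OpenClaw session may differ:

```python
import json
from pathlib import Path

def oversized_entries(session_path: str, threshold_bytes: int = 50_000):
    """Yield (line_number, size_in_bytes) for suspiciously large entries."""
    lines = Path(session_path).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        if size >= threshold_bytes:
            try:
                json.loads(line)  # confirm it's a JSON record, not corruption
            except json.JSONDecodeError:
                continue
            yield lineno, size

# Usage (path is an example placeholder):
# for lineno, size in oversized_entries(
#         str(Path.home() / ".openclaw/agents/main/sessions/SESSION_ID.jsonl")):
#     print(f"line {lineno}: {size / 1024:.0f} KB")
```

Any single entry in the tens or hundreds of kilobytes (like the 396KB `gateway config.schema` output described above) is a prime suspect for context compounding.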
```bash
# List your sessions and their sizes
ls -lah ~/.openclaw/agents/main/sessions/

# Check for bloated sessions (anything over 500KB is suspicious)
du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h
```

If you find multi-megabyte sessions:

```bash
# Backup the bloated session
mv ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl.bak

# Start fresh
openclaw session new
```

Step 6: Use Subagents for Heavy Tasks
For token-intensive operations, spawn subagents with isolated context:
```
/spawn Summarize Discord messages from the last 24 hours
/spawn Configure Samba with these specs: [detailed specs]
```

Benefits:
- Isolated context (only loads AGENTS.md + TOOLS.md, not full chat history)
- Can use cheaper models for subtasks
- Results are announced back to your main chat
Step 7: Request Higher Quotas (Google)
If you legitimately need higher limits:
```bash
# Check your current tier
open "https://aistudio.google.com/app/apikey"

# Request a rate limit increase (no guarantee, but they review)
open "https://forms.gle/ETzX94k8jf7iSotH9"
```

Tier levels:
- Tier 1 (default): ~1M tokens/min limit
- Tier 2 (requires >$250 spend): Higher limits
- Tier 3 (enterprise): Highest limits
📋 Quick Commands
| Command | Description |
|---|---|
| `openclaw model set google/gemini-2.5-flash-preview` | Switch to the more efficient Gemini Flash model |
| `openclaw model list` | Check current model configuration |
| `openclaw session new` | Start a fresh session with clean context |
| `ls -lah ~/.openclaw/agents/main/sessions/` | List session files and their sizes to find bloated ones |
| `du -h ~/.openclaw/agents/main/sessions/*.jsonl \| sort -h` | Sort sessions by size to identify the largest |
| `/status` | Check current session status and token usage |
| `/reasoning off` | Disable reasoning/thinking mode in the current session |