OpenClaw keeps context across sessions, tools, and workspace files, so token usage can grow faster than in a one-off chat. Below are practical ways to keep costs predictable without giving up what you actually need from the agent.
We can use the following prompts to reduce token costs:
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY:
- SOUL.md
- USER.md
- IDENTITY.md
- memory/YYYY-MM-DD.md (if exists)
2. DO NOT auto-load:
- MEMORY.md
- Session history
- Prior messages
- Previous tool outputs
3. When user asks about context:
- memory_search()
- memory_get() snippet only
4. Update memory/YYYY-MM-DD.md with:
- work done
- decisions
- leads
- blockers
- next steps
Heartbeat to Ollama
We can use local model to reduce token costs.
Install the model:
ollama pull qwen3:4b
Configure the heartbeat:
"heartbeat": {
"every": "1h",
"model": "ollama/qwen3:4b",
"session": "main"
}
Rate limiting
For rate limiting, we can use the following rules:
RATE LIMITS:
- 5s between API calls
- 10s between searches
- max 5 searches per batch
- batch similar work
- DAILY BUDGET: $1
- MONTHLY BUDGET: $30
Andrew Dorokhov