OpenClaw: reducing token costs | Dorokhov.codes

OpenClaw keeps context across sessions, tools, and workspace files, so token usage can grow faster than in a one-off chat. Below are practical ways to keep costs predictable without giving up what you actually need from the agent.

The following session-initialization prompt restricts what the agent loads at startup, which keeps the base context (and its token cost) small:

SESSION INITIALIZATION RULE:

On every session start:

1. Load ONLY:
   - SOUL.md
   - USER.md
   - IDENTITY.md
   - memory/YYYY-MM-DD.md (if exists)

2. DO NOT auto-load:
   - MEMORY.md
   - Session history
   - Prior messages
   - Previous tool outputs

3. When the user asks about prior context, use:
   - memory_search()
   - memory_get() snippet only

4. Update memory/YYYY-MM-DD.md with:
   - work done
   - decisions
   - leads
   - blockers
   - next steps
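To make the loading rule concrete, here is a minimal Python sketch of the file selection it describes. It is only an illustration of the rule, not how OpenClaw itself resolves files: the function name `files_to_load` and the `workspace` argument are hypothetical, and the daily note is assumed to live at `memory/YYYY-MM-DD.md` inside the workspace.

```python
from datetime import date
from pathlib import Path

# Files the rule always loads at session start.
ALWAYS_LOAD = ("SOUL.md", "USER.md", "IDENTITY.md")


def files_to_load(workspace: Path, today: date) -> list[Path]:
    """Return only the files the session initialization rule allows."""
    paths = [workspace / name for name in ALWAYS_LOAD]
    # The daily note is optional: include it only if it already exists.
    daily = workspace / "memory" / f"{today.isoformat()}.md"
    if daily.exists():
        paths.append(daily)
    # MEMORY.md, session history, and old tool outputs are deliberately
    # left out; they are fetched on demand instead.
    return paths
```

Everything outside this list stays out of the context until the agent explicitly searches for it, which is where the savings come from.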

Heartbeat to Ollama

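Heartbeats run on a schedule whether or not anything is happening, so we can route them to a local model served by Ollama instead of spending paid API tokens on them.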

Install the model:

ollama pull qwen3:4b

Configure the heartbeat:

"heartbeat": {
  "every": "1h",
  "model": "ollama/qwen3:4b",
  "session": "main"
}
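A rough way to see why this matters is to estimate what hourly heartbeats would cost on a paid model. The sketch below is plain arithmetic; the token count per heartbeat and the price per million tokens are hypothetical placeholders, so substitute the figures from your own usage logs and provider pricing.

```python
def monthly_heartbeat_cost(every_hours, tokens_per_heartbeat,
                           usd_per_million_tokens, days=30):
    """Estimate the monthly cost of heartbeats on a paid model."""
    heartbeats = (24 / every_hours) * days  # e.g. hourly -> 720 per 30 days
    total_tokens = heartbeats * tokens_per_heartbeat
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: 10k tokens per heartbeat at $3 per million tokens.
paid = monthly_heartbeat_cost(every_hours=1, tokens_per_heartbeat=10_000,
                              usd_per_million_tokens=3.0)
# A local Ollama model has no per-token price.
local = monthly_heartbeat_cost(every_hours=1, tokens_per_heartbeat=10_000,
                               usd_per_million_tokens=0.0)
print(f"paid: ${paid:.2f}/month, local: ${local:.2f}/month")
```

With a local model the per-token term drops to zero, leaving only the cost of running the machine.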

Rate limiting

For rate limiting, we can use the following rules:

RATE LIMITS:

- 5s between API calls
- 10s between searches
- max 5 searches per batch
- batch similar work
- DAILY BUDGET: $1
- MONTHLY BUDGET: $30
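The rules above are instructions to the model, so the agent follows them on a best-effort basis. If you also want to enforce them in your own wrapper code, a minimal sketch could look like this. The `Throttle` and `BudgetExceeded` names are hypothetical, the sketch is not part of OpenClaw, and it does not reset the budget at midnight.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the budget."""


class Throttle:
    """Enforce a minimum gap between calls and a spending cap."""

    def __init__(self, min_gap_s, budget_usd,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap_s = min_gap_s
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until min_gap_s has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_gap_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

    def charge(self, cost_usd):
        """Record a cost, refusing it if the budget would be exceeded."""
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"{self.spent_usd + cost_usd:.2f} USD exceeds "
                f"budget of {self.budget_usd:.2f} USD"
            )
        self.spent_usd += cost_usd


# 5s between API calls, $1 daily budget, as in the rules above.
api_calls = Throttle(min_gap_s=5, budget_usd=1.00)
```

Before each request, call `api_calls.wait()`, then `api_calls.charge(estimated_cost)`; a `BudgetExceeded` error is the signal to stop for the day. The clock and sleep functions are injectable so the class can be tested without real delays.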
