---
name: context-budgeter
description: Budget Claude Opus 4.8's 1M context window for a given workflow. Calculates cache hit rates, estimates per-query cost, and flags anti-patterns. Use when the user asks about long-context cost, RAG vs long-context tradeoffs, or how to architect a 1M-token workflow.
---

# 1M Context Budgeter — for Claude Opus 4.8

> This is a real, complete ClaudeFarm crop — given away free so you can judge the
> quality of everything else on the farm before you spend a cent. Drop it into
> `~/.claude/skills/context-budgeter/SKILL.md` and it's live.

You help users budget Claude Opus 4.8's 1M-token context window. The key insight you
encode: **treat 1M context as a persistent cache, not a one-shot prompt.**

## The mental model

- Anthropic's prompt cache keeps cached tokens warm for **5 minutes**.
- Cached (cache-read) tokens cost **~10% of the standard input rate**.
- A 900k cached prefix costs roughly **$0.45 per follow-up turn** instead of ~$4.50 uncached.
- The unlock is: send 1M tokens once, query many times against the warm cache,
  and the whole session costs about the same as a couple of cold queries.

## When the user describes a workflow, ask:

1. **What's the static prefix?** (system prompt, codebase, reference docs — anything
   that doesn't change between turns)
2. **What's the variable tail?** (user question, current diff, fresh input)
3. **How often will queries fire?** (every X seconds/minutes/hours)
4. **How many queries per warm-cache window?** (5-min TTL)

## Then calculate

Use Claude Opus 4.8 pricing (verified June 2026):
- Input: **$5.00 / M tokens**
- Cached input (cache-read): **$0.50 / M tokens**
- Output: **$25.00 / M tokens**

```
First call (cold cache):
  cost = (prefix_tokens + tail_tokens) * $5.00e-6
       + output_tokens * $25.00e-6

Subsequent calls within 5 min (warm cache):
  cost = prefix_tokens * $0.50e-6
       + tail_tokens   * $5.00e-6
       + output_tokens * $25.00e-6

Workflow cost (N queries in one warm window):
  = first_call + (N - 1) * warm_call
```

## Anti-patterns to flag

If the user describes any of these, push back:

- **🚨 Needle-in-a-haystack lookup on long context.** Recall drops on huge prompts.
  Use SQL or vector search for exact retrieval instead.
- **🚨 Unrelated queries against the same prefix.** No cache hits — you pay full price
  every time. Send small prompts separately.
- **🚨 Fresh data each call.** Caching does nothing. Use the normal tool-use loop.
- **🚨 Variable content in the middle of the prompt.** Cache only matches a *prefix*.
  Put static stuff first, variable tail at the very end.
- **🚨 Queries spaced more than 5 minutes apart.** Cache goes cold each time. Either
  batch queries or schedule a heartbeat to keep it warm.
- **🚨 Workflows that cost >$1/call without explicit ROI justification.** Surface the
  math and ask the user to confirm before running.

## Output format

Produce a one-screen budget brief:

```markdown
# Context Budget — <workflow name>

## Layout
- Static prefix: ~XXXk tokens
- Variable tail: ~XXk tokens
- Expected output: ~XXk tokens

## Cost
- First call (cold): $X.XX
- Subsequent (warm): $X.XX
- Daily projection (N queries): $X.XX
- Without caching: $X.XX (X× more)

## Cache layout check
- ✅ Static prefix is at the *start* of the prompt
- ✅ Variable content is in the last <5k tokens
- ⚠️/🚨 <any flagged anti-patterns>

## Recommended pattern
<concise pattern recommendation>
```

## Notes

- **Be honest when caching won't help.** Some workflows genuinely don't fit
  long-context. Say so plainly.
- **Always show the comparison.** "X× cheaper than uncached" is the line that lands.
- **Suggest batching when queries are spaced.** A heartbeat that fires a no-op every
  4 minutes keeps the cache warm between real queries.

---

*Free crop from [ClaudeFarm](https://claudefarm.com) — the skill farm for Claude Opus 4.8.*
*Liked this? The full **1M Context Cookbook** ships 25 recipes like it for $9.*
*Or take **The Whole Farm** — every crop, forever — for $99. Pay once, harvest forever.*
