Claude Opus 4.8 Pricing (2026): What It Really Costs to Run

// in this guide

The list price (verified Jun 2026)
Why caching is the whole game
A worked 200k-token codebase workflow
Cold vs. cache-smart daily cost
Where the money actually leaks
The takeaway

"How much does Claude Opus 4.8 cost?" has two answers. The list price is fixed and public. Your bill is not — it can swing 5–10× on the same workload depending on how you structure the prompt. This guide covers both, with numbers you can reproduce.

Opus 4.8 is Anthropic's current flagship as of June 2026. It's the model the whole ClaudeFarm farm is built on, and it's the one our cost tooling is tuned for. Everything below is the API pricing we pay and verify, not a marketing table.

The list price (verified Jun 2026)

Three numbers, all per million tokens:

Token type	Price / 1M tokens	What it is
Input (uncached)	$5.00	Every token you send that isn't a cache hit
Cached input (cache-read)	$0.50	Tokens served from a warm prompt cache — ~10% of input
Output	$25.00	Every token the model generates

Two facts do most of the work here. First, output is 5× the price of input — verbose responses are where naïve usage bleeds money. Second, cached input is one-tenth the price of fresh input. That 10× gap on the input side is the single biggest lever you have, and most teams leave it on the table.

Prices change. These are verified for June 2026. Anthropic adjusts pricing across model generations, so before you build a budget on these numbers, compute your own cost free in the Hive Terminal — it runs the real budgeter against the rates above and shows you the live math.

Why caching is the whole game

Anthropic's prompt cache lets you mark a stable prefix — system prompt, codebase, reference docs, anything that doesn't change turn to turn — and reuse it. The mechanics that matter for cost:

Cached reads cost ~10% of input ($0.50/M vs. $5.00/M).
The warm window is 5 minutes. Each cache hit refreshes the timer. Go quiet for 5 minutes and the cache goes cold; the next call pays full input price to re-warm it.
The variable tail is always full price. Your actual question, the current diff, fresh input — those are never cached, and that's fine. They're small.

The mental model that unlocks the savings: treat the context window as a persistent cache, not a one-shot prompt. Send your big stable prefix once, then fire many small queries against the warm cache inside the 5-minute window. The whole session costs about what a couple of cold queries would. This is the core idea behind the free context-budgeter skill and the paid 1M Context Cookbook — and it's exactly the pattern we cover in How to Actually Use Claude's 1M Context Window.

A worked 200k-token codebase workflow

Here's the canonical case: you load a 200,000-token codebase as a stable prefix and ask Claude Opus 4.8 questions about it. Each question is a ~1,000-token tail (your prompt), and each answer is a ~1,500-token output.

The first call is cold. Nothing is cached yet, so the whole 200k prefix is billed at the input rate:

Component	Tokens	Rate	Cost
Prefix (codebase) — uncached	200,000	$5.00/M	$1.0000
Tail (your question)	1,000	$5.00/M	$0.0050
Output (the answer)	1,500	$25.00/M	$0.0375
First call total			$1.0425

Every follow-up within 5 minutes is warm. The 200k prefix is now a cache hit at $0.50/M — one-tenth the price:

Component	Tokens	Rate	Cost
Prefix (codebase) — cached	200,000	$0.50/M	$0.1000
Tail (your question)	1,000	$5.00/M	$0.0050
Output (the answer)	1,500	$25.00/M	$0.0375
Warm call total			$0.1425

So a 10-question session against that codebase, all inside warm windows, costs $1.0425 + (9 × $0.1425) = $2.32. Run the same 10 questions cold — re-sending the 200k prefix at full price every time — and you'd pay 10 × $1.0425 = $10.43. Same work, same answers, 4.5× the bill for ignoring the cache.

The lever in one line: the cached prefix dropped from $1.00 to $0.10 per call. On a big stable prefix, caching turns the dominant cost line into a rounding error.

Cold vs. cache-smart daily cost

Scale that to a working day. Say a developer or an agent fires 200 queries against the same 200k-token codebase, and the work is bursty enough that cache stays warm for most of them (a realistic 1 cold re-warm per 20 queries):

Strategy	Cold calls	Warm calls	Daily cost
Naïve (no caching)	200	0	$208.50
Cache-smart (~10 re-warms)	10	190	$37.50

Same 200 queries. The cache-smart setup costs ~$37.50/day vs. ~$208.50/day — an 82% cut, purely from structuring the prompt so the codebase is a cached prefix instead of fresh input on every call. Over a 20-workday month that's the difference between roughly $750 and $4,170.

// stop guessing

The 1M Context Cookbook

25 long-context recipes for Opus 4.8 — each one engineered around cache economics so your prefix stays warm and your bill stays low. The exact patterns behind the numbers above, drop-in ready. One payment, lifetime updates.

$9 one-time

Get the Cookbook →

Where the money actually leaks

Four mistakes account for most surprise bills on Opus 4.8:

Letting the cache go cold. The 5-minute window resets on every hit. A workflow that fires once every 6 minutes pays full input price every single time — you get zero cache benefit. Either batch your queries into bursts or accept that you're running cold.
Reordering the prefix. Caching is prefix-based. If you shuffle the order of your system prompt or codebase between calls, you bust the cache and re-pay full input. Keep the stable part byte-stable.
Over-generating output. Output is $25/M — 5× input. A skill that returns three paragraphs where one sentence would do is quietly the most expensive thing in your stack. Constrain response length.
Stuffing context you'll never query. A 1M-token prefix you only ask one question against is mostly wasted cold spend. If you're doing exact lookups, a SQL or vector search is cheaper than long context — that's a retrieval job, not a context job.

If you're architecting an agent or a multi-turn workflow and want to know the bill before you ship it, the free context-budgeter skill calculates cache hit rates and per-query cost from a plain-English description of your workflow. It's a complete, working ClaudeFarm crop given away so you can judge the quality of the paid ones.

The takeaway

Claude Opus 4.8 lists at $5/M input, $0.50/M cached input, $25/M output. But the list price isn't your bill. Structure your prompt so the big stable part is a cached prefix, keep your bursts inside the 5-minute warm window, and constrain output — and the same workload costs a fraction of the naïve number. We just watched a 200-query day drop from $208 to $37 on exactly that move.

Want the real math for your workflow, not our example? Compute your own cost free in the Hive Terminal — it runs the live budgeter against current Opus 4.8 rates. And if you want the patterns baked in, the 1M Context Cookbook ships 25 cache-smart recipes for $9, one payment.

New to skills? Start with our companion guide: How to Install Claude Code Skills (2026 Step-by-Step).

ClaudeFarm Team

Field reports from the team shipping 36+ AI apps on Claude. AgentHive Inc. · Palm Coast, FL.

Claude Opus 4.8 Pricing (2026): What It Really Costs to Run

The list price (verified Jun 2026)

Why caching is the whole game

A worked 200k-token codebase workflow

Cold vs. cache-smart daily cost

Where the money actually leaks

The takeaway

Related field guides