Jun 13, 2026 · 10 min · Dev Guides

Practical Guide to Claude Extended Thinking & Reasoning Effort (2026)

DO By Daniel Okafor · Developer Advocate

Claude’s extended thinking lets the model reason before it answers, and in 2026 the controls changed: the old fixed budget_tokens approach gave way to an adaptive effort parameter. This guide covers how to use it well, how interleaved thinking behaves inside tool loops, and the caching traps that quietly inflate bills.

From budget_tokens to effort

Earlier Claude versions asked you to set a hard thinking budget in tokens. Current flagship models — Opus 4.8 and Sonnet 4.6 — instead accept an effort signal and adapt how much they think based on the difficulty of the task. The practical effect:

You stop guessing an exact token budget.
The model spends more reasoning on genuinely hard inputs and less on easy ones.
Costs track difficulty rather than a flat ceiling you picked in advance.

When to use high vs low effort

Effort is a dial, not a default. Reach for higher effort when:

The task is genuinely hard: multi-step math, tricky debugging, planning, or analysis with many constraints.
A wrong answer is expensive and you’d happily pay more tokens for reliability.

Prefer lower effort when:

The task is mechanical: formatting, extraction, classification, short rewrites.
Latency matters and the answer is unlikely to benefit from long deliberation.

A common mistake is leaving everything on maximum effort “to be safe.” That burns output tokens on tasks that never needed them. Match effort to the job.

Interleaved thinking in tool loops

In agentic workflows, Claude can think between tool calls — reason, call a tool, see the result, reason again. This interleaved thinking is what makes agents feel deliberate instead of reflexive. A few rules that keep it working:

Preserve the thinking blocks and their signatures across turns when the API expects them; stripping them can break the chain.
Let the model think after a tool result before deciding the next action — that’s where it catches mistakes.
Keep tool results tidy; noisy or oversized results push useful reasoning out of the window.

The caching pitfall

Prompt caching and extended thinking interact in a way that surprises people. Caching works on a stable prefix — if the cached portion of your prompt changes between requests, you lose the hit and pay full input price again. Thinking-heavy agents accumulate context fast, so:

Keep the cacheable prefix (system prompt, tool defs, stable instructions) genuinely stable.
Append volatile content (latest tool result, new user turn) after the cached prefix, not woven into it.
Watch your cache-hit rate; a sudden drop usually means something mutable crept into the prefix.

Done right, caching plus adaptive effort is the combination that makes long, thinking-heavy agents affordable.

A simple effort-selection table

Task type	Suggested effort
Extraction / classification / formatting	Low
Everyday code generation	Low–medium
Refactoring with constraints	Medium
Hard debugging / planning / analysis	High
Agentic multi-step with tools	Medium, interleaved

Cost note

Thinking spends output tokens, which are the expensive half of the bill. That makes effort discipline a real cost lever — and it stacks with the rate you pay. Running Claude through a pay-as-you-go gateway like AI Prime Tech (up to 80% off official rates, one key for Opus 4.8, Sonnet 4.6, Haiku 4.5, plus GPT and Gemini) means even high-effort runs cost a fraction of list price, so you can afford to think hard exactly where it matters.

Takeaway

Stop hard-coding thinking budgets. Use adaptive effort, match the dial to task difficulty, preserve thinking blocks inside tool loops, and protect your cache prefix. That’s the 2026 recipe for getting Claude’s reasoning quality without paying for deliberation you didn’t need.

Daniel Okafor · Developer Advocate

Daniel is a developer advocate and long-time Claude Code / Cursor user. He covers AI coding workflows, new model launches, tooling, and hands-on guides for developers shipping with the Claude API.

Get cheaper Claude API access

One API key for Claude Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5, plus GPT & Gemini — up to 80% off official pricing, pay-as-you-go.

Get Your API Key →

AI Prime Tech is an independent third-party API gateway. Claude™ and Anthropic® are trademarks of Anthropic, PBC. No affiliation or endorsement is implied.