Jun 14, 2026 · 7 min · Dev Guides

Claude Code Unpacked: A visual guide

By Marcus Reed · Senior API Engineer

At 2:14 a.m. last month, I watched a coding agent burn through 187,000 input tokens to make a six-line change.

The bug was simple: a FastAPI route accepted user_id as a string, while the downstream billing client expected an integer. The agent eventually fixed it. But before it got there, it read the whole repository index, opened unrelated React components, summarized three migrations, and re-ran a test suite that could not possibly exercise the route.

That is the real value of “Claude Code unpacked” as a mental model: not “how magical is the agent?”, but “what is actually happening between prompt, context, tools, files, and model calls?”

Once you see the moving parts visually, you stop treating Claude Code, GPT coding agents, Gemini CLI workflows, or your own custom agent as a black box. You start designing better prompts, safer tool permissions, cheaper context flows, and more reliable API-backed developer systems.

The Short Version: Claude Code Is an Agent Loop

A coding agent is not just a chat window with repo access. In practice, it behaves like a loop:

User goal
  ↓
Plan / reasoning
  ↓
Context selection
  ↓
Tool call: read/search/edit/test
  ↓
Model observes result
  ↓
Refine plan
  ↓
More tools or final answer

The important part is that the model does not “know” your codebase by default. It has to inspect it. Every file read, search result, terminal output, and error message becomes context. That context costs tokens, affects quality, and can distract the model if it is poorly scoped.

A visual way to think about it:

┌────────────┐
│ Your task  │
└─────┬──────┘
      ↓
┌────────────┐     ┌──────────────┐
│ Agent mind │ ←── │ Tool results │
└─────┬──────┘     └──────┬───────┘
      ↓                   ↑
┌────────────┐     ┌──────┴───────┐
│ Model API  │     │ Repo / shell │
└────────────┘     └──────────────┘

Claude Code is one implementation of this pattern. The same architecture shows up when building agents with Claude Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-5.5, Gemini 3, or long-context models like Fable 5 with a 1M-token window.

The practical engineering question is not “which model is smartest?” It is:

How do I feed the right context, at the right time, with the right tool constraints, for the lowest acceptable cost?

The Core Components

1. The Model

The model is responsible for interpreting the task, deciding what to inspect, writing code, and explaining changes. Stronger models usually perform better on ambiguous refactors, architectural reasoning, and messy debugging.

But model choice is not binary. I often split coding-agent work into tiers:

Workload	Good Fit	Why
Repository search and summarization	Haiku 4.5, smaller GPT/Gemini models	Cheap, fast, low-risk
Focused bug fixes	Sonnet 4.6, Gemini 3	Strong enough for code reasoning
Complex migrations	Claude Opus 4.8, GPT-5.5	Better long-horizon planning
Huge monorepo analysis	Fable 5 1M context, long-context Gemini/Claude variants	Fewer retrieval misses, higher context cost
Production code review	Strongest available model plus tests	Accuracy matters more than speed

A common gotcha: using the largest model for every step feels safe, but it can be wasteful. The expensive part is often not generating the final patch. It is repeatedly stuffing large chunks of repo context into the model.

2. The Context Window

The context window is the model’s working memory for a single request or agent step. It may contain:

Your prompt
System instructions
Repo instructions such as AGENTS.md
File contents
Search results
Test output
Prior conversation
The agent’s plan
Tool call metadata

The model can only reason over what is present. If the relevant file is missing, it may guess. If too many irrelevant files are present, it may anchor on noise.

Here is a simple token budget example:

User task:                  120 tokens
System / agent rules:     2,500 tokens
Repo instructions:          800 tokens
Three source files:       9,000 tokens
Test output:              4,500 tokens
Prior conversation:       6,000 tokens
Patch draft:              1,200 tokens
-------------------------------------
Total input:             24,120 tokens

Now imagine an agent does eight iterations like that. You are not paying for “one coding task.” You are paying for repeated context construction.

If a model charges per million input and output tokens, the rough math is:

input_tokens = 24_120 * 8 = 192_960
output_tokens = 2_000 * 8 = 16_000

At API scale, that volume matters: one small fix consumed roughly 193,000 input tokens, most of it repeated context. This is where a multi-model access layer such as AI Prime Tech can fit naturally: route cheap inspection steps to lower-cost Claude/GPT/Gemini-compatible models, then reserve premium calls for the final reasoning and patch generation.
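To turn those token counts into dollars, a few lines of arithmetic are enough. The sketch below is only an illustration: the `Pricing` class, the `estimate_cost` helper, and the prices of $3 and $15 per million tokens are placeholders I made up for the example, not real rates for any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated USD cost of one agent run."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Token counts from the eight-iteration example above.
input_tokens = 24_120 * 8
output_tokens = 2_000 * 8

# Placeholder prices chosen only to keep the arithmetic easy to follow.
example_pricing = Pricing(input_per_million=3.00, output_per_million=15.00)
run_cost = estimate_cost(input_tokens, output_tokens, example_pricing)
print(f"{input_tokens} in / {output_tokens} out -> ${run_cost:.4f}")
```

Swap in your provider's actual prices and multiply by the number of agent runs per day to see where routing cheaper steps to cheaper models starts to pay off.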

3. Tools

Claude Code-style agents become useful because they can use tools:

{
  "tool": "read_file",
  "input": {
    "path": "app/billing/routes.py"
  }
}

{
  "tool": "run_shell",
  "input": {
    "command": "pytest tests/billing/test_routes.py -q"
  }
}

{
  "tool": "edit_file",
  "input": {
    "path": "app/billing/routes.py",
    "patch": "..."
  }
}

Tool access is power. It is also risk. In practice, I separate tools into four permission levels:

Tool Type	Examples	Risk	Recommended Default
Read-only	`rg`, `cat`, `ls`, file search	Low	Allow
Local validation	`pytest`, `npm test`, `ruff`	Medium	Allow with limits
File modification	patches, formatters	Medium	Allow after plan
External side effects	deploys, DB writes, network calls	High	Require approval

The dangerous command is not always obvious. This is reasonably safe to run without approval:

pytest tests/billing/test_routes.py -q

This is not something I want an agent running casually:

python scripts/backfill_invoices.py --env prod

A good agent workflow makes side effects explicit.
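One way to make side effects explicit is to classify every shell command before it runs. The sketch below is a minimal version under several assumptions: the allowlists, the marker substrings, and the three decision labels are hypothetical names of mine, and it covers shell commands only, not file edits. It is deliberately conservative, so anything it does not recognize needs approval.

```python
import shlex

# Hypothetical allowlists; tune these for your own repository and tooling.
READ_ONLY = {"rg", "cat", "ls"}
LOCAL_VALIDATION = {"pytest", "ruff", "mypy"}
# Substrings that suggest an external side effect, whatever the executable is.
SIDE_EFFECT_MARKERS = ("prod", "deploy")
# Shell metacharacters that could chain or redirect into a second command.
SHELL_METACHARACTERS = ";&|`$<>"


def classify_command(command: str) -> str:
    """Map a shell command to 'allow', 'allow_with_limits', or 'require_approval'."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "require_approval"
    try:
        parts = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are never auto-approved.
        return "require_approval"
    if not parts:
        return "require_approval"
    lowered = command.lower()
    if any(marker in lowered for marker in SIDE_EFFECT_MARKERS):
        return "require_approval"
    executable = parts[0]
    if executable in READ_ONLY:
        return "allow"
    if executable in LOCAL_VALIDATION:
        return "allow_with_limits"
    # Unknown executables (python scripts, curl, psql, ...) default to approval.
    return "require_approval"
```

With this policy the targeted pytest command above comes back as `allow_with_limits`, while the backfill script requires approval. Substring matching on `prod` will also flag harmless commands such as a search for "product", which is the trade-off I prefer for a default.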

A Visual Walkthrough: Fixing a Real API Bug

Suppose we have this route:

from fastapi import APIRouter
from app.billing.client import BillingClient

router = APIRouter()
client = BillingClient()

@router.post("/users/{user_id}/charge")
def charge_user(user_id: str, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}

And this client:

class BillingClient:
    def charge(self, user_id: int, amount_cents: int):
        if user_id <= 0:
            raise ValueError("invalid user_id")
        ...

The runtime failure is predictable:

TypeError: '<=' not supported between instances of 'str' and 'int'

A well-behaved coding agent should not read the entire repo. The ideal loop is:

1. Search for route
2. Read route file
3. Read billing client signature
4. Inspect tests for this route
5. Patch type annotation / validation
6. Run targeted test
7. Summarize

That looks like:

rg "charge_user|/charge|BillingClient" app tests

Then:

@router.post("/users/{user_id}/charge")
def charge_user(user_id: int, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}

With a focused test:

def test_charge_user_uses_integer_user_id(client, monkeypatch):
    seen = {}

    class FakeBillingClient:
        def charge(self, user_id: int, amount_cents: int):
            seen["user_id"] = user_id
            return type("Result", (), {"status": "ok"})()

    monkeypatch.setattr("app.billing.routes.client", FakeBillingClient())

    response = client.post("/users/123/charge", params={"amount_cents": 500})

    assert response.status_code == 200
    assert seen["user_id"] == 123

What actually happens when an agent goes wrong?

Task: "Fix billing route bug"
  ↓
Agent searches "billing"
  ↓
Gets 300 results
  ↓
Reads models, migrations, frontend billing page
  ↓
Finds route late
  ↓
Makes correct patch
  ↓
Runs broad test suite
  ↓
Times out
  ↓
Reports uncertainty

The fix is not just “use a better model.” The fix is better workflow design:

Start with narrow search terms from the error.
Prefer targeted tests before broad tests.
Keep tool output small.
Ask the model to explain why each file is relevant.
Stop reading once the dependency path is clear.

Building Your Own Claude-Code-Like Agent with APIs

You can implement the core pattern with any modern model API. The minimal architecture has:

A planner prompt
A tool registry
A loop controller
A context manager
A patch application layer
A validation step

Here is simplified Python pseudocode:

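# `model`, `tool_schemas`, `truncate`, and the tool functions are placeholders
# for your own client and helpers; this shows the control flow, not a specific SDK.
tools = {
    "search": search_repo,
    "read_file": read_file,
    "run_tests": run_tests,
    "apply_patch": apply_patch,
}

messages = [
    {"role": "system", "content": "You are a careful coding agent. Use tools before editing."},
    {"role": "user", "content": "Fix the billing route user_id type bug."},
]

MAX_STEPS = 12

for step in range(MAX_STEPS):
    response = model.generate(
        model="sonnet-4.6",
        messages=messages,
        tools=tool_schemas,
    )

    if response.final:
        print(response.content)
        break

    tool_name = response.tool_call.name
    tool_input = response.tool_call.input

    # Refuse anything that looks like it targets production.
    if tool_name == "run_tests" and "prod" in str(tool_input):
        raise RuntimeError("Blocked risky command")

    # Report unknown tools back to the model instead of crashing with a KeyError.
    if tool_name in tools:
        result = tools[tool_name](**tool_input)
    else:
        result = f"Unknown tool: {tool_name}"

    # Keep the model's tool request and the trimmed result in the transcript.
    messages.append(response.as_message())
    messages.append({
        "role": "tool",
        "name": tool_name,
        "content": truncate(result, max_tokens=3000),
    })
else:
    # The loop used every step without reaching a final answer.
    print(f"Stopped after {MAX_STEPS} steps without a final answer")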

That truncate call is not a footnote. It is one of the most important production details. Tool output can destroy your token budget.

For example, this is bad:

pytest

If the suite fails with 40 stack traces, the model receives a wall of noise.

This is better:

pytest tests/billing/test_routes.py -q --tb=short

And this is often best inside an agent loop:

pytest tests/billing/test_routes.py::test_charge_user_uses_integer_user_id -q --tb=short

The model does not need every failure. It needs the next actionable failure.
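The pseudocode above leaves `truncate` undefined. Here is one possible version, a rough sketch only: it assumes about four characters per token, which is a heuristic rather than a real tokenizer, and the `chars_per_token` parameter and the marker text are my own choices. It keeps both the head and the tail because test runners usually print their summary at the end.

```python
def truncate(text: str, max_tokens: int = 3000, chars_per_token: int = 4) -> str:
    """Keep the head and tail of tool output within a rough token budget.

    Assumes roughly `chars_per_token` characters per token, which is only a
    heuristic; swap in a real tokenizer if you need exact counts.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    marker = "\n... [output truncated] ...\n"
    keep = max(max_chars - len(marker), 0)
    head = keep // 2
    tail = keep - head
    # Guard the tail slice: text[-0:] would return the whole string.
    return text[:head] + marker + (text[-tail:] if tail else "")
```

A smarter version could keep only the first failing test, but even this crude cut stops a 40-trace failure from flooding the next model call.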

Context Engineering Beats Prompt Engineering

Prompt engineering still matters, but context engineering matters more for coding agents.

A vague prompt with perfect context often works:

Fix this failing test.

A brilliant prompt with missing context usually fails:

Carefully analyze the architecture, reason step by step, and implement the minimal correct fix...

If the model never sees the client signature, it may still patch the wrong layer.

In my own agent flows, I treat context as a staged pipeline:

Stage 1: Locate
  - Search results only
  - No full files unless necessary

Stage 2: Understand
  - Read smallest relevant files
  - Include interfaces, tests, config

Stage 3: Edit
  - Include target file and nearby tests
  - Exclude unrelated search output

Stage 4: Verify
  - Include concise test result
  - Include only new failures

This works across Claude, GPT, and Gemini models because it respects the same underlying constraint: attention is finite, even when the context window is large.
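The four stages above could be enforced with a small context builder. This is a sketch under my own assumptions: `ContextItem`, `STAGE_INPUTS`, and `build_context` are hypothetical names, the mapping of which earlier material each stage carries forward is one reasonable reading of the pipeline, and the budget is counted in characters rather than tokens to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    stage: str  # "locate", "understand", "edit", or "verify"
    label: str  # short human-readable name, e.g. a file path
    text: str   # the content that would be sent to the model


# Which stages' material each stage is allowed to see (an assumption).
STAGE_INPUTS = {
    "locate": {"locate"},
    "understand": {"locate", "understand"},
    "edit": {"understand", "edit"},
    "verify": {"edit", "verify"},
}


def build_context(items: list[ContextItem], stage: str, max_chars: int) -> str:
    """Assemble the context for one stage, skipping items that do not fit."""
    allowed = STAGE_INPUTS[stage]
    selected: list[ContextItem] = []
    used = 0
    # Walk newest-first so recent evidence wins when the budget is tight.
    for item in reversed(items):
        if item.stage not in allowed:
            continue
        if used + len(item.text) > max_chars:
            continue
        selected.append(item)
        used += len(item.text)
    selected.reverse()  # restore chronological order for the prompt
    return "\n\n".join(f"## {item.label}\n{item.text}" for item in selected)
```

By the time the agent reaches the edit stage, raw search output from the locate stage is no longer included, which is the "exclude unrelated search output" rule in code form.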

Long-context models change the trade-off, not the principle. Fable 5’s 1M context is useful when the relevant dependency chain is genuinely huge, but dumping a monorepo into context can still make the model slower and more distractible. Bigger windows reduce retrieval failures; they do not eliminate the need to curate evidence.

Best Practices for API Builders

Make Tool Calls Observable

Log every agent action as structured data:

{
  "request_id": "req_8421",
  "step": 4,
  "model": "claude-sonnet-4.6",
  "tool": "read_file",
  "path": "app/billing/routes.py",
  "input_tokens": 18320,
  "output_tokens": 740
}

You want to know:

Which files were read
Which commands ran
How many tokens each step used
Whether the model edited before understanding
Which validation command succeeded or failed

This is how you debug cost and quality.
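A record like the one above can be written as one JSON line per step. The helper below is a minimal sketch, not a logging framework: `log_step` and its parameter names are my own, and the in-memory stream stands in for whatever log file or shipper you actually use.

```python
import io
import json
import time
from typing import Any, TextIO


def log_step(stream: TextIO, *, request_id: str, step: int, model: str,
             tool: str, input_tokens: int, output_tokens: int,
             **details: Any) -> dict[str, Any]:
    """Write one agent step as a JSON line and return the record."""
    record: dict[str, Any] = {
        "ts": time.time(),
        "request_id": request_id,
        "step": step,
        "model": model,
        "tool": tool,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    record.update(details)  # tool-specific fields such as path or command
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Usage: an in-memory stream stands in for a real log destination.
log = io.StringIO()
log_step(log, request_id="req_8421", step=4, model="claude-sonnet-4.6",
         tool="read_file", input_tokens=18320, output_tokens=740,
         path="app/billing/routes.py")
logged = json.loads(log.getvalue())
```

One line per step makes it easy to total tokens per request or to find runs where an edit happened before any file was read.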

Use Model Routing

Do not force one model to do everything. A practical routing strategy:

Repo search summary     → Haiku 4.5
Patch planning          → Sonnet 4.6 or Gemini 3
Complex architecture    → Opus 4.8 or GPT-5.5
Long-context sweep      → Fable 5 or long-context Gemini
Final review            → Strongest available model

If you are building a tool for a team, this is also where AI Prime Tech can be useful: one integration point for cheaper Claude, GPT, and Gemini API access, while keeping your routing logic model-agnostic.
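The routing table above can start life as a plain lookup. In this sketch the task labels and model identifiers are illustrative placeholders (use whatever names your provider or gateway expects), and mapping "strongest available model" to one specific identifier is my assumption.

```python
# Task labels and model identifiers are illustrative placeholders.
ROUTES = {
    "repo_search_summary": "haiku-4.5",
    "patch_planning": "sonnet-4.6",
    "complex_architecture": "opus-4.8",
    "long_context_sweep": "fable-5",
    "final_review": "opus-4.8",  # assumed to be the strongest available model
}
DEFAULT_MODEL = "sonnet-4.6"


def pick_model(task_type: str) -> str:
    """Return the model for a task type, falling back to a mid-tier default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

Keeping the mapping in data rather than in branching code means you can change a route without touching the loop controller.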

Put Hard Limits on the Loop

Agents need budgets:

{
  "max_steps": 12,
  "max_input_tokens_per_step": 50000,
  "max_total_cost_usd": 1.25,
  "allowed_commands": ["rg", "pytest", "ruff", "mypy"],
  "blocked_patterns": ["prod", "deploy", "rm -rf", "DROP TABLE"]
}

Without limits, a stuck agent can keep searching, summarizing, and retrying. The failure mode is not dramatic. It is quietly expensive.
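Enforcing the step, token, and cost limits from that config takes very little code. The class below is a sketch with hypothetical names (`LoopBudget`, `BudgetExceeded`, `start_step`, `record_cost`); it assumes you can count input tokens before each request and compute the cost afterwards. Command allowlists and blocked patterns would be checked separately.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop hits one of its hard limits."""


@dataclass
class LoopBudget:
    max_steps: int = 12
    max_input_tokens_per_step: int = 50_000
    max_total_cost_usd: float = 1.25
    steps_taken: int = 0
    total_cost_usd: float = 0.0

    def start_step(self, input_tokens: int) -> None:
        """Call before each model request; raises if the step is not allowed."""
        if self.steps_taken >= self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        if input_tokens > self.max_input_tokens_per_step:
            raise BudgetExceeded(
                f"{input_tokens} input tokens exceeds the per-step limit"
            )
        if self.total_cost_usd >= self.max_total_cost_usd:
            raise BudgetExceeded(
                f"cost limit of ${self.max_total_cost_usd:.2f} reached"
            )
        self.steps_taken += 1

    def record_cost(self, cost_usd: float) -> None:
        """Call after each model request with that request's cost."""
        self.total_cost_usd += cost_usd
```

In a loop, call `budget.start_step(...)` just before the model request and `budget.record_cost(...)` right after it, then catch `BudgetExceeded` at the top level so the run ends with a clear reason instead of a silent stall.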

Prefer Patches Over Rewrites

For code editing, patches are safer than full-file rewrites. Full rewrites tend to:

Drop comments or formatting
Reorder imports unnecessarily
Change unrelated code
Create larger diffs that are harder to review

A good coding agent should produce the smallest meaningful diff. That is not just aesthetic; it reduces merge conflicts and review time.
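To see what "smallest meaningful diff" looks like, Python's standard `difflib` module can build a unified diff from the before and after text. This is just an illustration using the route from the walkthrough; `make_patch` is a hypothetical helper name, and a real agent would more likely have the model emit the patch directly.

```python
import difflib


def make_patch(original: str, updated: str, path: str) -> str:
    """Return a unified diff that turns `original` into `updated`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = (
    '@router.post("/users/{user_id}/charge")\n'
    "def charge_user(user_id: str, amount_cents: int):\n"
    "    result = client.charge(user_id=user_id, amount_cents=amount_cents)\n"
)
after = before.replace("user_id: str", "user_id: int")

patch = make_patch(before, after, "app/billing/routes.py")
print(patch)
```

The resulting patch changes exactly one line, which is what a reviewer should expect to see for this bug.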

Trade-Offs and Limitations

Claude Code-style workflows are powerful, but they are not magic.

The main trade-offs:

Benefit	Cost
Can inspect and modify real repos	Needs careful tool permissions
Handles multi-step debugging	Can drift without budgets
Reduces manual boilerplate	May over-read context
Works across many model APIs	Model-specific tool syntax varies
Improves with tests	Weak tests produce false confidence

A common gotcha: agents often optimize for “make the test pass” rather than “preserve the system contract.” If the only test asserts 200 OK, the agent may satisfy that while breaking validation semantics. Strong test design still matters.

Another gotcha: model confidence is not verification. A final message saying “all tests pass” is only meaningful if you have the command output. In production agent systems, I store the exact validation command and output with the run record.
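Capturing that evidence can be as simple as wrapping `subprocess.run`. The sketch below uses hypothetical names (`ValidationRecord`, `run_validation`) and runs a trivial Python one-liner as a stand-in for a real test command; it does not handle timeouts beyond letting `subprocess.TimeoutExpired` propagate.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationRecord:
    command: list[str]
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


def run_validation(command: list[str], timeout_s: float = 120.0) -> ValidationRecord:
    """Run a validation command without a shell and keep its exact output."""
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # a failing test run is data, not an exception
    )
    return ValidationRecord(
        command=command,
        exit_code=completed.returncode,
        output=completed.stdout + completed.stderr,
    )


# Stand-in for something like ["pytest", "tests/billing/test_routes.py", "-q"].
record = run_validation([sys.executable, "-c", "print('1 passed')"])
print(record.passed, record.output.strip())
```

Storing the record next to the agent's final message lets a reviewer check the claim "all tests pass" against the exit code and output instead of taking it on trust.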

Practical Takeaways

Treat Claude Code as an agent loop: model, context, tools, observations, and edits.
Keep context narrow first; expand only when the dependency path demands it.
Route simple search and summarization to cheaper models, then reserve premium models for hard reasoning.
Use targeted validation commands before broad test suites.
Log tool calls, token usage, file reads, patches, and test results.
Put explicit limits on steps, cost, commands, and side effects.
Prefer small patches over full-file rewrites.
Remember that long context helps, but curated context still wins.

The visual model is the unlock: once you can see the loop, you can improve it. Claude Code is not just a product; it is one instance of a pattern for building safer, cheaper, more capable developer agents on top of Claude, GPT, Gemini, and whatever strong models come next.


Marcus has spent 9 years building LLM-backed products and integrating the Claude, GPT and Gemini APIs into production systems. He writes about API cost optimization, agent architecture, and practical model selection.


AI Prime Tech is an independent third-party API gateway. Claude™ and Anthropic® are trademarks of Anthropic, PBC. No affiliation or endorsement is implied.