Jun 14, 2026 · 7 min · Dev Guides

Claude Code Unpacked: A Visual Guide

At 2:14 a.m. last month, I watched a coding agent burn through 187,000 input tokens to make a six-line change.

The bug was simple: a FastAPI route accepted user_id as a string, while the downstream billing client expected an integer. The agent eventually fixed it. But before it got there, it read the whole repository index, opened unrelated React components, summarized three migrations, and re-ran a test suite that could not possibly exercise the route.

That is the real value of “Claude Code unpacked” as a mental model: not “how magical is the agent?”, but “what is actually happening between prompt, context, tools, files, and model calls?”

Once you see the moving parts visually, you stop treating Claude Code, GPT coding agents, Gemini CLI workflows, or your own custom agent as a black box. You start designing better prompts, safer tool permissions, cheaper context flows, and more reliable API-backed developer systems.

The Short Version: Claude Code Is an Agent Loop

A coding agent is not just a chat window with repo access. In practice, it behaves like a loop:

User goal
  ↓
Plan / reasoning
  ↓
Context selection
  ↓
Tool call: read/search/edit/test
  ↓
Model observes result
  ↓
Refine plan
  ↓
More tools or final answer

The important part is that the model does not “know” your codebase by default. It has to inspect it. Every file read, search result, terminal output, and error message becomes context. That context costs tokens, affects quality, and can distract the model if it is poorly scoped.

A visual way to think about it:

┌────────────┐
│ Your task  │
└─────┬──────┘
      ↓
┌────────────┐     ┌──────────────┐
│ Agent mind │ ←── │ Tool results │
└─────┬──────┘     └──────┬───────┘
      ↓                   ↑
┌────────────┐     ┌──────┴───────┐
│ Model API  │     │ Repo / shell │
└────────────┘     └──────────────┘

Claude Code is one implementation of this pattern. The same architecture shows up when building agents with Claude Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-5.5, Gemini 3, or long-context models like Fable 5 with a 1M-token window.

The practical engineering question is not “which model is smartest?” It is:

How do I feed the right context, at the right time, with the right tool constraints, for the lowest acceptable cost?

The Core Components

1. The Model

The model is responsible for interpreting the task, deciding what to inspect, writing code, and explaining changes. Stronger models usually perform better on ambiguous refactors, architectural reasoning, and messy debugging.

But model choice is not binary. I often split coding-agent work into tiers:

| Workload | Good fit | Why |
| --- | --- | --- |
| Repository search and summarization | Haiku 4.5, smaller GPT/Gemini models | Cheap, fast, low-risk |
| Focused bug fixes | Sonnet 4.6, Gemini 3 | Strong enough for code reasoning |
| Complex migrations | Claude Opus 4.8, GPT-5.5 | Better long-horizon planning |
| Huge monorepo analysis | Fable 5 (1M context), long-context Gemini/Claude variants | Fewer retrieval misses, higher context cost |
| Production code review | Strongest available model plus tests | Accuracy matters more than speed |

A common gotcha: using the largest model for every step feels safe, but it can be wasteful. The expensive part is often not generating the final patch. It is repeatedly stuffing large chunks of repo context into the model.

2. The Context Window

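The context window is the model’s working memory for a single request or agent step. It may contain the user task, system and agent rules, repository instructions, source files, test output, prior conversation, and draft patches.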

The model can only reason over what is present. If the relevant file is missing, it may guess. If too many irrelevant files are present, it may anchor on noise.

Here is a simple token budget example:

User task:                  120 tokens
System / agent rules:     2,500 tokens
Repo instructions:          800 tokens
Three source files:       9,000 tokens
Test output:              4,500 tokens
Prior conversation:       6,000 tokens
Patch draft:              1,200 tokens
-------------------------------------
Total input:             24,120 tokens

Now imagine an agent does eight iterations like that. You are not paying for “one coding task.” You are paying for repeated context construction.

If a model bills separately per million input and output tokens, and each step also generates roughly 2,000 output tokens, the rough math for eight steps is:

input_tokens = 24_120 * 8 = 192_960
output_tokens = 2_000 * 8 = 16_000
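To turn those token counts into dollars, multiply by your provider's rates. This is a rough sketch: the two `PRICE_*` constants are hypothetical placeholders, not any vendor's actual pricing, and the per-step figures are the ones from the budget above.

```python
# Rough per-task cost of an agent loop, using the token budget above.
# PRICE_* values are hypothetical placeholders in USD per million tokens;
# substitute your provider's real rates.
PRICE_INPUT_PER_M = 3.00
PRICE_OUTPUT_PER_M = 15.00


def loop_cost(input_per_step: int, output_per_step: int, steps: int) -> float:
    """Return the estimated USD cost of `steps` iterations of the loop."""
    input_tokens = input_per_step * steps
    output_tokens = output_per_step * steps
    return (input_tokens / 1_000_000) * PRICE_INPUT_PER_M + (
        output_tokens / 1_000_000
    ) * PRICE_OUTPUT_PER_M


per_task = loop_cost(24_120, 2_000, steps=8)
print(f"per task: ${per_task:.4f}")                 # per task: $0.8189
print(f"1,000 tasks/day: ${per_task * 1000:,.2f}")  # 1,000 tasks/day: $818.88
```

Even with output priced five times higher than input in this example, input accounts for about $0.58 of the $0.82: the repeated context, not the patch, is what you pay for.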

At API scale, those multipliers matter: eight steps turn a 24K-token prompt into nearly 193K billed input tokens for one small fix. This is where a multi-model access layer such as AI Prime Tech can fit naturally: route cheap inspection steps to lower-cost Claude/GPT/Gemini-compatible models, then reserve premium calls for the final reasoning and patch generation.

3. Tools

Claude Code-style agents become useful because they can use tools:

{
  "tool": "read_file",
  "input": {
    "path": "app/billing/routes.py"
  }
}
{
  "tool": "run_shell",
  "input": {
    "command": "pytest tests/billing/test_routes.py -q"
  }
}
{
  "tool": "edit_file",
  "input": {
    "path": "app/billing/routes.py",
    "patch": "..."
  }
}

Tool access is power. It is also risk. In practice, I separate tools into four permission levels:

| Tool type | Examples | Risk | Recommended default |
| --- | --- | --- | --- |
| Read-only | rg, cat, ls, file search | Low | Allow |
| Local validation | pytest, npm test, ruff | Medium | Allow with limits |
| File modification | patches, formatters | Medium | Allow after plan |
| External side effects | deploys, DB writes, network calls | High | Require approval |

The dangerous command is not always obvious. This is safe:

pytest tests/billing/test_routes.py -q

This is not something I want an agent running casually:

python scripts/backfill_invoices.py --env prod

A good agent workflow makes side effects explicit.
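One way to make side effects explicit is to classify each shell command into the tiers above before it runs. This is a sketch under simple assumptions: the program lists and marker strings are illustrative, and prefix matching is crude (a `pytest` run can still touch the network if the tests do).

```python
import shlex

# Illustrative mapping from a command's first word to a permission tier.
# File edits go through the edit tool, so that tier is not classified here.
TIERS = {
    "read_only": {"rg", "cat", "ls"},
    "local_validation": {"pytest", "ruff", "mypy"},
}
# Substrings that push any command into the high-risk tier.
SIDE_EFFECT_MARKERS = ("--env prod", "deploy", "DROP TABLE")


def classify(command: str) -> str:
    """Map a shell command string to one of the permission tiers."""
    if any(marker in command for marker in SIDE_EFFECT_MARKERS):
        return "external_side_effects"
    parts = shlex.split(command)
    if parts:
        for tier, programs in TIERS.items():
            if parts[0] in programs:
                return tier
    # Unknown or empty commands are treated as high risk by default.
    return "external_side_effects"


def requires_approval(command: str) -> bool:
    """True when a human should confirm the command before it runs."""
    return classify(command) == "external_side_effects"


print(classify("pytest tests/billing/test_routes.py -q"))  # local_validation
print(requires_approval("python scripts/backfill_invoices.py --env prod"))  # True
```

Defaulting unknown programs to the high-risk tier is the conservative choice: it costs a few extra approvals, but a new script cannot slip through silently.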

A Visual Walkthrough: Fixing a Real API Bug

Suppose we have this route:

from fastapi import APIRouter
from app.billing.client import BillingClient

router = APIRouter()
client = BillingClient()

@router.post("/users/{user_id}/charge")
def charge_user(user_id: str, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}

And this client:

class BillingClient:
    def charge(self, user_id: int, amount_cents: int):
        if user_id <= 0:
            raise ValueError("invalid user_id")
        ...

The runtime failure is predictable:

TypeError: '<=' not supported between instances of 'str' and 'int'

A well-behaved coding agent should not read the entire repo. The ideal loop is:

1. Search for route
2. Read route file
3. Read billing client signature
4. Inspect tests for this route
5. Patch type annotation / validation
6. Run targeted test
7. Summarize

That looks like:

rg "charge_user|/charge|BillingClient" app tests
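If `rg` is not available inside your agent's sandbox, a scoped and capped search is easy to approximate in Python. A sketch, assuming a plain regex over `.py` files; `scoped_search` and its `max_hits` cap are hypothetical, and the cap is the point: it keeps a broad pattern from flooding the model with hundreds of matches.

```python
import re
from pathlib import Path


def scoped_search(pattern: str, roots: list[str], max_hits: int = 20,
                  suffixes: tuple[str, ...] = (".py",)) -> list[str]:
    """Return up to max_hits 'path:line: text' matches under the given roots."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # Skip roots that do not exist in this repo.
        for path in sorted(root_path.rglob("*")):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
                    if len(hits) >= max_hits:
                        # Stop early: the model needs a shortlist, not every match.
                        return hits
    return hits


# Mirrors: rg "charge_user|/charge|BillingClient" app tests
matches = scoped_search(r"charge_user|/charge|BillingClient", ["app", "tests"])
```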

Then:

@router.post("/users/{user_id}/charge")
def charge_user(user_id: int, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}

With a focused test:

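from types import SimpleNamespace


def test_charge_user_uses_integer_user_id(client, monkeypatch):
    # `client` here is a FastAPI TestClient fixture (assumed to be defined in
    # conftest.py). It is unrelated to the module-level BillingClient also
    # named `client` in app.billing.routes, which is replaced below.
    seen = {}

    class FakeBillingClient:
        def charge(self, user_id: int, amount_cents: int):
            seen["user_id"] = user_id
            return SimpleNamespace(status="ok")

    # Patch the module attribute the route actually calls.
    monkeypatch.setattr("app.billing.routes.client", FakeBillingClient())

    # amount_cents is not in the path, so FastAPI treats it as a query parameter.
    response = client.post("/users/123/charge", params={"amount_cents": 500})

    assert response.status_code == 200
    # The "123" path segment must arrive as an int, not a str.
    assert seen["user_id"] == 123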

What actually happens when an agent goes wrong?

Task: "Fix billing route bug"
  ↓
Agent searches "billing"
  ↓
Gets 300 results
  ↓
Reads models, migrations, frontend billing page
  ↓
Finds route late
  ↓
Makes correct patch
  ↓
Runs broad test suite
  ↓
Times out
  ↓
Reports uncertainty

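The fix is not just “use a better model.” The fix is better workflow design: a narrow search, targeted reads of the route, the client, and its tests, and a single targeted test run, as in the ideal loop above.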

Building Your Own Claude-Code-Like Agent with APIs

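You can implement the core pattern with any modern model API. The minimal architecture has a tool registry, a message history, a bounded loop of model calls, a guard on risky tool calls, and truncation of tool output.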

Here is simplified Python pseudocode:

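# `model`, `tool_schemas`, `truncate`, and the four tool functions are
# placeholders for your own client and helpers.
tools = {
    "search": search_repo,
    "read_file": read_file,
    "run_tests": run_tests,
    "apply_patch": apply_patch,
}

messages = [
    {"role": "system", "content": "You are a careful coding agent. Use tools before editing."},
    {"role": "user", "content": "Fix the billing route user_id type bug."},
]

MAX_STEPS = 12

for step in range(MAX_STEPS):
    response = model.generate(
        model="sonnet-4.6",
        messages=messages,
        tools=tool_schemas,
    )

    if response.final:
        print(response.content)
        break

    tool_name = response.tool_call.name
    tool_input = response.tool_call.input

    # Reject tools the model invented instead of crashing with a KeyError.
    if tool_name not in tools:
        raise RuntimeError(f"Unknown tool requested: {tool_name}")

    # run_tests is the tool that reaches the shell, so guard it.
    if tool_name == "run_tests" and "prod" in str(tool_input):
        raise RuntimeError("Blocked risky command")

    result = tools[tool_name](**tool_input)

    # Keep the model's tool request and the (truncated) tool result in history.
    messages.append(response.as_message())
    messages.append({
        "role": "tool",
        "name": tool_name,
        "content": truncate(result, max_tokens=3000),
    })
else:
    # for/else: runs only if the loop never reached `break`.
    print("Step budget exhausted without a final answer.")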

That truncate call is not a footnote. It is one of the most important production details. Tool output can destroy your token budget.
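A minimal `truncate` might keep the head and the tail of the output, since the command echo and the final summary usually live there. This is a sketch: the four-characters-per-token ratio is a rough heuristic, not a real tokenizer, so swap in your provider's token counter if you have one.

```python
def truncate(text: str, max_tokens: int = 3000, chars_per_token: int = 4) -> str:
    """Trim text to roughly max_tokens, keeping the start and the end.

    Token counts are approximated as len(text) / chars_per_token.
    """
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text
    marker = f"\n... [{len(text) - budget} chars truncated] ...\n"
    head = budget // 2
    tail = budget - head
    # Slice the tail by absolute index so tail == 0 yields "" (text[-0:] would not).
    return text[:head] + marker + text[len(text) - tail:]
```

Keeping both ends matters for test output in particular: the first lines show what ran, and the last lines usually carry the failure summary.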

For example, this is bad:

pytest

If the suite fails with 40 stack traces, the model receives a wall of noise.

This is better:

pytest tests/billing/test_routes.py -q --tb=short

And this is often best inside an agent loop:

pytest tests/billing/test_routes.py::test_charge_user_uses_integer_user_id -q --tb=short

The model does not need every failure. It needs the next actionable failure.
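When you cannot narrow the command ahead of time, you can still narrow what the model sees. This sketch pulls the `FAILED`/`ERROR` lines out of pytest's short test summary and keeps only the first few. It assumes that summary is present (recent pytest versions include failures there by default, and `-rfE` requests it explicitly); the sample text is hand-written for illustration, not captured from a real run.

```python
def next_failures(pytest_output: str, limit: int = 1) -> list[str]:
    """Return the first `limit` FAILED/ERROR lines from pytest's summary."""
    failures = [
        line.strip()
        for line in pytest_output.splitlines()
        if line.startswith(("FAILED ", "ERROR "))
    ]
    return failures[:limit]


sample = """\
..F
=========================== short test summary info ============================
FAILED tests/billing/test_routes.py::test_charge_user_uses_integer_user_id - AssertionError
1 failed, 2 passed in 0.12s
"""
print(next_failures(sample))
# ['FAILED tests/billing/test_routes.py::test_charge_user_uses_integer_user_id - AssertionError']
```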

Context Engineering Beats Prompt Engineering

Prompt engineering still matters, but context engineering matters more for coding agents.

A vague prompt with perfect context often works:

Fix this failing test.

A brilliant prompt with missing context usually fails:

Carefully analyze the architecture, reason step by step, and implement the minimal correct fix...

If the model never sees the client signature, it may still patch the wrong layer.

In my own agent flows, I treat context as a staged pipeline:

Stage 1: Locate
  - Search results only
  - No full files unless necessary

Stage 2: Understand
  - Read smallest relevant files
  - Include interfaces, tests, config

Stage 3: Edit
  - Include target file and nearby tests
  - Exclude unrelated search output

Stage 4: Verify
  - Include concise test result
  - Include only new failures

This works across Claude, GPT, and Gemini models because it respects the same underlying constraint: attention is finite, even when the context window is large.
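The staged pipeline can be encoded as an allowlist of context kinds per stage. A sketch: `ContextItem`, the kind labels, and the `STAGE_ALLOWS` map are hypothetical names chosen to mirror the four stages above. The Verify rule "include only new failures" would also need a diff against the previous run, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    kind: str    # e.g. "search_hit", "file", "test_output"
    source: str  # path or command that produced the text
    text: str


# Which kinds of context each stage may send to the model.
STAGE_ALLOWS = {
    "locate": {"search_hit"},
    "understand": {"file", "interface", "test_file", "config"},
    "edit": {"file", "test_file"},  # unrelated search output is dropped here
    "verify": {"test_output"},
}


def build_context(stage: str, items: list[ContextItem]) -> str:
    """Keep only the items this stage allows and join them for the prompt."""
    allowed = STAGE_ALLOWS[stage]
    kept = [item for item in items if item.kind in allowed]
    return "\n\n".join(f"# {item.source}\n{item.text}" for item in kept)
```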

Long-context models change the trade-off, not the principle. Fable 5’s 1M context is useful when the relevant dependency chain is genuinely huge, but dumping a monorepo into context can still make the model slower and more distractible. Bigger windows reduce retrieval failures; they do not eliminate the need to curate evidence.

Best Practices for API Builders

Make Tool Calls Observable

Log every agent action as structured data:

{
  "request_id": "req_8421",
  "step": 4,
  "model": "claude-sonnet-4.6",
  "tool": "read_file",
  "path": "app/billing/routes.py",
  "input_tokens": 18320,
  "output_tokens": 740
}

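You want to know which model handled each step, which tool it called on which path, and how many input and output tokens that step consumed.

This is how you debug cost and quality.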
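In Python, the standard `logging` and `json` modules are enough to emit one record per step. A sketch, assuming the field names from the example log entry; `log_step` is a hypothetical helper, and where the lines end up (stdout, a file, a log pipeline) depends on your logging configuration.

```python
import json
import logging

logger = logging.getLogger("agent")


def log_step(request_id: str, step: int, model: str, tool: str,
             input_tokens: int, output_tokens: int, **extra) -> str:
    """Emit one JSON line describing an agent step and return it."""
    record = {
        "request_id": request_id,
        "step": step,
        "model": model,
        "tool": tool,
        **extra,  # tool-specific fields such as "path" or "command"
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    line = json.dumps(record)
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
log_step("req_8421", 4, "claude-sonnet-4.6", "read_file", 18320, 740,
         path="app/billing/routes.py")
```

One JSON object per line keeps the records greppable and easy to load into whatever you use to sum tokens per request.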

Use Model Routing

Do not force one model to do everything. A practical routing strategy:

Repo search summary     → Haiku 4.5
Patch planning          → Sonnet 4.6 or Gemini 3
Complex architecture    → Opus 4.8 or GPT-5.5
Long-context sweep      → Fable 5 or long-context Gemini
Final review            → Strongest available model
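In code, that routing can be a lookup table keyed by step type, with a default. A sketch: the model labels are the article's names used as placeholders, mapping "final review" to Opus 4.8 is an assumption for "strongest available", and real API model IDs differ by provider, so map them in one place.

```python
# Step type -> model label. Labels are placeholders; map them to the exact
# model IDs your provider or gateway expects.
ROUTES = {
    "repo_search_summary": "haiku-4.5",
    "patch_planning": "sonnet-4.6",
    "complex_architecture": "opus-4.8",
    "long_context_sweep": "fable-5",
    "final_review": "opus-4.8",  # assumption: strongest model you have access to
}
DEFAULT_MODEL = "sonnet-4.6"


def pick_model(step_type: str) -> str:
    """Return the model for a step type, falling back to a mid-tier default."""
    return ROUTES.get(step_type, DEFAULT_MODEL)
```

Because the rest of the agent only ever calls `pick_model`, swapping a provider or a price tier is a one-line change to the table.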

If you are building a tool for a team, this is also where AI Prime Tech can be useful: one integration point for cheaper Claude, GPT, and Gemini API access, while keeping your routing logic model-agnostic.

Put Hard Limits on the Loop

Agents need budgets:

{
  "max_steps": 12,
  "max_input_tokens_per_step": 50000,
  "max_total_cost_usd": 1.25,
  "allowed_commands": ["rg", "pytest", "ruff", "mypy"],
  "blocked_patterns": ["prod", "deploy", "rm -rf", "DROP TABLE"]
}

Without limits, a stuck agent can keep searching, summarizing, and retrying. The failure mode is not dramatic. It is quietly expensive.
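Enforcing the numeric limits takes a small tracker that the loop consults before each model call. A sketch using the values from the config above: `Budget` and `BudgetExceeded` are hypothetical names, the per-request cost must come from your own pricing, and command checks against `allowed_commands` and `blocked_patterns` are not included.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop would break one of its limits."""


@dataclass
class Budget:
    max_steps: int = 12
    max_input_tokens_per_step: int = 50_000
    max_total_cost_usd: float = 1.25
    steps: int = 0
    cost_usd: float = 0.0

    def check_step(self, input_tokens: int) -> None:
        """Call before a model request; raises if it would break a limit."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached")
        if input_tokens > self.max_input_tokens_per_step:
            raise BudgetExceeded(f"{input_tokens} input tokens exceeds per-step cap")
        if self.cost_usd >= self.max_total_cost_usd:
            raise BudgetExceeded(f"spent ${self.cost_usd:.2f}, cap reached")

    def record(self, cost_usd: float) -> None:
        """Call after a model request with that request's cost in USD."""
        self.steps += 1
        self.cost_usd += cost_usd
```

Raising an exception rather than returning a flag makes a stuck loop stop loudly, which is the point: the failure mode the budget guards against is otherwise silent.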

Prefer Patches Over Rewrites

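For code editing, patches are safer than full-file rewrites. Full rewrites tend to produce large diffs, touch code unrelated to the fix, and collide with other people’s changes to the same file.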

A good coding agent should produce the smallest meaningful diff. That is not just aesthetic; it reduces merge conflicts and review time.
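Python's standard `difflib` makes the size difference concrete: the billing fix from the walkthrough is a one-line change when expressed as a unified diff. This sketch builds that diff from before/after text; it illustrates diff size, not how any particular agent applies patches.

```python
import difflib

before = '''@router.post("/users/{user_id}/charge")
def charge_user(user_id: str, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}
'''
# The actual fix: change one annotation.
after = before.replace("user_id: str", "user_id: int")

diff = difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="a/app/billing/routes.py",
    tofile="b/app/billing/routes.py",
)
patch = "".join(diff)
print(patch)
```

A reviewer reading this patch sees one removed line and one added line; a full-file rewrite of the same fix would ask them to re-verify every line in the file.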

Trade-Offs and Limitations

Claude Code-style workflows are powerful, but they are not magic.

The main trade-offs:

| Benefit | Cost |
| --- | --- |
| Can inspect and modify real repos | Needs careful tool permissions |
| Handles multi-step debugging | Can drift without budgets |
| Reduces manual boilerplate | May over-read context |
| Works across many model APIs | Model-specific tool syntax varies |
| Improves with tests | Weak tests produce false confidence |

A common gotcha: agents often optimize for “make the test pass” rather than “preserve the system contract.” If the only test asserts 200 OK, the agent may satisfy that while breaking validation semantics. Strong test design still matters.

Another gotcha: model confidence is not verification. A final message saying “all tests pass” is only meaningful if you have the command output. In production agent systems, I store the exact validation command and output with the run record.
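Capturing that evidence takes a few lines with `subprocess.run`. A sketch: `ValidationRecord` and `run_validation` are hypothetical names, the command is passed as an argument list so nothing goes through a shell, and the stand-in command below only simulates a test run so the example works anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ValidationRecord:
    command: list[str]
    returncode: int
    stdout: str
    stderr: str

    @property
    def passed(self) -> bool:
        # pytest exits 0 only when tests were collected and all passed.
        return self.returncode == 0


def run_validation(command: list[str], timeout_s: float = 300) -> ValidationRecord:
    """Run a validation command and capture exactly what it printed."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    return ValidationRecord(command, proc.returncode, proc.stdout, proc.stderr)


# Stand-in command; in practice pass something like
# ["pytest", "tests/billing/test_routes.py", "-q"].
record = run_validation([sys.executable, "-c", "print('1 passed')"])
print(record.passed, record.stdout.strip())  # True 1 passed
```

Store the record (for example via `dataclasses.asdict`) next to the agent's run log, and treat a final "all tests pass" message as meaningful only when `record.passed` agrees.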

Practical Takeaways

The visual model is the unlock: once you can see the loop, you can improve it. Claude Code is not just a product category; it is a pattern for building safer, cheaper, more capable developer agents on top of Claude, GPT, Gemini, and whatever strong models come next.

Marcus Reed · Senior API Engineer

Marcus has spent 9 years building LLM-backed products and integrating the Claude, GPT and Gemini APIs into production systems. He writes about API cost optimization, agent architecture, and practical model selection.

AI Prime Tech is an independent third-party API gateway. Claude™ and Anthropic® are trademarks of Anthropic, PBC. No affiliation or endorsement is implied.