Claude Code Unpacked: A Visual Guide
At 2:14 a.m. last month, I watched a coding agent burn through 187,000 input tokens to make a six-line change.
The bug was simple: a FastAPI route accepted user_id as a string, while the downstream billing client expected an integer. The agent eventually fixed it. But before it got there, it read the whole repository index, opened unrelated React components, summarized three migrations, and re-ran a test suite that could not possibly exercise the route.
That is the real value of “Claude Code unpacked” as a mental model: not “how magical is the agent?”, but “what is actually happening between prompt, context, tools, files, and model calls?”
Once you see the moving parts visually, you stop treating Claude Code, GPT coding agents, Gemini CLI workflows, or your own custom agent as a black box. You start designing better prompts, safer tool permissions, cheaper context flows, and more reliable API-backed developer systems.
The Short Version: Claude Code Is an Agent Loop
A coding agent is not just a chat window with repo access. In practice, it behaves like a loop:
User goal
↓
Plan / reasoning
↓
Context selection
↓
Tool call: read/search/edit/test
↓
Model observes result
↓
Refine plan
↓
More tools or final answer
The important part is that the model does not “know” your codebase by default. It has to inspect it. Every file read, search result, terminal output, and error message becomes context. That context costs tokens, affects quality, and can distract the model if it is poorly scoped.
A visual way to think about it:
┌────────────┐
│ Your task  │
└─────┬──────┘
      ↓
┌────────────┐     ┌──────────────┐
│ Agent mind │ ←── │ Tool results │
└─────┬──────┘     └──────┬───────┘
      ↓                   ↑
┌────────────┐     ┌──────┴───────┐
│ Model API  │     │ Repo / shell │
└────────────┘     └──────────────┘
Claude Code is one implementation of this pattern. The same architecture shows up when building agents with Claude Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-5.5, Gemini 3, or long-context models like Fable 5 with a 1M-token window.
The practical engineering question is not “which model is smartest?” It is:
How do I feed the right context, at the right time, with the right tool constraints, for the lowest acceptable cost?
The Core Components
1. The Model
The model is responsible for interpreting the task, deciding what to inspect, writing code, and explaining changes. Stronger models usually perform better on ambiguous refactors, architectural reasoning, and messy debugging.
But model choice is not binary. I often split coding-agent work into tiers:
| Workload | Good Fit | Why |
|---|---|---|
| Repository search and summarization | Haiku 4.5, smaller GPT/Gemini models | Cheap, fast, low-risk |
| Focused bug fixes | Sonnet 4.6, Gemini 3 | Strong enough for code reasoning |
| Complex migrations | Claude Opus 4.8, GPT-5.5 | Better long-horizon planning |
| Huge monorepo analysis | Fable 5 1M context, long-context Gemini/Claude variants | Fewer retrieval misses, higher context cost |
| Production code review | Strongest available model plus tests | Accuracy matters more than speed |
A common gotcha: using the largest model for every step feels safe, but it can be wasteful. The expensive part is often not generating the final patch. It is repeatedly stuffing large chunks of repo context into the model.
2. The Context Window
The context window is the model’s working memory for a single request or agent step. It may contain:
- Your prompt
- System instructions
- Repo instructions such as AGENTS.md
- File contents
- Search results
- Test output
- Prior conversation
- The agent’s plan
- Tool call metadata
The model can only reason over what is present. If the relevant file is missing, it may guess. If too many irrelevant files are present, it may anchor on noise.
Here is a simple token budget example:
User task: 120 tokens
System / agent rules: 2,500 tokens
Repo instructions: 800 tokens
Three source files: 9,000 tokens
Test output: 4,500 tokens
Prior conversation: 6,000 tokens
Patch draft: 1,200 tokens
-------------------------------------
Total input: 24,120 tokens
Now imagine an agent does eight iterations like that. You are not paying for “one coding task.” You are paying for repeated context construction.
If a model charges per million input and output tokens, the rough math is:
input_tokens = 24_120 * 8 = 192_960
output_tokens = 2_000 * 8 = 16_000
At API scale, that multiplication matters: eight steps turn a 24,000-token request into nearly 200,000 billed input tokens. This is where a multi-model access layer such as AI Prime Tech can fit naturally: route cheap inspection steps to lower-cost Claude/GPT/Gemini-compatible models, then reserve premium calls for the final reasoning and patch generation.
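To make the arithmetic concrete, here is a minimal sketch that multiplies the per-step budget by the iteration count and applies per-million-token prices. The two prices are placeholders I picked for illustration, not any provider's actual rates.

```python
# Hypothetical prices in USD per million tokens; substitute your provider's real rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def run_cost(input_per_step: int, output_per_step: int, steps: int) -> dict:
    """Total tokens and cost for an agent run that rebuilds its context every step."""
    input_tokens = input_per_step * steps
    output_tokens = output_per_step * steps
    cost = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 4),
    }


estimate = run_cost(input_per_step=24_120, output_per_step=2_000, steps=8)
print(estimate)
```

With these placeholder rates, input tokens make up most of the bill even though each output token is priced five times higher, which is why trimming context usually saves more than shortening answers.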
3. Tools
Claude Code-style agents become useful because they can use tools:
{
  "tool": "read_file",
  "input": {
    "path": "app/billing/routes.py"
  }
}

{
  "tool": "run_shell",
  "input": {
    "command": "pytest tests/billing/test_routes.py -q"
  }
}

{
  "tool": "edit_file",
  "input": {
    "path": "app/billing/routes.py",
    "patch": "..."
  }
}
Tool access is power. It is also risk. In practice, I separate tools into four permission levels:
| Tool Type | Examples | Risk | Recommended Default |
|---|---|---|---|
| Read-only | rg, cat, ls, file search | Low | Allow |
| Local validation | pytest, npm test, ruff | Medium | Allow with limits |
| File modification | patches, formatters | Medium | Allow after plan |
| External side effects | deploys, DB writes, network calls | High | Require approval |
The dangerous command is not always obvious. This is safe:
pytest tests/billing/test_routes.py -q
This is not something I want an agent running casually:
python scripts/backfill_invoices.py --env prod
A good agent workflow makes side effects explicit.
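One way to make those permission levels explicit is a small classifier that runs before any shell command. This is a sketch under my own assumptions: the executable sets and the side-effect markers are illustrative and far from exhaustive, and a real deployment would want a stricter allowlist.

```python
import shlex

# Hypothetical mapping from executable name to the permission levels in the table above.
READ_ONLY = {"rg", "cat", "ls"}
LOCAL_VALIDATION = {"pytest", "ruff"}
# Substrings that hint at external side effects; illustrative, not exhaustive.
SIDE_EFFECT_MARKERS = ("--env prod", "deploy", "psql", "curl")


def classify_command(command: str) -> str:
    """Return 'allow', 'allow_with_limits', or 'require_approval' for a shell command."""
    if any(marker in command for marker in SIDE_EFFECT_MARKERS):
        return "require_approval"
    try:
        parts = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: refuse to guess.
        return "require_approval"
    if not parts:
        return "require_approval"
    executable = parts[0]
    if executable in READ_ONLY:
        return "allow"
    if executable in LOCAL_VALIDATION:
        return "allow_with_limits"
    # Unknown executables, including arbitrary python scripts, default to approval.
    return "require_approval"


print(classify_command("pytest tests/billing/test_routes.py -q"))
print(classify_command("python scripts/backfill_invoices.py --env prod"))
```

The design choice that matters is the default: anything the classifier does not recognize requires approval, so a new script cannot slip through just because nobody listed it.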
A Visual Walkthrough: Fixing a Real API Bug
Suppose we have this route:
from fastapi import APIRouter

from app.billing.client import BillingClient

router = APIRouter()
client = BillingClient()


@router.post("/users/{user_id}/charge")
def charge_user(user_id: str, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}
And this client:
class BillingClient:
    def charge(self, user_id: int, amount_cents: int):
        if user_id <= 0:
            raise ValueError("invalid user_id")
        ...
The runtime failure is predictable:
TypeError: '<=' not supported between instances of 'str' and 'int'
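The failure is easy to reproduce without FastAPI, because Python does not enforce type annotations at runtime. The sketch below uses a stand-in client reduced to the failing comparison; the return value "charged" is made up for the demo.

```python
class BillingClient:
    """Stand-in for the client above, reduced to the failing comparison."""

    def charge(self, user_id: int, amount_cents: int):
        # The int annotation is not checked at runtime, so a str reaches this line.
        if user_id <= 0:
            raise ValueError("invalid user_id")
        return "charged"


client = BillingClient()

try:
    client.charge(user_id="123", amount_cents=500)
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Converting the value to int before the call avoids the error.
status = client.charge(user_id=int("123"), amount_cents=500)
print(status)
```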
A well-behaved coding agent should not read the entire repo. The ideal loop is:
1. Search for route
2. Read route file
3. Read billing client signature
4. Inspect tests for this route
5. Patch type annotation / validation
6. Run targeted test
7. Summarize
That looks like:
rg "charge_user|/charge|BillingClient" app tests
Then:
@router.post("/users/{user_id}/charge")
def charge_user(user_id: int, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}
With a focused test:
def test_charge_user_uses_integer_user_id(client, monkeypatch):
    seen = {}

    class FakeBillingClient:
        def charge(self, user_id: int, amount_cents: int):
            seen["user_id"] = user_id
            return type("Result", (), {"status": "ok"})()

    monkeypatch.setattr("app.billing.routes.client", FakeBillingClient())

    response = client.post("/users/123/charge", params={"amount_cents": 500})

    assert response.status_code == 200
    assert seen["user_id"] == 123
What actually happens when an agent goes wrong?
Task: "Fix billing route bug"
↓
Agent searches "billing"
↓
Gets 300 results
↓
Reads models, migrations, frontend billing page
↓
Finds route late
↓
Makes correct patch
↓
Runs broad test suite
↓
Times out
↓
Reports uncertainty
The fix is not just “use a better model.” The fix is better workflow design:
- Start with narrow search terms from the error.
- Prefer targeted tests before broad tests.
- Keep tool output small.
- Ask the model to explain why each file is relevant.
- Stop reading once the dependency path is clear.
Building Your Own Claude-Code-Like Agent with APIs
You can implement the core pattern with any modern model API. The minimal architecture has:
- A planner prompt
- A tool registry
- A loop controller
- A context manager
- A patch application layer
- A validation step
Here is simplified Python pseudocode:
# Pseudocode: `model`, `tool_schemas`, `truncate`, and the tool functions
# are placeholders for your own client and helpers.
tools = {
    "search": search_repo,
    "read_file": read_file,
    "run_tests": run_tests,
    "apply_patch": apply_patch,
}

messages = [
    {"role": "system", "content": "You are a careful coding agent. Use tools before editing."},
    {"role": "user", "content": "Fix the billing route user_id type bug."},
]

for step in range(12):  # hard cap on agent iterations
    response = model.generate(
        model="sonnet-4.6",
        messages=messages,
        tools=tool_schemas,
    )

    if response.final:
        print(response.content)
        break

    tool_name = response.tool_call.name
    tool_input = response.tool_call.input

    # Crude guard: refuse test commands that mention prod.
    if tool_name == "run_tests" and "prod" in str(tool_input):
        raise RuntimeError("Blocked risky command")

    result = tools[tool_name](**tool_input)

    # Record the model's tool request, then the truncated tool output.
    messages.append(response.as_message())
    messages.append({
        "role": "tool",
        "name": tool_name,
        "content": truncate(result, max_tokens=3000),
    })
That truncate call is not a footnote. It is one of the most important production details. Tool output can destroy your token budget.
For example, this is bad:
pytest
If the suite fails with 40 stack traces, the model receives a wall of noise.
This is better:
pytest tests/billing/test_routes.py -q --tb=short
And this is often best inside an agent loop:
pytest tests/billing/test_routes.py::test_charge_user_uses_integer_user_id -q --tb=short
The model does not need every failure. It needs the next actionable failure.
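The truncate helper from the loop above could be sketched as follows. It estimates tokens at roughly four characters each, which is my assumption rather than a real tokenizer, and it keeps more of the tail than the head because test runners tend to print the summary last.

```python
def truncate(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Keep the head and tail of tool output within an approximate token budget.

    Tokens are estimated as len(text) / chars_per_token, a rough heuristic.
    """
    if max_tokens <= 0:
        return ""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # One third for the head, the rest for the tail where summaries usually live.
    head_chars = max_chars // 3
    tail_chars = max_chars - head_chars
    omitted = len(text) - head_chars - tail_chars
    marker = f"\n... [truncated {omitted} chars] ...\n"
    return text[:head_chars] + marker + text[-tail_chars:]


noisy_output = "a" * 100 + "b" * 100
print(truncate(noisy_output, max_tokens=15))
```

The marker adds a few characters on top of the budget, so treat max_tokens as a soft limit. If you need an exact ceiling, count with the tokenizer that matches your model.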
Context Engineering Beats Prompt Engineering
Prompt engineering still matters, but context engineering matters more for coding agents.
A vague prompt with perfect context often works:
Fix this failing test.
A brilliant prompt with missing context usually fails:
Carefully analyze the architecture, reason step by step, and implement the minimal correct fix...
If the model never sees the client signature, it may still patch the wrong layer.
In my own agent flows, I treat context as a staged pipeline:
Stage 1: Locate
- Search results only
- No full files unless necessary
Stage 2: Understand
- Read smallest relevant files
- Include interfaces, tests, config
Stage 3: Edit
- Include target file and nearby tests
- Exclude unrelated search output
Stage 4: Verify
- Include concise test result
- Include only new failures
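The staged pipeline above can be sketched as a store that tags every piece of evidence with a kind and filters by stage. The kind names and the sample items are hypothetical labels for this illustration, not part of any agent framework.

```python
from dataclasses import dataclass, field

# Which kinds of evidence each stage may place in the model's context.
# The kind names are hypothetical labels for this sketch.
STAGE_ALLOWED_KINDS = {
    "locate": {"search_result"},
    "understand": {"file", "interface", "test", "config"},
    "edit": {"target_file", "test"},
    "verify": {"test_result"},
}


@dataclass
class ContextItem:
    kind: str
    label: str
    text: str


@dataclass
class ContextStore:
    items: list = field(default_factory=list)

    def add(self, kind: str, label: str, text: str) -> None:
        self.items.append(ContextItem(kind, label, text))

    def for_stage(self, stage: str) -> list:
        """Return only the items the given stage is allowed to see."""
        allowed = STAGE_ALLOWED_KINDS[stage]
        return [item for item in self.items if item.kind in allowed]


store = ContextStore()
store.add("search_result", "rg charge_user", "app/billing/routes.py: def charge_user(...)")
store.add("target_file", "app/billing/routes.py", "def charge_user(user_id: str, ...): ...")
store.add("test", "tests/billing/test_routes.py", "def test_charge_user_...(): ...")
store.add("test_result", "pytest -q", "1 failed")

edit_context = [item.label for item in store.for_stage("edit")]
print(edit_context)
```

Everything stays in the store, so nothing is lost. The edit stage simply never sees the raw search output or stale test results.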
This works across Claude, GPT, and Gemini models because it respects the same underlying constraint: attention is finite, even when the context window is large.
Long-context models change the trade-off, not the principle. Fable 5’s 1M context is useful when the relevant dependency chain is genuinely huge, but dumping a monorepo into context can still make the model slower and more distractible. Bigger windows reduce retrieval failures; they do not eliminate the need to curate evidence.
Best Practices for API Builders
Make Tool Calls Observable
Log every agent action as structured data:
{
  "request_id": "req_8421",
  "step": 4,
  "model": "claude-sonnet-4.6",
  "tool": "read_file",
  "path": "app/billing/routes.py",
  "input_tokens": 18320,
  "output_tokens": 740
}
You want to know:
- Which files were read
- Which commands ran
- How many tokens each step used
- Whether the model edited before understanding
- Which validation command succeeded or failed
This is how you debug cost and quality.
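A minimal sketch of that logging, assuming one JSON line per step and an in-memory list as the sink. The field names follow the record above; the values for steps 3 and 5 are made up for the demo.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StepRecord:
    request_id: str
    step: int
    model: str
    tool: str
    path: str
    input_tokens: int
    output_tokens: int


def log_step(record: StepRecord, sink: list) -> str:
    """Serialize one agent step as a JSON line and append it to the sink."""
    line = json.dumps(asdict(record), sort_keys=True)
    sink.append(line)
    return line


def tokens_by_tool(lines: list) -> dict:
    """Sum input tokens per tool to show where the budget went."""
    totals = {}
    for line in lines:
        entry = json.loads(line)
        totals[entry["tool"]] = totals.get(entry["tool"], 0) + entry["input_tokens"]
    return totals


log = []
log_step(StepRecord("req_8421", 3, "claude-sonnet-4.6", "search", "", 9100, 310), log)
log_step(StepRecord("req_8421", 4, "claude-sonnet-4.6", "read_file", "app/billing/routes.py", 18320, 740), log)
log_step(StepRecord("req_8421", 5, "claude-sonnet-4.6", "read_file", "app/billing/client.py", 19050, 620), log)

totals = tokens_by_tool(log)
print(totals)
```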
Use Model Routing
Do not force one model to do everything. A practical routing strategy:
Repo search summary → Haiku 4.5
Patch planning → Sonnet 4.6 or Gemini 3
Complex architecture → Opus 4.8 or GPT-5.5
Long-context sweep → Fable 5 or long-context Gemini
Final review → Strongest available model
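In code, that routing can start as a lookup table with a fallback. The sketch below uses the display names from this article as plain strings. They are not verified API model identifiers, and the task-type keys are my own labels.

```python
# Task type to preferred models, in order. Names are display names from the
# article, not verified API model IDs; task-type keys are hypothetical.
ROUTES = {
    "repo_search_summary": ["Haiku 4.5"],
    "patch_planning": ["Sonnet 4.6", "Gemini 3"],
    "complex_architecture": ["Opus 4.8", "GPT-5.5"],
    "long_context_sweep": ["Fable 5", "long-context Gemini"],
}
DEFAULT_MODEL = "strongest available model"


def pick_model(task_type: str, unavailable: frozenset = frozenset()) -> str:
    """Return the first preferred model that is available, else the default."""
    for candidate in ROUTES.get(task_type, []):
        if candidate not in unavailable:
            return candidate
    return DEFAULT_MODEL


print(pick_model("patch_planning"))
print(pick_model("patch_planning", unavailable=frozenset({"Sonnet 4.6"})))
print(pick_model("final_review"))
```

Keeping the table as data rather than branching logic means a pricing or quality change becomes a one-line edit.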
If you are building a tool for a team, this is also where AI Prime Tech can be useful: one integration point for cheaper Claude, GPT, and Gemini API access, while keeping your routing logic model-agnostic.
Put Hard Limits on the Loop
Agents need budgets:
{
  "max_steps": 12,
  "max_input_tokens_per_step": 50000,
  "max_total_cost_usd": 1.25,
  "allowed_commands": ["rg", "pytest", "ruff", "mypy"],
  "blocked_patterns": ["prod", "deploy", "rm -rf", "DROP TABLE"]
}
Without limits, a stuck agent can keep searching, summarizing, and retrying. The failure mode is not dramatic. It is quietly expensive.
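Enforcing the step, token, and cost limits could look like the sketch below. The class and exception names are hypothetical, the per-step cost is assumed to come from your own pricing calculation, and command filtering is left out to keep the example short.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when an agent run crosses one of its configured limits."""


@dataclass
class LoopBudget:
    max_steps: int = 12
    max_input_tokens_per_step: int = 50_000
    max_total_cost_usd: float = 1.25
    steps: int = 0
    total_cost_usd: float = 0.0

    def charge(self, input_tokens: int, cost_usd: float) -> None:
        """Record one step and raise as soon as any limit is crossed."""
        if input_tokens > self.max_input_tokens_per_step:
            raise BudgetExceeded("per-step input token limit reached")
        self.steps += 1
        self.total_cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.total_cost_usd > self.max_total_cost_usd:
            raise BudgetExceeded("cost limit reached")


# Simulate a stuck agent that keeps spending 4 cents per step.
budget = LoopBudget(max_steps=3, max_total_cost_usd=0.10)
stopped_reason = None
for _ in range(10):
    try:
        budget.charge(input_tokens=20_000, cost_usd=0.04)
    except BudgetExceeded as exc:
        stopped_reason = str(exc)
        break

print(budget.steps, stopped_reason)
```

In this simulated run the cost ceiling trips on the third step, before the step cap is reached.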
Prefer Patches Over Rewrites
For code editing, patches are safer than full-file rewrites. Full rewrites tend to:
- Drop comments or formatting
- Reorder imports unnecessarily
- Change unrelated code
- Create larger diffs that are harder to review
A good coding agent should produce the smallest meaningful diff. That is not just aesthetic; it reduces merge conflicts and review time.
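To show what the smallest meaningful diff looks like for the billing fix, the sketch below builds a unified diff with Python's standard difflib module. The a/ and b/ path prefixes are just the usual diff convention, chosen here for illustration.

```python
import difflib

original = '''@router.post("/users/{user_id}/charge")
def charge_user(user_id: str, amount_cents: int):
    result = client.charge(user_id=user_id, amount_cents=amount_cents)
    return {"status": result.status}
'''

# The whole fix is a single annotation change.
patched = original.replace("user_id: str", "user_id: int")

diff_lines = list(
    difflib.unified_diff(
        original.splitlines(keepends=True),
        patched.splitlines(keepends=True),
        fromfile="a/app/billing/routes.py",
        tofile="b/app/billing/routes.py",
    )
)

# Count only real additions and removals, not the file header lines.
changed = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("".join(diff_lines))
print(len(changed), "changed lines")
```

One removed line and one added line is the entire review surface. A full-file rewrite of the same route would force the reviewer to re-read every line to confirm nothing else moved.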
Trade-Offs and Limitations
Claude Code-style workflows are powerful, but they are not magic.
The main trade-offs:
| Benefit | Cost |
|---|---|
| Can inspect and modify real repos | Needs careful tool permissions |
| Handles multi-step debugging | Can drift without budgets |
| Reduces manual boilerplate | May over-read context |
| Works across many model APIs | Model-specific tool syntax varies |
| Improves with tests | Weak tests produce false confidence |
A common gotcha: agents often optimize for “make the test pass” rather than “preserve the system contract.” If the only test asserts 200 OK, the agent may satisfy that while breaking validation semantics. Strong test design still matters.
Another gotcha: model confidence is not verification. A final message saying “all tests pass” is only meaningful if you have the command output. In production agent systems, I store the exact validation command and output with the run record.
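A run record like that can be captured with the standard subprocess module. This is a sketch with hypothetical field names, and it uses a stand-in command so it runs anywhere; in practice the command would be the targeted pytest invocation.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_validation(command: list, timeout_s: int = 120) -> dict:
    """Run a validation command and keep everything needed to audit the result."""
    started_at = datetime.now(timezone.utc).isoformat()
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    return {
        "command": command,
        "started_at": started_at,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        # "Passed" is derived from the exit code, never from the model's summary.
        "passed": completed.returncode == 0,
    }


# Stand-in commands so the sketch is self-contained.
ok_record = run_validation([sys.executable, "-c", "print('1 passed')"])
bad_record = run_validation([sys.executable, "-c", "import sys; sys.exit(1)"])

print(ok_record["passed"], bad_record["passed"])
```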
Practical Takeaways
- Treat Claude Code as an agent loop: model, context, tools, observations, and edits.
- Keep context narrow first; expand only when the dependency path demands it.
- Route simple search and summarization to cheaper models, then reserve premium models for hard reasoning.
- Use targeted validation commands before broad test suites.
- Log tool calls, token usage, file reads, patches, and test results.
- Put explicit limits on steps, cost, commands, and side effects.
- Prefer small patches over full-file rewrites.
- Remember that long context helps, but curated context still wins.
The visual model is the unlock: once you can see the loop, you can improve it. Claude Code is not just a product category; it is a pattern for building safer, cheaper, more capable developer agents on top of Claude, GPT, Gemini, and whatever strong models come next.