How Claude Code Actually Works: A Deep Technical Breakdown

When Anthropic launched Claude Code as a research preview in February 2025, nobody expected what happened next.

Developers started running it nonstop. Burning through tokens around the clock. Treating it like a junior engineer who never slept or asked for a raise.

Anthropic eventually had to add weekly usage limits because the tool had become, in their own words, "indispensable" for their internal teams.

That says a lot.

But here is the thing: most people using Claude Code have no real idea what is happening underneath. They type a request, stuff happens, code appears.

It feels like magic. It is not.

It is a beautifully simple architecture that, once you understand it, will completely change how you use the tool.

This is the full breakdown.

What Claude Code Actually Is

Before the technical stuff, let us be precise about what this tool is and is not.

Claude Code is not a code completion tool. It is not autocomplete on steroids. It is not a chatbot that happens to know Python.

Claude Code is an agentic coding system. That word "agentic" is doing a lot of work here, so let us unpack it properly.

A regular AI assistant responds to one prompt at a time. You ask, it answers, you paste the code, you move on. There is no continuity and no ability to take real action in the world.

An agentic system does something different. It reads your codebase, makes a plan, executes that plan across multiple files using real tools, checks whether the output is correct, and iterates until the job is done.

You define the goal. The agent handles the execution loop.

The practical difference: instead of asking "write me a function that validates email addresses," you say "write tests for the auth module, run them, and fix any failures," then walk away.

Claude Code will figure out which files to read, what tests already exist, write new ones, run them with Bash, read the error output, trace the failure back to a root cause, fix the code, and run the tests again.

That full cycle happens autonomously.

The Core Architecture: Model + Tools + A Loop

Here is the technical truth about how Claude Code works. Strip away every feature, every integration, every mode, and you are left with three things:

Model + Tools + A Loop

That is it. Everything else is built on top of this.

The loop itself is almost embarrassingly simple when written out as pseudocode:

while True:
    response = model(messages, tools)
    if response.stop_reason != "tool_use":
        return response.text
    results = execute(response.tool_calls)
    messages.append(results)

Translate that into plain English:

Claude receives your message along with a list of tools it can use. It either calls a tool to get information or take action, or it produces a final text response.

If it called a tool, the result gets fed back into the conversation and the whole process repeats.

This continues until Claude produces output with no tool call attached. At that point, the loop terminates.

Every single thing you see Claude Code do, reading files, running tests, committing to git, spawning subagents, all of it is the model requesting tool calls within this loop.

What makes it powerful is not the loop itself. It is the quality of Claude's reasoning about which tool to call next, based on the full accumulated context of everything that has happened in the session so far.

The agent loop in visual form. Claude calls a tool, gets the result, decides what to do next, repeats until done.

The Tool Inventory: What Claude Can Actually Do

The tools are where the real capability lives. Here is the full breakdown by category.

File Operations

Read, Edit, Write, MultiEdit.

These let Claude read any file in your codebase, make targeted line-level edits, overwrite files completely, or batch-edit multiple sections at once.

Read-only tools run without permission prompts by default. Edit and Write require explicit user approval.

Search and Discovery

Grep and Glob.

Grep does regex-based content search across your codebase. Glob finds files matching patterns.

Anthropic chose regex over vector database search here because Claude can write sophisticated regex patterns without the overhead of managing embeddings. A deliberate trade-off toward simplicity.

Execution

Bash.

This is the most powerful tool in the kit. Bash gives Claude the ability to run any shell command: test suites, scripts, package installs, git operations, CLI tools like gh or aws, log files, anything.

This is also why the permission system exists.

Web Access

WebSearch and WebFetch.

Claude can search the web and fetch the contents of URLs. Relevant when debugging against library docs or pulling in external context that is not in your codebase.

Orchestration

Agent, Skill, AskUserQuestion, TaskCreate, TaskUpdate.

Agent spawns a subagent in a fresh context. Skill invokes a stored workflow. AskUserQuestion pauses the loop to ask you something directly. TaskCreate and TaskUpdate manage persistent task lists.

Notebook Tools

NotebookRead and NotebookEdit for working with Jupyter notebooks.

One important thing: Claude does not need you to tell it which tool to use. It decides based on context. You just describe what you want.

Parallel Tool Execution

Something most people do not realize: read-only tools like Read, Glob, and Grep can run concurrently when multiple are requested in the same turn.

If Claude needs to read three files to understand a module, it fires all three reads simultaneously and waits for the results together.

Write operations like Edit and Bash run sequentially to prevent conflicts.

The Planning System

Claude Code has a built-in planning mechanism that prevents it from just randomly executing actions.

When you give Claude a multi-step task, it first creates a structured task list. Originally this was called TodoWrite. After Claude Code v2.1.16, it evolved into a full Tasks API: TaskCreate, TaskUpdate, TaskList, and TaskGet.

The key behavior: the current state of the task list is injected back into Claude's context after each tool use.

After every action, Claude can see what it just did, what is in progress, and what still needs to happen. This is what prevents it from losing track during long, complex sessions.

Tasks have dependency management so Claude understands which steps must complete before others can start.

They have shared state stored in ~/.claude/tasks so multiple subagents or different sessions can work on the same task list without stepping on each other.

They support context persistence so Claude can resume work on a task days later with full memory of what happened.

The Tasks API is Anthropic pushing Claude Code toward genuinely autonomous multi-day project work.

Plan Mode: Think Before You Build

Press Shift + Tab twice (or type /plan) and you enter Plan Mode.

In Plan Mode, Claude operates in a completely read-only environment. It can use Read, Grep, Glob, and WebSearch to explore your codebase. It cannot modify any files. It cannot run Bash commands. It cannot touch anything.

This exists because of a fundamental problem with AI coding tools: they tend to immediately start generating code based on limited local context, which often produces technically correct solutions that solve the wrong problem.

Plan Mode forces a better workflow:

1. Enter Plan Mode (Shift + Tab twice)
2. Describe what you want to build
3. Let Claude explore the codebase and design the approach
4. Review and refine the plan together
5. Exit into execution mode
6. Claude implements what it just planned

This mirrors how experienced engineers actually work.

Nobody immediately starts writing code when they get a new feature request. They read the existing code, understand the patterns, map out dependencies, and plan the architecture. Only then do they start building.

The practical difference in output quality is significant. Implementations that come out of a plan-then-execute workflow tend to feel intentional and cohesive.

A more advanced workflow: write plans to version-controlled files. Teams can check these spec files into git, creating an audit trail of architectural decisions that outlives individual AI sessions.

Memory: How Claude Knows What It Knows

This is where things get interesting, and where most Claude Code users are massively underutilizing the tool.

The model itself has no persistent memory. Each session starts from zero. But Claude Code has built multiple layers on top of this to solve the problem.

Layer 1: In-Context Memory

Within an active session, Claude remembers everything. Every file read, every command run, every decision made. This is the context window. It disappears when the session ends.

Layer 2: CLAUDE.md

A Markdown file you place at your project root. Claude reads it at the start of every single session.

This is where you put everything permanent: coding standards, architecture decisions, testing preferences, workflow rules, security constraints, gotchas about your specific environment.

Run /init in any project to generate a starter CLAUDE.md based on your codebase structure.

Keep CLAUDE.md short and ruthlessly pruned. A bloated CLAUDE.md causes Claude to ignore parts of it because important rules get buried in noise. If a line does not directly prevent a mistake, cut it.

You can place CLAUDE.md in multiple locations:

~/.claude/CLAUDE.md applies globally to all your projects
./CLAUDE.md in the project root gets committed to git and shared with your team
Subdirectory CLAUDE.md files get loaded on demand when Claude reads files in those folders

Layer 3: MEMORY.md

Claude Code automatically creates a memory directory for each project and builds MEMORY.md across sessions, storing learnings like build commands, debugging insights, and project-specific context it has accumulated.

This gets populated without you writing anything.

Layer 4: Skills

Stored in .claude/skills/ as SKILL.md files. Each skill describes a specific workflow or domain area.

Claude loads skill descriptions at session start but only pulls the full content of a skill when it becomes relevant to the current task. This keeps context lean.

Layer 5: External Databases

For larger knowledge retrieval needs, you can connect Claude Code to SQL databases for structured data or vector databases for semantic search.

Layer 6: MCP Servers

External services and APIs as memory sources. Covered in the next section.

The six memory layers, from session-level in-context memory all the way to external MCP-connected services.

The Context Window: The Most Important Resource

Everything in a Claude Code session lives in the context window.

The system prompt, tool definitions, every file read, every command output, the full conversation history. All of it accumulates in a single rolling buffer.

Claude's models have a 200,000 token context window. That sounds huge. It is not as huge as you think in practice.

A single debugging session exploring a complex codebase can burn through tens of thousands of tokens fast. Large file reads, verbose Bash output, long conversations, they all eat context.

LLM performance degrades as context fills. When the context window approaches capacity, Claude may start forgetting earlier instructions, making more errors, or losing track of constraints you specified at the start of the session.

This is called the "lost-in-the-middle" problem: models deprioritize information sitting in the middle of a very long context, paying more attention to the beginning and end.

Prompt Caching

Claude Code is built entirely around prompt caching. The API caches the prefix of a request, so content that does not change between turns (the system prompt, tool definitions, CLAUDE.md) only gets processed once.

Subsequent requests that share that prefix get a cache hit, dramatically reducing both cost and latency.

Claude Code structures its context in this order:

1. Static system prompt and tool definitions (globally cached)
2. CLAUDE.md content (cached within a project)
3. Session context (cached within a session)
4. Conversation messages (always fresh)

Context Compaction

When the context window approaches about 95% capacity, Claude Code automatically triggers compaction.

It takes the full conversation history and summarizes it into a concise Markdown-based project memory, preserving the most important decisions, file states, and context while discarding verbose tool output and exploration chatter.

You can trigger this manually at any time with /compact.

The important difference from /clear: compact preserves a summary of what happened. Clear wipes everything and starts from zero.

Use compact when you want continuity but need to free up space. Use clear when switching to a completely unrelated task.

MCP: The Plugin System That Expands Everything

Model Context Protocol is the mechanism that lets Claude Code connect to external systems.

Anthropic open-sourced MCP in November 2024 as a universal standard for connecting AI assistants to data sources and tools.

The architecture is straightforward: you run an MCP server (a process that exposes capabilities), and Claude Code connects to it as an MCP client.

The server exposes tools, resources, and prompts. Claude can use those as if they were native tools.

Anthropic ships pre-built MCP servers for Google Drive, Slack, GitHub, Postgres, and Puppeteer. Community-built servers exist for Figma, Jira, Notion, various databases, and hundreds of other services.

To add an MCP server: claude mcp add. To list connected servers: claude mcp list.

One important caveat: MCP servers add all their tool schemas to every request by default, which consumes context space. A few MCP servers with many tools can meaningfully bloat your context before any actual work happens.

Claude Code introduced "tool search" to address this, deferring MCP tool schemas and loading them on demand rather than upfront.

The Permission and Safety Architecture

Claude Code has filesystem access, shell execution, and network access. That is genuinely powerful and genuinely dangerous if misconfigured.

Anthropic built a layered permission model to manage this.

The Six Permission Modes

default: Claude asks before running commands, writing files, or accessing the network. Safe but friction-heavy for active development.

acceptEdits: Auto-approves file edits and common filesystem commands. Other Bash commands still require approval. Good for active coding sessions where you trust the direction.

plan: Read-only mode. Claude explores and analyzes without being able to modify anything.

auto: A separate classifier model reviews each tool call before execution. It approves routine work automatically and blocks scope escalation or commands that look like they are responding to hostile content injected into tool outputs. This is the sweet spot for autonomous operation.

dontAsk: Never prompts. Pre-approved tools run, everything else is denied. For CI pipelines and automated workflows.

bypassPermissions: Runs everything without asking. Should only be used in isolated environments like containers or sandboxes where Claude's actions cannot affect systems you care about.

Six modes from fully supervised to fully autonomous. Choose based on how much you trust the current task.

Permission Rules

Beyond modes, you can write granular allow and deny rules in .claude/settings.json.

These rules are evaluated before any tool runs. Rules can be scoped to specific commands, for example "Bash(npm *)" allows only npm commands through Bash while requiring approval for everything else.

Hooks: The Deterministic Enforcement Layer

Hooks are shell scripts that run at specific points in Claude's execution:

PreToolUse: before a tool runs
PostToolUse: after it returns
Stop: when the agent finishes
PreCompact: before context compaction

The critical difference between hooks and CLAUDE.md rules: CLAUDE.md rules are advisory. They are behavioral guidance that Claude reads and follows. Hooks are deterministic.

A PreToolUse hook that rejects a call with exit code 2 physically prevents that tool from executing, regardless of what Claude wants to do.

For security-sensitive environments, this distinction matters enormously.

If you never want Claude to read .env files, you do not rely on CLAUDE.md saying "never read .env files." You write a PreToolUse hook that intercepts any Read call targeting .env and blocks it unconditionally.

Subagents and Multi-Agent Orchestration

This is where Claude Code starts to look more like a team than a single developer.

The Single-Threaded Master Loop

By default, Claude Code runs a single-threaded master loop: one flat message list, one agent, one conversation at a time.

Anthropic made this choice deliberately. It creates a transparent, predictable audit trail where every step is visible. It prevents uncontrolled agent proliferation. It keeps the system controllable.

But single-threaded does not mean you can only do one thing at a time.

Subagents

When Claude encounters a task that benefits from parallel exploration or context isolation, it can spawn subagents.

A subagent runs in its own fresh context. It has its own system prompt and loads its own CLAUDE.md, but it does not inherit the parent session's conversation history.

This is the key benefit: the subagent can read hundreds of files, explore entire module trees, run analysis, and return only a summary to the parent.

All that exploratory noise stays contained in the subagent's context and does not pollute the main session.

Subagents are depth-limited: they cannot spawn their own subagents. This prevents uncontrolled recursive proliferation while still enabling one level of decomposition.

Agent Teams

For larger parallel workloads, Claude Code supports agent teams. One session acts as the team lead, coordinating work and assigning tasks to worker sessions.

A typical pattern: run git worktrees to give each agent an isolated branch of the codebase, then coordinate with a lead agent that assigns features and merges results.

The Tasks API supports this pattern directly: tasks stored in ~/.claude/tasks are visible to all agents working on the same project, with real-time synchronization when any session updates a task status.

Lead agent assigns tasks to parallel workers. All share the same Tasks API for synchronization.

Async Steering

One underappreciated feature: you can pause a running session with Esc, inject new instructions, redirect the approach, and resume without losing the accumulated context of what has already happened.

What would otherwise be a batch process becomes a genuinely interactive streaming conversation.

A Real Example: What Happens When You Type a Request

Let us make this concrete. You type: "Fix the failing tests in auth.ts"

Here is exactly what happens:

Turn 1: Claude sends a Bash tool call to run the test suite. The SDK executes npm test and returns the output. Three test failures with specific error messages.

Turn 2: Claude calls Read on auth.ts and auth.test.ts. It reads both files and builds context about the module structure and what the tests expect.

Turn 3: Claude analyzes the failures, identifies the root cause (the token refresh function is not handling expired tokens correctly), calls Edit to modify auth.ts, then calls Bash again to re-run npm test. All three tests pass.

Final Turn: Claude produces a text-only response with no tool call attached: "Fixed the auth bug. The issue was in the token refresh handler, which wasn't handling expired tokens correctly. All three tests pass now."

The loop terminates because the last response had no tool call.

That was four turns. Three with tool calls. One final text. This is the whole system at varying levels of complexity.

Power Features Worth Knowing

A few things that significantly change the experience once you know they exist.

The ! Prefix

Type !git status instead of asking Claude to run git status. The ! prefix executes Bash commands directly and injects the output into context, with no model processing and no extra token cost for the request itself.

claude --continue and claude --resume

Sessions are persistent. --continue picks up the most recent session. --resume shows a list of past sessions to choose from. Name sessions with /rename to find them later.

Checkpoints and Rewind

Every prompt creates a checkpoint. Press Esc + Esc or run /rewind to restore conversation, code, or both to any previous state.

This makes it safe to try risky approaches. If it does not work, rewind and try something else.

Non-Interactive Mode

claude -p "your prompt" runs Claude without a session, which is how you integrate it into CI pipelines, pre-commit hooks, or scripts.

Add --output-format json for structured output or --output-format stream-json for streaming.

/btw for Side Questions

If you want to check a detail without growing your context, /btw puts your question in a dismissible overlay. The answer never enters conversation history.

/goal for Session-Level Objectives

Set a goal as a session-wide condition. A separate evaluator re-checks it after every turn, and Claude keeps working until the goal is satisfied.

Claude Code vs The Field

Claude Code optimizes for autonomous, long-running tasks. Cursor and Copilot optimize for interactive, inline speed.

Claude Code is not trying to win at inline completions.

It is optimized for long-running, multi-step, autonomous execution that Cursor and Copilot are not really designed for. Complex refactors, debugging sessions spanning multiple files, architecture work, these are where the 200K context window and the full agentic loop earn their keep.

Where This Is Going

A few things stand out as directional signals from what Anthropic is building.

The shift from TodoWrite to the Tasks API is significant. This is not just a UX improvement. It is infrastructure for multi-day, multi-agent project work. Tasks persist across sessions, synchronize across agents, and support dependency management.

Anthropic is building toward Claude Code managing genuinely long-horizon engineering work, not just one-session coding tasks.

The Agent SDK is another signal. Anthropic released a standalone Python and TypeScript SDK that lets you embed the full Claude Code agent loop in your own applications. This is infrastructure for companies building AI-powered development pipelines, automated code review systems, or custom coding agents.

The hosted infrastructure (Background Agents, Cloud Sessions) means Claude Code can now run on Anthropic's servers rather than your local machine. Kick off a task from your phone, let it run overnight, review the PR in the morning.

The direction is clear: from tool to autonomous development partner.

Useful Resources

These are worth bookmarking if you want to go deeper:

Official Claude Code Docs — Well-maintained and regularly updated
How the Agent Loop Works (Official) — The actual SDK documentation
Best Practices from Anthropic Engineering — Written by the team that built it.

If this broke down something you have been using without fully understanding, share it with someone else in the same boat.