Estimated reading time: 60-75 minutes
This guide is Claude-centric — but Codex is a rising luminary in the CLI sky. As of early 2026, OpenAI Codex is emerging as an equally capable and often superior companion to Claude Code for many implementation tasks. This site documents Claude Code because that is where deep experience lives, but the practices here apply broadly. Before you commit fully to one tool, try both on a real task. The winner might surprise you. See the Other AI CLIs comparison →
Chapter 1

Introduction: What Is a CLI and Why Would Anyone Use One?

Wait — Which "Claude" Are We Talking About?

Before anything else: if you've ever sat in a meeting while someone threw out "Claude Code" and you nodded along while privately wondering what exactly they meant — you're not alone, and it's not your fault. Anthropic has done a poor job of naming things distinctly.

Claude actually ships in several different surfaces. Here are the main ones:

  • Claude.ai — The website. You open a browser, you chat. This is what most people mean when they say "I use Claude." The consumer product. Chat only.
  • Claude Desktop — Chat tab — The Windows/Mac installed app with the Chat tab selected. Looks and works like the website. Can be extended with MCP servers. Still primarily a chat interface.
  • Claude Desktop — Code tab ⬅ This IS what this guide is about — Inside Claude Desktop, click the Code tab at the top. This switches you into Claude Code mode — full file system access, bash execution, MCP servers, skills, slash commands, CLAUDE.md, everything. It is the same Claude Code CLI engine, embedded inside the desktop app with a GUI wrapper. If you're using the Code tab in Claude Desktop, you are already using what this guide describes. You just didn't know the name for it.

    How to start a Code tab session (Windows):
    1. Git required. The Code tab needs Git for Windows installed. Download from git-scm.com and restart the app after installing.
    2. Click the Code tab at the top of Claude Desktop. If it asks you to upgrade, you need a paid plan.
    3. Choose where Claude runs: Select Local to use your own machine and files. (Remote = Anthropic-hosted cloud session. SSH = a remote machine you manage.)
    4. Select your folder. Click Select folder and choose your project directory — the folder containing the code you want to work with. This is the equivalent of cd C:\repos\MyProject in the terminal.
    5. Pick your model from the dropdown. Note: you cannot change the model after the session starts.
    6. Start typing. Claude now has access to that folder's files. By default it will ask permission before making changes (Ask permissions mode), showing a diff with Accept/Reject buttons.
  • Claude Code (terminal CLI) — The standalone command-line version. You open a terminal, type claude. Same capabilities as the Code tab in Claude Desktop — just without the GUI wrapper. Some developers prefer this; others prefer the desktop app. Both are covered by this guide.
  • Claude in VS Code / JetBrains — IDE-native integrations built on the same Claude Code engine. Most concepts in this guide apply here too.
  • Claude in Slack — Chat-oriented, not code-oriented. Not what this guide is about.
  • Claude in CI/CD pipelines — Claude Code running headlessly in automated pipelines. Advanced usage; same engine, no interactive UI.

When a developer says "I use Claude Code," they might mean the Code tab in Claude Desktop OR the terminal CLI — both are the same thing under the hood. When a manager says "we use Claude," they probably mean the website. They are different surfaces, but the Code tab and the terminal CLI are the same product.

It is genuinely embarrassing to be in a technical conversation where everyone assumes you know which surface they mean. You're not slow. The naming is just bad. The short answer: if you see a Code tab, you are in Claude Code. If you are in a terminal typing claude, you are in Claude Code. This guide covers both.

Who This Is For

Developers who use AI every day but feel like they're only using 20% of the tool. You know how to prompt. You get good results. But skills, hooks, agents, MCP servers, memory — these words float past and nothing sticks. This guide makes them stick.

Windows programmers who grew up with Visual Studio, property dialogs, and point-and-click configuration. The CLI world feels foreign. This guide translates every concept into terms you already know.

Non-programmers who use AI for work. If you use Claude, ChatGPT, or Copilot to write reports, analyze data, generate content, or automate tasks — and you've heard that the command-line versions are more powerful but don't understand why — this guide is for you too. You don't need to be a programmer to benefit from understanding how these tools are architected. The concepts (sessions, memory, plugins, permissions) apply whether you're writing code or writing proposals.

Team leads and managers evaluating AI coding tools for their organization. Understanding the architecture helps you make informed decisions about which tools to adopt and how to configure them for your team.

What Is a CLI?

A CLI (Command Line Interface) is a text-based way to interact with a program. You type commands, it types back. No buttons, no menus, no mouse. Just a blinking cursor in a terminal window.

If that sounds primitive, consider this: every major AI company has built a CLI version of their AI tool. Anthropic built Claude Code. OpenAI built Codex CLI. Google built Gemini CLI. GitHub built Copilot CLI. They didn't do this for nostalgia. They did it because the CLI can do things a browser window cannot.

The Browser/GUI AI Experience

ChatGPT, Claude.ai, Copilot Chat in VS Code, Claude Desktop — these are GUI-based AI tools. You type in a chat box and get a response. Out of the box, the experience is conversational. Some desktop apps (like Claude Desktop) can be extended with MCP servers to access your filesystem and external tools, but this requires manual configuration. Without that setup:

  • The AI can't see your files unless you paste them in (or configure an MCP server)
  • The AI can't edit your files — you copy its output and paste it yourself (unless extended with MCP)
  • The AI can't run your code, your tests, or your build tools (without MCP)
  • The AI can't fix its own mistakes in a loop — you relay error messages back and forth
  • You are the middleman unless you invest in MCP configuration

Desktop apps like Claude Desktop and VS Code with Copilot can bridge some of these gaps via MCP servers and extensions. But the CLI tools come with all of this built in, out of the box, with no configuration required.

The CLI AI Experience

Claude Code, Codex CLI, Copilot CLI, Gemini CLI — these run inside your project directory. They have direct access to your files, your tools, and your environment:

  • Read your files directly — no copy-pasting code into a chat window
  • Edit your files directly — changes appear in your editor immediately
  • Run commands — build, test, lint, deploy, all from within the conversation
  • See error output and fix it in a loop — without you relaying messages
  • Navigate your entire project — search code, read configs, understand the full codebase
  • Remember across sessions — CLAUDE.md, memory files, and skills persist your preferences and project knowledge

The AI isn't looking at a snapshot you pasted. It's in your project, with full access to your files and tools. The difference is not incremental. It is a fundamentally different way of working.

For developers: A well-configured Claude Desktop with MCP servers is like a debugger where you had to manually wire up every watch variable, breakpoint type, and output window yourself before you could start. The capability is there once you've done the setup. The CLI is like Visual Studio's debugger — you attach it, and everything works immediately: breakpoints, watches, call stack, live variable inspection, step-through. Same underlying power, but one requires significant configuration before first use and the other works out of the box.
For everyone: A desktop AI with MCP servers configured is like a mechanic who brought a full toolbox to your garage — but you had to call ahead, give them a list of exactly which tools to bring, and meet them at the door to let them in. The CLI mechanic has a master key, knows the layout of your garage, and brought every tool automatically. If you've already done the MCP setup work, great — you have a capable mechanic. But most people haven't, and the CLI assumes you haven't.

The Case for CLI: Why It's Worth the Learning Curve

If the browser version works for you, why bother with the CLI? Here's the honest case:

| Capability | Browser / Desktop AI | CLI AI (Claude Code, Codex CLI) |
|---|---|---|
| Ask questions, get answers | Yes | Yes |
| Generate code snippets | Yes | Yes |
| Read your actual project files | With MCP setup | Yes (built in) |
| Edit your files directly | With MCP setup | Yes (built in) |
| Run build/test commands | With MCP setup | Yes (built in) |
| Fix errors in a loop | Partial (manual relay) | Yes (automatic) |
| Persistent project instructions | No | Yes (CLAUDE.md) |
| Plugins / external tools | Yes (MCP servers) | Yes (MCP servers) |
| Skills, slash commands | No | Yes |
| Automation hooks | No | Yes (hooks) |
| Cross-session memory | Limited | Yes (memory files) |
| Resume previous conversations | Yes | Yes |
| Works from your phone | Yes (native) | Yes (Remote Control) |
| Corporate IT friction to get started | High (MCP servers may need IT approval, admin rights, security review) | Low (one npm install, runs as your user account) |

The browser, desktop, and IDE-integrated versions are genuinely capable tools — especially for conversational work, quick questions, and document drafting. The CLI's advantage isn't that GUI surfaces are crippled; it's that the CLI's default affordances are built for coding. File access, command execution, project awareness, skills, hooks, memory, and CLAUDE.md all work out of the box with zero configuration. On a GUI surface, most of those require deliberate setup. The real distinction is workflow shape: if your work is primarily conversational, a chat interface is great. If your work is primarily project-based (read files, run tests, make changes, verify), the CLI is shaped for that from the start. Note also that Claude Code now ships in VS Code and JetBrains IDE integrations — those bring many CLI capabilities into a GUI context, blurring the line further.

In corporate IT environments, MCP servers may not be self-service. Installing an MCP server can require: admin rights to install npm or Python packages, IT approval to run background processes, security review of what data the server can access, firewall rule changes for network-connected MCP servers, and Group Policy exceptions on managed Windows machines. This is real friction — potentially an IT ticket, a security review cycle, and weeks of waiting. The CLI often reduces setup friction because its built-in file access runs under your own user account with no extra processes to approve. However, enterprise environments can still add friction even for the CLI itself: corporate proxies, custom certificate authorities, firewall allowlists, and restricted networks can all require IT involvement regardless of which surface you use. The point isn't that CLI bypasses IT entirely — it's that the CLI's core capabilities require fewer moving parts to get working in the first place. That difference matters when you're trying to start today instead of Q2.

Why Not Both?

You can use both. Many people use Claude.ai or ChatGPT for quick questions, brainstorming, and research, and switch to Claude Code CLI when they need hands-on work done in their project. The browser is a notepad. The CLI is a power tool. This guide is about the power tool.

How to Open a CLI

On Windows: open Windows Terminal, PowerShell, or Command Prompt. On Mac/Linux: open Terminal. That black (or dark) window with the blinking cursor is your CLI. Type claude and press Enter to start Claude Code (assuming it's already installed; the next section covers the installation options).

Everything in this guide happens in that window.

Installing Claude Code

There are three ways to install Claude Code on Windows. Use whichever fits your environment:

| Method | Command | When to use it |
|---|---|---|
| Native installer (recommended) | Download from claude.ai/download | Cleanest install. No Node.js required. Preferred for most users. |
| WinGet | winget install Anthropic.ClaudeCode | Good for managed Windows environments or scripted provisioning. |
| npm (compatibility path) | npm install -g @anthropic-ai/claude-code | Use if Node.js is already your primary toolchain, or if your environment requires npm-managed installs. |

After installing, run claude --version in a terminal to confirm. Then run claude to start your first session. You'll be prompted to log in with your Anthropic account.

Documentation online (including older versions of this guide) often shows only the npm path. If you're on Windows and don't already use Node.js, the native installer or WinGet is simpler and avoids Node version conflicts.
Chapter 2

The Session

A session in Claude Code is a single, continuous conversation between you and Claude. It has a beginning (when you start it) and it persists on disk (so you can resume it later). Every message you send, every file Claude reads, every command it runs — all of it belongs to a session and is recorded in that session's log.

Like a VS debugger session with watch variables that persist. When you attach the debugger, you have a live session with state: breakpoints, watch expressions, call stack, variable values. Close the debugger, and the session ends. But if you saved the debug configuration, you can reattach and resume where you left off. A Claude session is the same: live conversation state that persists to disk.
Or think of it like an open document in Visual Studio with undo history. Every action is recorded. You can close the document and reopen it — the undo history (conversation history) is still there. The session IS the document.
Like a project folder on your desktop with all your notes, drafts, and emails for one client. When you open that folder, everything from your last working session is still there — the draft proposal, the notes from the last call, the email thread. You pick up exactly where you left off. A Claude session is that folder. Close it and reopen it next week — everything is still there.

Where Sessions Live

Session data is stored locally at C:\Users\YourName\.claude\projects\, organized by project path. The path encoding replaces backslashes with double-dashes: C:\repos\MyApp becomes C--repos-MyApp. Inside each project directory, session logs are stored as .jsonl (JSON Lines) files — one JSON object per line, each line representing a conversation turn.
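The encoding rule above can be sketched in a couple of shell lines. This is illustrative only; `tr` simply applies the substitution the text describes (both `:` and `\` become `-`):

```shell
# Encode a Windows project path the way the session store names its folders:
# ':' and '\' each become '-', so C:\repos\MyApp -> C--repos-MyApp.
project='C:\repos\MyApp'
encoded=$(printf '%s' "$project" | tr ':\\' '--')
echo "$encoded"   # C--repos-MyApp
```

Knowing the naming scheme is handy when you want to find (or back up) the .jsonl logs for one specific project.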

Why Sessions Matter

Sessions are your working context. A session carries:

  • Full conversation history (what you said, what Claude said, what tools were used)
  • The loaded CLAUDE.md instructions (from the repo you're working in)
  • Any skills that were activated during the conversation
  • The accumulated understanding of what you're working on

Starting a new session means starting from scratch — a fresh context window with no memory of previous conversations (unless you use memory files, covered in Chapter 9).

If you're coming from a GUI mindset, sessions feel invisible. There's no "File > Save Session" dialog. Sessions are created implicitly when you start a conversation and persist automatically. This is the Unix way: the tool manages its own state in files, transparently. You don't save your bash history manually either — it just happens.

Forking a Session

You can fork a session — create a copy of it at a point in time and continue in a new direction, leaving the original intact. This is exactly like a git branch: the original session continues to exist unchanged, and the fork becomes its own independent session from that point forward.

Use forking when:

  • You want to try a risky approach without losing your current working state
  • You reached a decision point and want to explore two different solutions simultaneously
  • You want to hand off a copy of your session context to someone else to continue

Fork a session with: claude --fork-session <session-id> from the terminal, or via a session manager tool.

Exactly like git branch. You have a main branch (original session). You create a feature branch (fork). Both exist independently. Changes on the fork don't affect the original. If the fork works out, great. If not, you still have the original.
Like making a copy of a document before trying a radical redesign. The original is safe. The copy is where you experiment. If the experiment works, you keep it. If not, you still have the original.

Fork vs. Agent — What's the Difference?

Both involve "splitting off" from your current session. They're easy to confuse:

| | Fork (session fork) | Agent (subagent) |
|---|---|---|
| What it is | A copy of your session that becomes a new independent session | A temporary Claude instance spawned for one specific task |
| Duration | Permanent — the fork persists like any session | Temporary — terminates when the task is done |
| Context | Starts with a copy of your full conversation history | Starts fresh — gets only the task brief you give it |
| You interact with it | Yes — it becomes your new active session | No — it works independently and reports back |
| Analogy | git branch | subprocess / child process |
| Use when | You want to explore a direction without losing your current state | You want Claude to do a subtask in the background while you keep working |
Chapter 3

CLAUDE.md — The Constitution

You do not write or edit this file yourself. Claude writes it. You just tell Claude what to put in it. This is one of the most important things to understand about Claude Code — you are the manager, Claude is the one doing the typing.

Step One: Run /init

When you start a new project, type /init in Claude Code. That's it. Claude will analyze your codebase — the languages, frameworks, directory structure, and patterns it finds — and create an initial CLAUDE.md automatically. You don't write it from scratch. You review what Claude produced and say "looks good" or "also add X." You can also run /init on an existing project to improve an existing CLAUDE.md — it will suggest additions based on what it now knows about the codebase.

What Is CLAUDE.md?

CLAUDE.md is a markdown file that contains instructions for Claude. When you start a session in a repository that has a CLAUDE.md at its root, that file is automatically loaded into Claude's context. Every message Claude generates in that session is influenced by those instructions.

It is not executed. It is not parsed by a compiler. It is read — loaded into the context window where Claude can see it, the way a brief handed to a consultant is read before a meeting starts, not run like a program.

Like .editorconfig or .eslintrc. You don't run these files. Your editor reads them and adjusts its behavior: tab size, line endings, indentation style. CLAUDE.md does the same thing for Claude: "Use camelCase. Never modify files in /config/production/. Always write tests. Our API uses JWT authentication." Claude reads it and adjusts its behavior accordingly.
Or think of it like the XML comments at the top of a .csproj file that configure MSBuild behavior. The comments don't execute — MSBuild reads them to decide how to build. CLAUDE.md is read by Claude to decide how to code.
Like the onboarding document you give a new employee or contractor. "Here's how we do things: we use this email format, we cc the PM on all client communications, we never promise a delivery date without checking with engineering first." You write it once. Every new person reads it on day one. CLAUDE.md is that onboarding document — for AI. And just like with a new employee, Claude wrote the first draft based on what it observed about your project. You reviewed it and approved it.

When to Tell Claude to Update It

Claude does not update CLAUDE.md automatically. It reads it every session but only writes to it when you ask. If you don't ask, hard-won knowledge from a session is lost next time.

Tell Claude to update CLAUDE.md:

  • After a painful debugging session: "We just spent two hours figuring out that the payment service returns null on weekends. Add that to CLAUDE.md so you never forget it."
  • When you establish a new rule: "From now on, all API responses must be wrapped in our Result type. Add that to CLAUDE.md."
  • When you discover a forbidden pattern: "We found out the legacy/ directory has circular dependencies. Add a rule to never import from there."
  • At the end of a major feature: "We just finished the auth refactor. Update CLAUDE.md with the new session token format and how the refresh flow works."
  • When something surprised you: "I didn't expect that. Add a note to CLAUDE.md about this so it doesn't surprise you next session either."

The simplest habit: at the end of any session where you learned something important, say: "Update CLAUDE.md with anything we learned today that future sessions should know." Claude will handle the rest.

CLAUDE.md does NOT auto-update. If you end a session without telling Claude to record what was learned, that knowledge is gone next session. Claude will start fresh and may make the same mistakes again. This is the most common frustration with Claude Code, and the fix is one sentence at the end of your session.

What Belongs in CLAUDE.md

  • Architecture overview: "This is a React/Node monorepo. The API is in /server, the UI is in /client."
  • Coding standards: "Use TypeScript strict mode. Error handling uses our Result<T> pattern."
  • Forbidden patterns: "Never use any type. Never import from the legacy/ directory directly."
  • Project-specific knowledge: "The auth module talks to Redis, not a SQL database."
  • Build/test commands: "Run tests with npm test. Build with npm run build."
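Put together, the categories above might yield a file like the one below. Normally /init generates this for you; the heredoc is only to make the example concrete, and every detail (the monorepo layout, Result<T>, the npm commands) is illustrative:

```shell
# Sketch of a small CLAUDE.md covering architecture, standards,
# forbidden patterns, and commands (contents are illustrative):
cat > CLAUDE.md <<'EOF'
# MyApp — Project Instructions

## Architecture
React/Node monorepo. The API is in /server, the UI is in /client.

## Coding standards
- TypeScript strict mode everywhere.
- Error handling uses our Result<T> pattern.

## Forbidden patterns
- Never use the `any` type.
- Never import from the legacy/ directory directly.

## Commands
- Test: npm test
- Build: npm run build
EOF
wc -l CLAUDE.md   # short on purpose: every line is loaded into every session
```

Notice how terse each entry is. The file states rules and facts; it does not narrate procedures.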

What Does NOT Belong in CLAUDE.md

  • Secrets or credentials — CLAUDE.md is checked into git. Never put API keys, passwords, or tokens here.
  • Detailed workflow procedures (30+ steps) — Use skills for these. CLAUDE.md is always loaded and costs tokens every session.
  • Personal preferences — These go in your personal C:\Users\YourName\.claude\CLAUDE.md (user-level, not repo-level).

Cascading CLAUDE.md Files

You can have multiple CLAUDE.md files, and they cascade (merge):

  • C:\Users\YourName\.claude\CLAUDE.md — Personal, applies to all repos everywhere (your global defaults)
  • CLAUDE.md at repo root — Project-level, shared with team via git
  • CLAUDE.md in subdirectories — Directory-specific overrides

All applicable files are loaded and merged. It's like CSS cascading: general rules at the top, more specific rules override as you go deeper.

All CLAUDE.md files in the hierarchy are loaded together — and you pay token costs for all of them combined. Your global CLAUDE.md (C:\Users\YourName\.claude\CLAUDE.md) loads on every session across every project. Your repo CLAUDE.md loads whenever you're in that repo. They stack. If your global file is 300 lines and your repo file is 400 lines, you are paying for 700 lines of tokens on every single message. Subdirectory CLAUDE.md files load on demand (only when you work in that subdirectory), so they don't pile on upfront — but the global and repo root files always do. A bloated global CLAUDE.md is the most expensive kind: it costs tokens in every project, not just one.
The hardest thing for GUI developers to accept: CLAUDE.md is always "on." There's no checkbox to enable it, no menu to activate it. If the file exists at the repo root, it's loaded. Every session. Automatically. This passivity is its power — you can't forget to apply your standards because you never had to apply them manually in the first place.
CLAUDE.md grows over time — and you pay input token costs on every prompt for every line in it. Each time you ask Claude to add a new rule, hard-won fact, or anti-pattern to CLAUDE.md, the file gets longer. That file is loaded into context at the start of every session and charged as input tokens on every single message you send — forever, until you trim it. A 500-line CLAUDE.md might cost 3,000–5,000 tokens per message. On Sonnet at $3/million input tokens, that's fractions of a cent each — but it adds up across thousands of messages and permanently eats into your Awareness window. Treat CLAUDE.md like code: refactor it periodically, remove rules that are no longer relevant, and move detailed procedures to skills where they only cost tokens when invoked.

CLAUDE.md Size Directly Affects Your Token Budget — Every Session

This is important and often missed: every token in CLAUDE.md is consumed from your Awareness budget on every single message, for the entire session. It never leaves context. A 500-line CLAUDE.md might cost 3,000–5,000 tokens — permanently occupying that slice of your 267-page (Pro) or 1,333-page (Max) window, every time you start typing.

Compare that to a skill: a 500-line skill document costs zero tokens until you invoke it. Then it costs tokens only for that session. Next session, zero again unless you invoke it again.

| | CLAUDE.md | A Skill |
|---|---|---|
| When loaded | Every session, automatically | Only when you invoke it |
| Token cost | Every message, all session, forever | Only during sessions where you invoke it |
| 500 lines = | ~3,000–5,000 tokens permanently consumed per session | ~3,000–5,000 tokens only when needed |
| Best for | Short, universal rules that apply to everything | Detailed procedures for specific tasks |

The rule: Keep CLAUDE.md short and universal. If it's longer than one page, ask whether any of it could move to a skill. Everything in CLAUDE.md is paying rent on your context window 24/7. Everything in a skill is free until called.
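That rent is easy to estimate. A back-of-envelope sketch using the figures above (a CLAUDE.md near the top of the 3,000–5,000-token range, Sonnet input priced at $3 per million tokens):

```shell
# Rough cost of carrying a 4,000-token CLAUDE.md across 1,000 messages
# at $3 per million input tokens (integer dollars):
tokens_per_message=4000
messages=1000
dollars=$(( tokens_per_message * messages * 3 / 1000000 ))
echo "$dollars"   # 12
```

Twelve dollars per thousand messages is not ruinous, but it is pure overhead, and the context-window cost (the tokens themselves) is often the bigger problem.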

Rules Files — CLAUDE.md Split by Topic

If your CLAUDE.md is growing unwieldy, rules files offer a way to break it up without losing any coverage. Drop .md files into .claude/rules/ inside your project (or ~/.claude/rules/ for personal rules that apply everywhere), and Claude will load them alongside your CLAUDE.md.

The practical difference from CLAUDE.md is organization, not behavior. Instead of one long file covering testing, security, style, and architecture all together, you might have:

  • .claude/rules/testing.md — your test standards and required coverage rules
  • .claude/rules/security.md — forbidden patterns, input validation requirements
  • .claude/rules/style.md — naming conventions, formatting rules

Rules files are discovered recursively, so you can organize them into subfolders. They stack on top of CLAUDE.md rather than replacing it. Project rules (.claude/rules/) can be committed to git and shared with the team the same way CLAUDE.md can.
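Setting up a rules folder is just creating markdown files. A minimal sketch (the file names and rule contents here are illustrative):

```shell
# Create per-topic rules files under .claude/rules/ in the repo:
mkdir -p .claude/rules
cat > .claude/rules/testing.md <<'EOF'
# Testing rules
- Every new module ships with unit tests.
- Run the suite with: npm test
EOF
cat > .claude/rules/security.md <<'EOF'
# Security rules
- Validate all external input at the boundary.
- Never log credentials or tokens.
EOF
ls .claude/rules
```

Commit the folder like any other source file if the rules should apply to the whole team.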

Rules files do not solve the token cost problem. Every rules file that loads still costs tokens on every message, just like CLAUDE.md. Splitting one long CLAUDE.md into several rules files keeps your repo tidy, but the total token cost is the same. If you want to reduce token overhead, move content to skills — not to rules files.
CLAUDE.md is not unique to Claude. Codex uses a file called AGENTS.md in the same way — loaded at session start, same purpose, same format. If you have a well-crafted CLAUDE.md, you can copy it to AGENTS.md and Codex will pick it up. Your architecture decisions, coding rules, and anti-patterns travel with you to a different AI tool with zero extra work.

Architectural Work: Mandatory Constraint Extraction

For any architectural task involving design docs, refactors, or authority changes, a critical gate exists: constraint extraction must happen before code is written.

The process: Claude writes its constraint extraction to a file (.constraint-extraction.md); you review it for completeness and explicitly approve it; only then does code writing begin. If code is attempted without an approved constraint extraction, the commit is rejected.

This is not behavioral self-regulation. It is a structural gate: you see the file, review it, confirm it is correct, and the next step stays blocked until you do. This prevents architectural decisions from being embedded in code and discovered in review. Constraints are extracted and approved first.

Why this matters: Architectural work fails when constraints are implicit. Making them explicit before code forces clarity. Code written from unclear constraints requires rework. Code written from approved constraints rarely does. This single gate eliminates an entire class of architectural rework.

Need a ready-made CLAUDE.md? The Case Studies section has a 13-rule architectural template derived from a real production project, with a wizard that generates a customized file for your project in seconds. Works for Codex too — it produces AGENTS.md. Generate Your Day-1 File →
Prevent Code Drift — CLAUDE.md rules are only as good as the comments that reinforce them.
Chapter 4

The .claude Directory — Two Very Different Locations

Stop. Read this first. The documentation for Claude Code uses Unix path notation everywhere, which causes real confusion on Windows. This chapter decodes it before anything else.

What "~" Means (Unix Notation)

In Unix/Mac/Linux, ~ is shorthand for "your home directory." On Windows, your home directory is C:\Users\YourName\. So when you see ~/.claude/ in documentation, it means:

| What docs say | What it means on Windows |
|---|---|
| ~/.claude/ | C:\Users\YourName\.claude\ |
| ~/.claude/settings.json | C:\Users\YourName\.claude\settings.json |
| ~/.claude/projects/ | C:\Users\YourName\.claude\projects\ |
| ~/.claude/CLAUDE.md | C:\Users\YourName\.claude\CLAUDE.md |

To find this in Windows Explorer: open File Explorer, type %USERPROFILE%\.claude in the address bar and press Enter. That's the folder.

There Are TWO .claude Directories — They Are Completely Different

This is the source of most confusion. There is a .claude folder in your Windows user profile and there can be a .claude folder inside each of your project repositories. They are separate and serve different purposes:

| | User-Level | Repo-Level |
|---|---|---|
| Location | C:\Users\YourName\.claude\ | C:\repos\MyProject\.claude\ |
| Unix notation | ~/.claude/ | .claude/ (no tilde) |
| Who it applies to | You, on every project | Everyone working on this project |
| In git? | Never (it's outside the repo) | Can be (you choose what to commit) |
| You can find it in | File Explorer: %USERPROFILE%\.claude | File Explorer: inside your project folder |

Quick rule: If you see ~/.claude/ (with the tilde), it's your personal folder in C:\Users\YourName\. If you see .claude/ (no tilde, no path prefix), it's a folder inside the current repo.

Why the Dot?

On Unix/Linux/macOS, a leading dot makes a directory hidden from normal file listings. Windows doesn't follow this convention automatically, so .claude folders will appear in File Explorer (unlike on Mac/Linux where you need to show hidden files). The dot is just a naming convention borrowed from Unix meaning "this is config/plumbing, not content." You'll also see .git/, .vscode/, .vs/ following the same pattern.

Like %APPDATA% (user-level, global) vs .vscode/ inside a repo (project-level). Your global VS Code settings live in AppData and follow you everywhere. A .vscode/settings.json in a specific repo only applies to that project. Same split here.
Like the filing cabinet in the back office that holds company procedures (your personal C:\Users\YourName\.claude\) versus the project binder that sits on the conference room table for a specific client engagement (the .claude\ inside your project folder). One is always yours. The other belongs to the project.

What Lives in Your Personal Folder (C:\Users\YourName\.claude\)

| File/folder | Purpose |
|---|---|
| settings.json | Your personal preferences (allowed tools, permissions, model defaults) |
| CLAUDE.md | Instructions that apply to ALL your projects everywhere |
| commands\ | Your personal slash commands, available in every repo |
| skills\ | Your personal skill documents, available in every repo |
| projects\ | Session data, memory files, per-project notes (auto-managed by Claude) |

What Lives in a Repo's Folder (C:\repos\MyProject\.claude\)

| Item | Purpose |
|---|---|
| commands\ | Project-specific slash commands (can be committed to git, shared with team) |
| skills\ | Project-specific skills (can be committed to git, shared with team) |
| settings.json | Project settings shared with the team via git |
| settings.local.json | Your personal overrides for this project — add to .gitignore, never commit |

How to See These Folders on Windows

  • Your personal .claude folder: Open File Explorer, type %USERPROFILE%\.claude in the address bar
  • A repo's .claude folder: Navigate to your project folder in File Explorer — the .claude subfolder will be visible if it exists
  • Session data for a specific project: %USERPROFILE%\.claude\projects\ — each subfolder is a project, named by replacing the path's colon and backslashes with dashes (e.g., C:\repos\MyApp becomes C--repos-MyApp)
Chapter 5

Skills (SKILL.md Files)

A skill is a markdown document that reminds Claude how to do something — which it will remember until the next /compact (or the session ends). It does not execute. It does not run as a process. It is loaded into Claude's context window — added to the conversation as instructions — and Claude follows those instructions for as long as the content remains in context.

This is the concept most people get wrong. Skills are not plugins. They are not scripts. They are documents that inform behavior.

Like a runbook or SOP (Standard Operating Procedure) that you give to a new hire. "When deploying to production, follow these 12 steps. When writing a database migration, check these 5 things first." The new hire reads the document, understands the procedure, and follows it. The document didn't "execute" — the person read it and acted accordingly. Skills work the same way.
Like Visual Studio .snippet files or code templates. They're not running code — they're patterns that the tool knows about and applies when relevant. Or like the IntelliSense XML documentation: it informs behavior (the IDE shows you parameter info) without executing anything.
Like a job aid or quick reference card pinned above a desk. A call center agent has a laminated card that says "if the customer says X, respond with Y." It doesn't do anything on its own — the agent reads it and adjusts their response. A UX designer has a style guide they consult when making design decisions. A skill is that reference card — Claude reads it before acting, not instead of acting.

Where Skills Live

There are three kinds of skills, and they live in three different places:

  • Personal skills: C:\Users\YourName\.claude\skills\ — available in every project on your machine, never committed to git. These are yours. You write them, you own them.
  • Project skills: C:\repos\MyProject\.claude\skills\ — specific to one repo, can be committed to git and shared with the team.
  • Plugin skills: C:\Users\YourName\.claude\plugins\cache\[plugin-name]\[plugin-name]\[version]\skills\ — bundled inside an installed plugin. You didn't write them; the plugin author did. They arrive automatically when you run /plugin install.
“Skill” means a single instruction document you write; “plugin” means an installed package that can bundle many skills together; “plugin skill” means a skill that arrived via a plugin rather than one you wrote — three terms for things that look identical on disk but come from completely different places.
Plugin skills are the same SKILL.md files as personal skills. Exact same format, exact same structure — a folder with a SKILL.md file inside it. The only difference is where the folder lives on disk. A plugin skill for create-plans looks identical to a personal skill for create-plans. Claude Code finds both by walking known skill directories.
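
To make that format concrete, here is a sketch of a minimal personal skill. The folder name, frontmatter values, and checklist content are invented for illustration; check the exact frontmatter convention against current Anthropic docs. Saved as C:\Users\YourName\.claude\skills\code-review\SKILL.md:

```markdown
---
name: code-review
description: Checklist to follow when asked to review code in this codebase
---

When reviewing code:

1. Check that every new public method has doc comments.
2. Flag any SQL built by string concatenation as a potential injection risk.
3. Verify that new configuration keys are documented in README.md.
4. Suggest small, targeted diffs rather than full-file rewrites.
```

Once this exists, typing /code-review (or saying "use the code-review skill") loads the checklist into context, where it stays until compaction or session end.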

Which Skill Wins? The Precedence Order

If you have a personal skill and a plugin skill with the same name, Claude Code follows this precedence — highest priority first:

  1. Personal skills (C:\Users\YourName\.claude\skills\) — always win
  2. Project skills (C:\repos\MyProject\.claude\skills\) — win over plugins
  3. Plugin skills — used only if no personal or project skill has the same name

This means you can override any plugin skill by creating a personal skill with the same name. You get the plugin's behavior by default, and your version when you want it — without forking or modifying the plugin itself.

Plugin skills are installed, not written. You don't create them — you get them as part of a plugin package. When you run /plugin install taches-cc-resources, Claude Code downloads the plugin to C:\Users\YourName\.claude\plugins\cache\ and all the plugin's skills become available automatically. If the plugin is updated, its skills update. If the plugin is uninstalled, its skills disappear. Your personal and project skills are unaffected either way.

How Skills Get Loaded — You Are Always the Trigger

Skills do not load automatically. Claude does not scan your skills directory before every response and decide which ones apply. There is no background skill-matching. Skills are dormant — they sit in the folder doing nothing until you cause them to load.

There are two ways to load a skill, and both require you:

  1. Slash command: You type /my-skill-name and Claude reads that skill's .md file into context. This is the cleanest mechanism — explicit, deliberate, one keystroke.
  2. Direct mention: You tell Claude to use it — "follow the code-review skill" or "use the deployment runbook." Claude, knowing skills exist in .claude/skills/, reads the relevant file. You are still the trigger; you're just using words instead of a slash command.

If you want a skill to always be active, put its content in CLAUDE.md instead — that's what always-loaded means.

Skill vs. CLAUDE.md

Both are "documents loaded into context." The difference is when they load:

  • CLAUDE.md — loaded automatically at the start of every session, every message. You never have to ask. It is always there consuming context window space.
  • Skills — loaded only when you invoke them (slash command or explicit mention). They cost zero context window tokens until you need them.

This matters for context window management. A 500-line CLAUDE.md costs tokens on every message. A 500-line skill only costs tokens when it's actually needed.

The word "skill" is misleading for Windows developers. In the Windows world, a "skill" sounds like an Alexa skill or a microservice — something that runs. In Claude Code, a skill is a document. It does nothing on its own — it waits to be invoked. Rename it mentally to "instruction document" or "playbook" if that helps.
Chapter 6

Slash Commands

Slash commands are pre-canned prompts. You write a prompt once, save it as a file, give it a name. When you type /foo, Claude reads the file foo.md and sends its contents exactly as if you had typed that entire prompt yourself. That's it. No magic, no runtime, no API call — just a saved prompt with a short name.

Like a stored procedure in SQL. You define it once with a name and a body. When you call it by name, the body executes. Slash commands are the same: the name is the filename, the body is the file contents, and "execution" means "the contents are sent as a prompt." Or like PowerShell aliases: you type a short name, it expands to the full command.
Like Visual Studio External Tools (Tools > External Tools). You define a name, a command, and arguments. Invoking it by name runs the command. Slash commands are external tools for Claude — named, reusable prompt templates.
Like a canned email response saved in your email client. You've typed "Here are the next steps after our kickoff meeting..." twenty times, so you saved it as a template called "Post-kickoff." Now you just click the template name and it fills in. Slash commands are those saved templates — for instructions to AI instead of emails to clients. A project manager might create /sprint-review that automatically loads the right prompts for running a sprint retrospective.

Where Slash Commands Live

  • Personal commands: C:\Users\YourName\.claude\commands\ — available in all repos everywhere (like PowerShell profile functions)
  • Project commands: C:\repos\MyProject\.claude\commands\ — shareable via git (like .vscode\tasks.json)

What a Slash Command File Looks Like

It's just a markdown file. For example, .claude/commands/new-component.md:

```
Create a new React component at the path I specify.
Include:
- A functional component with TypeScript props interface
- A test file using React Testing Library
- A Storybook story file
Follow our naming convention: PascalCase for components, camelCase for hooks.
Ask me for the component name and path before starting.
```

Then you type /new-component and this entire prompt is sent to Claude.

Slash Commands vs. Skills — A Note on Direction

Conceptually: slash commands are prompts you invoke. Skills are instruction documents that sit dormant until you invoke them. A slash command fires a prompt ("Do this thing now"). A skill, once invoked, loads its content into the context window — it becomes part of the conversation, like any other text. It stays there, informing Claude's behavior, until the context window compacts it away, you start a new session, or you explicitly run /compact. It does not re-load itself. It does not persist into the next session. Invoke it again next session if you need it again.

Current state as of early 2026: Custom commands (.claude/commands/*.md) still work exactly as described here. Skills are the broader umbrella concept that commands now live under. If you see documentation using "skills" to cover both, that's accurate — they share the same underlying mechanism. The distinction between "a command you invoke" and "a skill that loads contextually" is still useful for understanding behavior, but treat it as conceptual rather than a hard product boundary. Check current Anthropic docs if the exact file locations ever change.
Chapter 7

Hooks

Hooks are the one thing in this architecture that actually executes code. A hook is a command or program (bash script, PowerShell script, Python script, compiled binary — anything your OS can run) that fires in response to a specific event. When the event fires, it runs on your machine, with your permissions, in your environment.

Every other concept in this document — CLAUDE.md, skills, memory, slash commands — is a document that gets loaded and read. None of them execute code on their own. Hooks are the exception: they are active executors.

Like git hooks (pre-commit, post-merge, pre-push). You drop a script into .git/hooks/pre-commit, and git runs it before every commit. If it exits non-zero, the commit is aborted. Claude Code hooks work the same way: you configure a command to run on an event, and the system runs it. Real command, real execution, real consequences.
Like MSBuild pre-build and post-build events in Visual Studio. You specify a command in project properties, and Visual Studio runs it before or after the build. Or like Windows Task Scheduler triggers: "When event X happens, run command Y." Hooks are event-driven automation within Claude Code.
Like an out-of-office rule in Outlook. You don't manually reply to every email that arrives while you're on vacation — you set up a rule once: "When an email arrives, send this reply automatically." Hooks are the same idea: "When Claude finishes editing a file, run the linter automatically." You set it up once; it fires every time without you having to remember. HR professionals know this as a workflow trigger in an HRIS system — when an employee status changes, automatically send the onboarding checklist.

Hook Events

  • Before tool use: Runs before Claude uses a tool (e.g., before editing a file). Can block the action.
  • After tool use: Runs after a tool completes (e.g., after a file edit, run tests).
  • Session start: Runs when a new session begins.
  • Notification: Runs when Claude sends a notification.
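
As a sketch of how these events are wired up: hooks are configured in settings.json. In the example below, the event name and matcher follow the documented hooks schema, but the exact syntax should be verified against current Anthropic docs, and the test command is purely illustrative. This runs the test suite after any file edit:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test -- --silent"
          }
        ]
      }
    ]
  }
}
```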

Why Hooks Matter

CLAUDE.md is advisory: "Please run tests after editing." Claude usually follows it, but it's not guaranteed. A hook is enforcement: the test suite runs whether Claude remembers or not. It's the difference between a code review comment and a CI gate.

Hooks run with YOUR permissions. A misconfigured hook can delete files, modify your system, or run arbitrary code. Treat hook configuration with the same care you'd treat a CI pipeline script or a post-build event. Review what they do before enabling them.
Chapter 8

Agents (Subagents)

When Claude needs to do a subtask that requires focused work — like searching a large codebase, analyzing a complex file, or doing research — it can spawn a subagent. A subagent is a separate Claude instance with its own context window. It receives a task description, does the work, and returns a result. Then it terminates.

Agents look like temporary forks — but they're not. A session fork copies your full conversation history. An agent starts completely fresh, getting only the task brief you give it. Think of an agent as spawning a subprocess, not branching. The subprocess has its own process space, does its work, and reports back. Your main process never saw its internal state. See the comparison table in the Sessions chapter for a side-by-side breakdown.

Like spawning a child process. The parent process (your conversation) creates a child process (the subagent) with its own address space (context window). The child does work, returns a result via its exit code or stdout, and terminates. The parent never sees the child's internal state — just the result. The child doesn't persist after the parent exits.
Like delegating to a contractor. You write a scope of work: "Search the codebase for all uses of the deprecated UserManager class and list them with file paths and line numbers." The contractor goes away, does the work independently, and comes back with a deliverable. You don't see their intermediate notes or false starts — just the final report. The contractor doesn't stick around after delivering.
Like sending a research request to an intern. You say: "Go through last quarter's customer feedback forms and pull out every complaint that mentions the checkout process. Give me a summary." The intern goes away, does the work, and returns with a document. They didn't interrupt you every five minutes. They didn't need you to read every form with them. They came back with the answer. A subagent works exactly this way — you brief it, it works independently, it reports back.

What Subagents CAN'T Do

  • See your conversation: They get only the task you assign. They don't know what you discussed 5 minutes ago.
  • Persist after completing: They're not daemons or services. They run, return, and disappear.
  • Communicate with each other: Each subagent is isolated. They can't share context.
  • Access parent's loaded skills/CLAUDE.md: They start clean. (They may load their own CLAUDE.md from the repo.)

How Subagents Get Used

Three ways:

  • Automatic delegation — Claude decides to use a subagent when a task benefits from focused, isolated work. You'll see "I'll use the Agent tool" in Claude's response. This is the most common case.
  • Manual via /agents — Type /agents to manage, list, or configure agents directly. Useful for understanding what's running or reviewing available agent configurations.
  • Project-level and user-level subagent definitions — You can define named agents with specific instructions in .claude/agents/ (project) or C:\Users\YourName\.claude\agents\ (personal). These are like pre-configured specialists you can invoke by name.

For most users, automatic delegation is all you need to understand. The deeper agent configuration is useful when you want specialized agents for recurring tasks in a project.
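
For illustration, a named agent definition is a markdown file with frontmatter, much like a skill. Everything below (filename, frontmatter fields, instructions) is a hypothetical sketch; confirm the exact fields in current docs. Saved as .claude/agents/deprecation-hunter.md:

```markdown
---
name: deprecation-hunter
description: Finds every usage of a deprecated API and reports file paths and line numbers
tools: Read, Grep, Glob
---

You are a focused search agent. Given the name of a deprecated class or
method, find every usage in the codebase and return a list of file paths
with line numbers. Do not modify any files. Report only what you find.
```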

Chapter 9

Running Multiple Background Prompts

Claude Code can run tasks in the background while you keep typing. You don't have to wait for one task to finish before starting another. This is one of the most productive features in Claude Code and most people don't know it exists.

Method 1: Ctrl+B — Push a Running Task to the Background

Send a prompt normally (hit Enter). Claude starts working. While it's running, press Ctrl+B to push it to the background. Claude keeps working. Your prompt returns immediately and you can start a new task, type something else, or run shell commands — all while the background task continues.

Method 2: Send Multiple Prompts in Sequence

Send your first prompt. While Claude is responding, type your next prompt and send it. Claude Code handles multiple background tasks simultaneously. Each one buffers its output and you collect results when it finishes.

Method 3: Skill with Background Agent

Create a skill that tells Claude to spawn a subagent with run_in_background: true and isolation: "worktree". The skill defines the recurring task; invoking it with a slash command kicks off a background worktree agent while you continue in the main session. This is the closest thing to a "run this in the background" slash command.
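
A hypothetical sketch of such a skill (the task and wording are invented; the run_in_background and isolation parameters are the ones described above), saved as .claude/skills/nightly-audit/SKILL.md:

```markdown
---
name: nightly-audit
description: Runs a dependency audit in an isolated worktree, in the background
---

Spawn a subagent with run_in_background: true and isolation: "worktree".
Its task: run the project's dependency audit, summarize any vulnerable
packages, and write the findings to AUDIT.md in the repo root. Do not
modify any other files.
```

Invoking /nightly-audit then kicks off the background worktree agent while you keep working in the main session.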

| Key | What it does |
|---|---|
| Ctrl+B | Push current running task to background. Keep your prompt. |
| Ctrl+B Ctrl+B | Same, for tmux users (tmux intercepts the first Ctrl+B) |
| Ctrl+T | Show background task list and status |
| Ctrl+F | Kill all background agents if something goes wrong |
| Shift+Enter | Insert a newline in your prompt without sending it (write multi-line prompts) |

There is no single slash command that says "run this as a background agent." The background behavior is triggered by Ctrl+B after sending, or by explicitly spawning agents via skills/agent definitions. For a recurring task type, the cleanest approach is a named agent skill invoked with a slash command — one keystroke, runs in the background every time.
Like opening a new terminal tab. You don't close your current work — you just start something else in parallel. Ctrl+B is opening that second tab while Claude keeps running in the first one.
Like assigning a task to a team member and walking away. You brief them (the prompt), they start working (Ctrl+B), and you go do something else. When they're done, the result is waiting for you.
Chapter 10

Plans

Plan mode is a way to have Claude design an approach before it starts coding. Instead of immediately editing files and running commands, Claude describes what it will do: which files it will change, in what order, what the risks are, and what the expected outcome is. You review, adjust, and then approve the plan for execution.

Like writing an Architecture Decision Record (ADR) or a design document before writing code. You don't start a database migration by typing ALTER TABLE — you write a migration plan: what tables change, what data moves, what the rollback strategy is. Plans are the same discipline applied to AI-assisted coding: think first, then act.
Like the difference between writing code directly and drawing a UML class diagram first. Or like using the Visual Studio Class Designer to lay out relationships before generating code. The plan is the blueprint; the implementation follows.
Like a project manager writing a project charter before work starts. You don't assign tasks on day one — you define scope, identify risks, list dependencies, and get sign-off. Only then do you open Jira and create tickets. For a UX designer: you don't open Figma and start drawing — you sketch wireframes on paper and get feedback first. Plan mode is that discovery and alignment phase, applied to AI-assisted work.

How to Use Plan Mode

Press Shift+Tab to toggle plan mode. In plan mode, Claude will:

  1. Analyze the task you described
  2. List which files need changes
  3. Describe the approach for each change
  4. Identify risks or edge cases
  5. Wait for your approval before proceeding

You can push back: "What about error handling in step 3?" or "I'd prefer approach X over Y." The plan evolves through conversation. Once approved, exit plan mode and Claude executes.

When to Use Plans

  • Complex refactorings that touch many files
  • Architectural changes (new patterns, module reorganization)
  • Anything where the blast radius is large enough that you want to review before changes happen
  • When you want to understand Claude's approach before it starts

Honest Answer: Do You Even Need Plan Mode?

For most people, there isn't much practical difference between using Plan Mode (Shift+Tab) and just telling Claude in a plain prompt: "Research this, create a plan, and wait for my approval before touching any files." Both get you the same outcome. Claude is good at following that instruction.

The actual difference is thin:

  • Plan Mode (Shift+Tab) — sets a system-level flag that mechanically prevents Claude from using file-editing tools during planning. It cannot edit files even if it gets excited and wants to start. Hard enforcement.
  • Plain prompt ("plan first, wait for approval") — works 95% of the time. Can occasionally drift if the conversation runs long and your original instruction fades from context.

The real advantage of Plan Mode is enforcement, not capability. If your prompting discipline is solid, you're already getting Plan Mode's benefit without the keystroke.

Best Practice: Save the Plan as a .md File

Whether you use Plan Mode or a plain prompt, ask Claude to write the plan to a .md file before it starts executing:

Research the approach for this refactor, write the plan to PLAN.md, then wait for my approval.

This gives you the plan as a persistent artifact — on disk, in git, survives compaction and session end. You can review it, edit it, share it with your team, feed it back into a future session, or refer to it mid-execution if something goes wrong. A plan that only exists in the conversation is gone after the first /compact. A plan in a .md file is there forever.
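
As an illustration, a saved plan might look like the sketch below. The structure and file names are invented (Claude will produce its own format), but the elements worth insisting on are the same: files, order, risks, rollback.

```markdown
# PLAN: Extract payment validation into its own module

## Files to change, in order
1. src/payments/validator.js — new module; receives the extracted logic
2. src/checkout.js — replace inline validation with calls to the new module
3. tests/payments/validator.test.js — new tests covering the extracted logic

## Risks
- checkout.js has a known null-handling bug; preserve current behavior and
  fix it in a separate change.

## Rollback
Revert the single commit; no data or schema changes are involved.
```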

Chapter 11

Context Compaction — What Gets Lost and Why It Motivates Memory

Every session has a finite context window: a limit on how much can be held in the active session at once. This limit is not an arbitrary software setting; it is rooted in hardware. Every token in your session occupies space in GPU memory on Anthropic's servers, and that memory is finite and expensive; when you buy a Claude plan, you are effectively buying time on it. As a session runs, conversation history, file reads, and tool output accumulate in that memory until something has to give. That something is space, and the giving is compaction.

Compaction is not deletion. Claude summarizes the older parts of the conversation to make room for new material. The detail is reduced but the gist is preserved — a thick folder of notes compressed into a single summary page. What you lose is precision. What survives is the broad shape of what happened. For the full mechanics and how to use hints to control what survives, see Chapter 15.4: /compact.

What Survives Compaction

  • CLAUDE.md — always reloaded from disk at the start of every session and after compaction. It is not stored in conversation history; it comes from the file. It cannot be compacted away. This is its most important property.
  • Files on disk — anything written to a file (plans, documentation, changelogs, code) is permanent. Compaction only affects the conversation. The file is untouched. See the Plans chapter: a plan written to PLAN.md survives compaction; a plan that only exists in the conversation does not. → Chapter 10 (Plans)
  • Plugin meta-skills — the SessionStart hook re-injects the meta-skill at startup, resume, clear, and /compact. It comes back automatically.

What Gets Lost

  • Skills you loaded during the session — invoking a skill puts its content into conversation history. When that history gets compacted, the skill's instructions are summarized or dropped. Claude silently stops following them. No error. No warning. You just notice Claude is no longer behaving as instructed. Re-invoke the skill after compacting. → Chapter 15.4: Skills and Compaction
  • Plugin skills — same behavior as personal skills. A plugin skill invoked during a session lives in conversation history and is vulnerable to compaction. Unlike the meta-skill, individual plugin skills do not get re-injected automatically. Re-invoke after compacting.
  • Documentation loaded into context — if you read a file into the session (README, architecture doc, etc.), that read lives in conversation history. After compaction it may be summarized away. The file on disk is fine; load it again if you need it. → Chapter 12 (Documentation as Dormant Memory)
  • Specific details buried in long tool output — file reads, search results, test output. The broad findings survive in summary; the exact line numbers and specific values may not. This is why compaction hints exist: /compact focus on the payment bug — root cause is checkout.js line 142 tells Claude what precision must survive the compression.

How Compact-Resilient Is Each Mechanism?

The ranking surprises people. CLAUDE.md is always in context, always consuming tokens, but because it reloads from disk after every compaction, its content can never be lost. Plugin skills are loaded on demand: they enter context only when invoked and can be reloaded by invoking them again, but nothing reloads them for you. The meta-skill is the exception: it re-injects automatically via the SessionStart hook on every compact, clear, and resume. So:

  • CLAUDE.md: always present, always reloads from file after compaction ✓
  • Plugin meta-skill: re-injected automatically by hook after compaction ✓
  • Individual plugin skills: must be manually re-invoked after compaction ✗
  • Personal skills: must be manually re-invoked after compaction ✗

The Implication: Why Memory Exists

Compaction is why memory files exist. If CLAUDE.md is for rules ("always do X") and skills are for on-demand workflows, there is still a gap: facts discovered during a session. When Claude figures out that a particular API returns null on Sundays, or that the authentication module was rewritten in v8 for compliance reasons, that knowledge lives only in conversation history — and compaction will eventually blur or lose it.

Memory files solve this. Claude writes them during the session, they persist on disk, they reload automatically next session like CLAUDE.md. They are the answer to "how do I preserve what Claude learned, not just what I told it?" The next chapter covers them in detail.

Like a whiteboard session that gets photographed before it's erased. Compaction is the erasure — unavoidable, eventually. Memory files are the photograph. CLAUDE.md is the printed process document pinned on the wall that was never on the whiteboard to begin with.
Chapter 12

Memory (mini-CLAUDE.md files)

Memory files are persistent notes stored in your personal Claude folder, inside a per-project subdirectory. On Windows, that's C:\Users\YourName\.claude\projects\C--repos-MyProject\memory\ (the project path C:\repos\MyProject with its colon and backslashes replaced by dashes). Think of them as tiny auto-written CLAUDE.md entries: they load automatically every session, affect every prompt, and cost input tokens on every message, just like CLAUDE.md. The difference is that Claude writes them itself, and they store discovered facts rather than rules.

Memory files are not free. When you say "remember that the payments module has no tests" and Claude writes a memory note, that note is loaded at the start of every future session and charged as input tokens on every message for the life of the project. Unlike a skill, you never invoke them; they always load. Keep memory files focused and prune stale ones periodically, or they accumulate into a silent token cost you've forgotten about.

They survive across sessions. When you start a new session in a project that has memory files, Claude loads them automatically. This is how knowledge persists beyond a single session.

Like a developer's personal wiki or OneNote notebook for a project. Things you'd write on a sticky note and put on your monitor: "The prod database connection string is in Azure Key Vault, not in the config file." "The client requested we not use the word 'delete' in the UI — use 'archive' instead." "The payments module was written by a contractor and has no tests — be careful."
Like the "Notes" section in a Visual Studio solution's properties. Or like the sticky notes on your physical monitor that say "DON'T TOUCH THE BILLING TABLE" or "Ask Bob before modifying the auth flow." Memory files are those sticky notes, but for Claude.
Like the "tribal knowledge" document a good team member writes before going on leave. "Here's everything I know about this client that isn't in the CRM: they hate being called before 10am, the real decision-maker is the CFO not the VP, and they had a bad experience with our competitor's product two years ago." Memory files are that knowledge transfer document — the context that doesn't fit in a formal system but is critical to working effectively.

The Full Persistence Landscape

Claude Code has several mechanisms for persisting knowledge. Here's how they compare:

| Mechanism | Who writes it | Token cost | Scope | In git? | Best for |
|---|---|---|---|---|---|
| CLAUDE.md (repo root) | Claude (via /init or when you ask) | Every message, every session | All team members on this project | Yes | Architecture rules, coding standards, forbidden patterns, build commands |
| CLAUDE.md (personal global) | You or Claude | Every message, every project | You, on all projects | Never | Your personal preferences: response style, output format, habits |
| Memory files | Claude (auto or on request) | Every message, this project | You, on this project | Never | Tribal knowledge, discovered gotchas, session-to-session notes |
| Auto memory | Claude (automatically) | Every message, this project | You, on this project | Never | Things Claude decides are worth remembering without you asking |
| Skills | You | Only when invoked | You or team, this project | Optional | Detailed procedures, reference docs, specialised workflows |
| Scoped rules (.claude/rules/) | You or Claude | Every message (in scope) | Project or subdirectory | Optional | Finer-grained rules per directory; emerging feature, check current docs |
| Documentation files ([your_file.md]) | You (or Claude when you ask) | Zero — until you read the file | Anyone — any AI, any developer, any platform | Yes — checked into git | Architectural facts, decisions, hard-won knowledge that any tool needs. Platform-agnostic. See "Documentation as Dormant Memory" below |

Rule of thumb: Team knowledge → CLAUDE.md (checked in). Personal Claude-specific discoveries → memory files. Permanent project facts any AI needs → documentation files. Detailed procedures → skills.

How Memory Gets Created

Three ways:

  • Auto memory — Claude notices something worth remembering and writes a memory note on its own, without you asking.
  • On request — You say "remember that the payments module has no tests" and Claude writes it to memory.
  • Manually — You create or edit markdown files directly in the memory directory (C:\Users\YourName\.claude\projects\C--repos-MyProject\memory\).
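
A memory file is plain markdown. A hypothetical note (filename and contents invented for illustration, echoing the sticky-note examples above) might read:

```markdown
# Project memory: MyProject

- The payments module was written by a contractor and has no tests; be careful.
- UI copy must say "archive", never "delete" (client requirement).
- The prod database connection string is in Azure Key Vault, not the config file.
```
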
Chapter 13

Documentation as Dormant Memory

Memory files and CLAUDE.md are always wired in — they load every session, cost tokens every message, and are specific to Claude. There is another kind of persistent knowledge that costs nothing at rest, survives any AI platform, and can be shared with your whole team: documentation.

A README.md, an ARCHITECTURE.md, a DECISIONS.md, a [your_file.md], etc. — these are files you write as you build, recording facts, decisions, and context about your project. They sit on disk, in git, doing nothing until someone reads them. They are sleeping memory: available when you need them, invisible when you don't.

Documentation is platform-agnostic memory. Claude can read it. Copilot can read it. Gemini can read it. A new developer on your team can read it. Memory files are wired to Claude's brain. Documentation is a spare brain.

Documentation vs. Memory Files — The Key Difference

| | Memory files | Documentation (.md files) |
| --- | --- | --- |
| Token cost | Every session, every message — always on | Zero — until you explicitly ask Claude to read the file |
| When active | Always, automatically | Only when you or Claude reads the file |
| Who can use it | Claude only (Claude Code specific) | Any AI, any developer, any tool |
| Survives AI switching | No — tied to Claude Code's memory system | Yes — it's just a file in your repo |
| In git | Never — personal, outside the repo | Yes — shared with the whole team |
| Best for | Frequently-needed facts Claude should always know | Architectural decisions, context, facts that are needed occasionally or by multiple tools |

What to Document

Write documentation for anything that would need to be re-explained to a new person — or a new AI — starting fresh on the project:

  • ARCHITECTURE.md — How the system is structured. What the major components are and how they interact. Why you made the big structural decisions.
  • DECISIONS.md — Architecture Decision Records (ADRs). What was decided, why, what alternatives were considered. Written at the time of the decision, not reconstructed later.
  • README.md — What the project is, how to build it, how to run it, where to start. The first file any AI or developer reads.
  • GOTCHAS.md — Hard-won facts that don't fit anywhere else. "The payments module has no tests." "The config file is read at startup only — restart required for changes." These are things you'd put in memory files, except here they cost no tokens and any tool can see them.
  • CODE_SCORE.md — A running assessment of code quality, technical debt, areas that need attention, and what you'd fix if you had the time. Give any AI a code quality briefing before a refactor session.
  • COMPETITOR_ANALYSIS.md — What competing products exist, how they differ, what they do better or worse. Any AI helping you make product decisions should know the landscape.
  • ROADMAP_IDEAS.md — Feature ideas, future directions, things that didn't make the cut but are worth revisiting. A scratchpad that survives sessions and platforms, so good ideas don't get lost in a conversation that expires.

Activating Documentation in a Session

Documentation is dormant until you wake it up. You control when it enters context:

  • Explicitly: "Read ARCHITECTURE.md before we start" — Claude reads it and it's in context for that session
  • Via a skill: Create a skill that loads specific documentation files when you start a particular type of work
  • Via CLAUDE.md: Add a line like "When starting a new feature, read ARCHITECTURE.md and DECISIONS.md" — Claude will do it automatically on relevant tasks

When the session ends or compacts, the documentation content may be summarized away — but the file is still there on disk. Wake it up again in the next session with one sentence.

Documentation Across AI Platforms

This is the strongest argument for documentation over memory files: it works with any AI. If you switch from Claude Code to Copilot CLI, Codex, or Gemini CLI for a task, those tools have no access to your Claude memory files. But they can all read an ARCHITECTURE.md in your repo. Documentation is the universal language — it's just files.

Like building permits and blueprints. The memory file is a note on your desk that only you see. The documentation is the official blueprint filed with the city — anyone who needs to work on the building can get a copy.
Like a knowledge base article vs. a personal note. Your personal note (memory file) is always with you. A knowledge base article (documentation) sits on a shelf until someone needs it — but any team member, any tool, any future worker can find and use it.
Create a slash command called /load-docs that instructs Claude to read your key documentation files at the start of a session: "Read README.md, ARCHITECTURE.md, and DECISIONS.md. Summarize what you learned before we begin." One command gives any AI a full project briefing at the cost of one read operation — not permanent token overhead.
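A slash command is just a markdown file in your commands folder. A minimal /load-docs along the lines of that tip could look like this (path and wording are illustrative, not a canonical example):

```markdown
<!-- .claude/commands/load-docs.md (hypothetical) -->
Read README.md, ARCHITECTURE.md, and DECISIONS.md from the repo root.
Summarize, in a few bullets per file, what you learned from each
before we begin any work.
```

Because the file lives in the repo, the whole team gets the same briefing command on clone.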
Chapter 14

Plugins (/plugin install)

A Claude Code Plugin is an installable package — a bundle that can contain slash commands, subagents, hooks, and MCP configs (see MCP Config below). One /plugin install command sets all of it up at once.

/plugin install package-name
Like installing a VS Code extension pack — one install delivers multiple capabilities packaged for a specific workflow.

Popular Plugins — Real Examples

You can always ask your AI how to install any of these. Type "how do I install the Superpowers plugin?" and Claude will give you current steps. We include one real example below so you get a sense of what plugin installation actually looks like — but for any specific plugin, just ask Claude.
Example: Installing Superpowers on Windows

Superpowers is the most popular Claude Code plugin (~54K GitHub stars). Here is the actual install on Windows:

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

# Optional: Install globally (activates in every project automatically)
/plugin install superpowers@superpowers-marketplace --global

After installation, use slash commands like /brainstorming to explore requirements before coding, or /execute-plan to run a structured implementation plan with review checkpoints. Requires Claude Code v2.0.13 or later.

Methodology Frameworks
  • Superpowers (/plugin install superpowers) — The most starred Claude Code plugin (~54K GitHub stars, in Anthropic's official marketplace). Enforces a full structured development methodology: Socratic brainstorming first, detailed spec writing, test-driven development, parallel subagent implementation, and built-in code review. If you install one plugin, this is the one developers recommend first. github.com/obra/superpowers
  • cc-sessions — An opinionated extension set combining hooks, subagents, and commands into a workflow that manages long coding sessions with context preservation and structured handoffs between sessions.
Slash Command Collections
  • commands (~1.7K stars) — A curated collection of production-ready slash commands: /review, /explain, /test, /refactor, /security-audit, and more. Drop-in shortcuts for the most common developer tasks.
  • awesome-slash — 40+ agents and 26 skills for Claude Code, OpenCode, and Codex. Mix-and-match commands across AI platforms.
Memory and Context
  • claude-mem (~20K stars in its first two days) — Long-term memory across sessions. Automatically writes and reads project memory so Claude remembers what you've built, what decisions were made, and what still needs doing — across any number of sessions.
Integrations and SaaS Connectors
  • connect-apps (Composio) — Instantly connects Claude to 500+ SaaS apps: GitHub, Slack, Gmail, Notion, Jira, Salesforce, and more. Real actions directly from the CLI — not just reading, but writing, creating, updating.
  • ship — Full PR automation pipeline: linting, testing, review, and production deployment triggered from Claude. Handles the entire commit-to-deploy loop.
Code Quality and Testing
  • local-review — Parallel local diff code reviews. Claude spawns multiple review subagents simultaneously, each focusing on a different concern (security, performance, style, logic), then consolidates findings.
  • ralph-wiggum — Visual testing plugin. Takes screenshots, compares against baselines, and reports regressions. Claude can see what users see.
  • TypeScript LSP / Rust LSP plugins — Real compiler type-checking inside Claude sessions. Catches type errors as Claude writes code, not after.
Where to Find More
Want to build your own plugin? The Advanced: Building a Plugin tab walks through a complete JIRA integration plugin step by step — manifests, hooks, skills, slash commands, credential security, and the token cost implications most people miss.
Chapter 15

MCP Config (Model Context Protocol)

"MCP server" is a misnomer — Anthropic's own term, used consistently in their docs, but technically misleading. Here is what actually happens: Claude Code launches a program as a child process. That program listens for JSON-RPC messages on its stdin, executes tool calls, and writes results back on stdout. It is a local process acting as a tool dispatcher — not a server in any meaningful sense of that word. There is no port. There is no network. There is no deployment. It starts when your Claude Code session starts and stops when the session ends. Calling it a "server" is like calling a calculator a "computation server." The word stuck because the underlying protocol (MCP) borrowed server/client terminology from web architecture, even though the actual implementation is just two programs talking through pipes on your own machine. We use "MCP server" throughout because that is what every doc, tutorial, and developer uses — but now you know what it actually is.

An MCP server gives Claude access to external systems — databases, APIs, Jira, Slack, GitHub. Without one, Claude can read and edit your files and run commands in your terminal, but it cannot call external APIs in a structured, credential-safe way. An MCP server is the bridge between Claude and the outside world.

Claude Code can run any program on your machine without MCP. Through its Bash tool, Claude has access to every CLI program in your PATH. Git, curl, the Atlassian CLI, AWS CLI, kubectl, psql — if you can run it in a terminal, Claude can run it. MCP is the structured, credential-safe approach. The Bash tool + an installed CLI is the shortcut — simpler to set up, less elegant, same result for many tasks.
Like COM objects or plugin DLLs. A separate process exposes methods via a well-defined protocol. Your application calls those methods without knowing the implementation. MCP servers are the same: separate local processes exposing tools via the Model Context Protocol (JSON-RPC over stdin/stdout).
Like a VS Code extension running in the extension host process. It exposes commands and capabilities via an API; the editor calls them when needed. An MCP server does the same for Claude Code.
Like giving a new hire credentials to Salesforce, Jira, and the database. Without those logins they can answer questions but can't act. An MCP server is those credentials — properly configured so Claude can act, not just advise.

How It Works

Claude Code launches the MCP server as a child process at session start. The server declares what tools it provides. Claude sees those tools and can call them during conversation exactly like its built-in Read, Edit, and Bash tools.
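To make the "two programs talking through pipes" model concrete, here is a minimal Python sketch of a tool dispatcher in that style. It is illustrative only: the real MCP protocol has an initialization handshake, tool schemas, and specific method names (like tools/call) that are omitted here, and the tool names are invented.

```python
import json
import sys

# Hypothetical tools this process exposes. A real MCP server also
# advertises each tool's input schema during an initial handshake.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def handle_request(request: dict) -> dict:
    """Dispatch one JSON-RPC request to a tool and build the response."""
    tool = TOOLS.get(request["method"])
    if tool is None:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": tool(request.get("params", {}))}

def main() -> None:
    # The host writes one JSON object per line to our stdin; we answer
    # one response per line on stdout. No port, no network, no deployment.
    for line in sys.stdin:
        if line.strip():
            print(json.dumps(handle_request(json.loads(line))), flush=True)

# Run as a script by calling main(); the loop ends when the host
# closes our stdin, i.e. when the session ends.
```

Note that the entire "server" is a read-dispatch-write loop over standard streams, which is why the chapter calls the word "server" a misnomer.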

Examples

  • Database MCP: Exposes run_query, list_tables. Claude can query your database.
  • Jira MCP: Exposes create_issue, update_status, search_issues. Claude can manage tickets.
    You can bypass the Jira MCP entirely by installing the official Atlassian CLI (ACLI). Claude calls it through the Bash tool — no MCP needed. This works for most enterprise tools with a CLI.
  • Slack MCP: Exposes send_message, search_messages.
  • GitHub MCP: Exposes create_pr, list_issues, search_code, get_file_contents, and more. Claude can create pull requests, search issues, read files, and interact with GitHub Actions.
    You don't actually need a GitHub MCP server to use GitHub from Claude Code. If you install the GitHub CLI (gh) and Git on Windows, Claude can call them directly through the Bash tool — no MCP required. Claude already knows how to use gh pr create, gh issue list, and standard git commands. The MCP server gives you a cleaner, credential-safe interface, but for most tasks the CLI tools are faster to set up. We show the MCP install here as a real example of what MCP installation actually looks like.

    Installing the GitHub MCP Server — Step by Step

    Step 1: Create a GitHub Personal Access Token (PAT)

    1. Go to github.com/settings/tokens
    2. Click Generate new token (classic)
    3. Name it "Claude Code MCP"
    4. Scopes: repo for private repos; no scopes for public-only read access
    5. Copy the token — you will not see it again

Step 2: Add the server by editing settings.json

    File location: C:\Users\YourName\.claude\settings.json

    {
      "mcpServers": {
        "github": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
            "ghcr.io/github/github-mcp-server"
          ],
          "env": {
            "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_YOUR_TOKEN_HERE"
          }
        }
      }
    }

    Step 3: Restart Claude Code and test

    Start a new session. Try: "List my open pull requests" or "What issues are open in my repo?"

Installing an MCP Server Manually

  1. Get the package. Most are npm or Python packages.
  2. Add it to your settings file under the mcpServers key:
{
  "mcpServers": {
    "my-database": {
      "command": "npx",
      "args": ["-y", "@example/mcp-database-server"],
      "env": {
        "DATABASE_URL": "postgres://localhost:5432/mydb"
      }
    }
  }
}
  3. Restart Claude Code. MCP servers load at session start.
  4. Use it. Just ask Claude: "Query the users table."

Scope

  • Personal: C:\Users\YourName\.claude\settings.json — available in every session, never in git
  • Project-private: C:\repos\MyProject\.claude\settings.local.json — project-specific, NOT in git
  • Team-shared: C:\repos\MyProject\.claude\settings.json — committed to git, team gets it on clone

Security

An MCP server runs on your machine with your user permissions. A malicious one could read files, exfiltrate code, or send data externally. Treat it like installing a VS Code extension from an unknown publisher — inspect before you trust.

MCP servers manage their own credentials. Claude never sees your raw API token — it just asks the server to "create a ticket" and the server handles auth. Never put credentials in CLAUDE.md or skills.
Chapter 16

The Context Window

The context window is the total amount of text (measured in tokens) that Claude can "see" at once during a conversation. Everything — every message, every file read, every CLAUDE.md instruction, every skill, every tool result — occupies tokens in this finite window.

Like RAM. You have a fixed amount (roughly 200K tokens for current models). Every process (message, file, instruction) that's loaded takes up space. When RAM is full, the system starts paging: older, less-used content is compressed (summarized) to make room for new content. You lose detail but keep the gist. And just like RAM, more is better — but there's always a limit.
Or like a physical workbench. You can have blueprints, tools, materials, and notes on the bench — but the bench has a fixed surface area. If you add a new large blueprint, you have to stack or fold something else. A 10-page CLAUDE.md takes up bench space that's unavailable for actual work. Keep your bench clear of unnecessary items.
Like the working memory of a meeting. A facilitator can hold about 7 agenda items in their head at once. If you pile on twelve topics, they start forgetting earlier ones by the time you get to the end. The context window is that mental working space — finite, shared by everything you're discussing. A 30-page requirements document read into the conversation is like handing the facilitator a novel and asking them to keep track of everything.

What Consumes Tokens

  • CLAUDE.md — Always loaded, always consuming. Keep it concise.
  • Skills — Consumed when loaded. Use skills instead of CLAUDE.md for specialized knowledge.
  • Conversation history — Every message back and forth. Grows continuously.
  • File contents — When Claude reads a file, its contents enter the context. Large files cost thousands of tokens.
  • Tool results — Output from grep, bash, etc. Verbose output costs more tokens.
  • Slash command content — The full prompt text from the command file.

What Happens When It Fills Up

Older conversation turns are summarized to free space. The system prioritizes keeping recent content and persistent instructions (CLAUDE.md, active skills). It's lossy compression: you keep the conclusions and key facts, but lose the detailed reasoning and intermediate steps.

Token Economics

Think of your context window like a budget:

  • Fixed costs: CLAUDE.md tokens (paid every session)
  • Variable costs: Files read, commands run, messages exchanged
  • Budget: ~267 pages on Pro (200K tokens), ~1,333 pages on Max/Team/Enterprise (1M tokens)

A bloated CLAUDE.md is like a high recurring expense: it reduces your available budget for everything else. This is why architecture (skills for on-demand loading, concise CLAUDE.md, memory for cross-session persistence) matters so much.
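The budget arithmetic is easy to sketch. The conversion below assumes the common rough rule of ~750 tokens per page of prose (a heuristic, not an Anthropic figure):

```python
TOKENS_PER_PAGE = 750  # rough heuristic: ~4 characters per token, ~3,000 chars per page

def pages(tokens: int) -> int:
    """Convert a token budget to an approximate page count."""
    return round(tokens / TOKENS_PER_PAGE)

def remaining_budget(window: int, claude_md_tokens: int, history_tokens: int) -> int:
    """What's left after fixed costs (CLAUDE.md) and variable costs (history, files read)."""
    return window - claude_md_tokens - history_tokens

print(pages(200_000))    # 267 pages: the Pro window
print(pages(1_000_000))  # 1333 pages: the extended window
# A 5K-token CLAUDE.md plus 60K tokens of conversation leaves 135,000 tokens of headroom:
print(remaining_budget(200_000, claude_md_tokens=5_000, history_tokens=60_000))
```

The fixed cost is the part you control directly, which is why trimming CLAUDE.md pays off on every single message.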

What "Extended Context Model" Actually Means

Anthropic uses the phrase "extended context model" on their pricing and Enterprise pages. It sounds like a different, fancier AI. It isn't. It is the same Opus 4.6 or Sonnet 4.6 you already use, with a larger reading window enabled. Think of it as the same brain, with more desk space to spread out on.

| Plan | Context window | Awareness (pages) | Monthly cost | How to unlock |
| --- | --- | --- | --- | --- |
| Pro | 200K tokens | ~267 pages | $20 | Default |
| Pro (extended) | 1M tokens | ~1,333 pages | $20 | Type /extra-usage in Claude Code |
| Max / Team / Enterprise | 1M tokens | ~1,333 pages | $100–$200+ | Automatic — no opt-in needed |

In practical terms: a codebase of 500 files averaging 200 lines each is roughly 100,000 lines of code. At ~50 lines per page, that's 2,000 pages. Neither 267 nor 1,333 covers that entire codebase at once — but 1,333 pages covers far more without needing to chunk, re-read, or lose earlier context. For large projects, the 1M window is the difference between Claude understanding your architecture and Claude seeing only a slice of it.

The "Lost in the Middle" Problem — Be Warned

Here is something Anthropic does not advertise loudly: when you fill a very large context window, AI models tend to remember what's at the very beginning and the very end much better than what's buried in the middle. This is called the "lost in the middle" problem, documented in AI research.

In practice: if you load 1,333 pages of code, the files you loaded first and the files you loaded last will get the most attention. Files in the middle may be effectively invisible. Claude handles this better than other models at large context sizes — but it is not perfect, and real developer testing confirms the effective useful window is often smaller than the advertised maximum.

Developer testing on Reddit and GitHub Issues shows mixed results at 1M context. Sessions under ~500K tokens (667 pages) seem reliable. Beyond that, quality can degrade. The 1M window is real and useful — just don't assume Claude is giving equal attention to all 1,333 pages simultaneously.
◈ Advanced — What a Context Window Actually Is Physically

Tokens do not map to your laptop's RAM at all. The context window lives entirely on Anthropic's servers, in GPU VRAM — graphics card memory on a data center GPU cluster, not regular RAM, and certainly not anything on your machine.

What Happens When You Send a Message

  1. Your text is tokenized on your machine — trivially small, a few kilobytes
  2. The tokens are sent over HTTPS to Anthropic's data center
  3. Anthropic's GPU cluster loads your entire conversation into GPU VRAM
  4. The model processes every token and generates a response token by token
  5. The response streams back to you over HTTPS

Your laptop contributes essentially nothing to the computation — just tokenizing your text and the network connection.

What "200K Tokens" Means Physically — The Math

Each token in a long context contributes to memory use in the KV cache — an internal memory structure used by transformer-style models to track attention state across prior tokens during generation.

A rough way to think about the memory cost per token:

  • For each token, the model stores two vectors — called Key and Value — which encode what that token “means” in context.
  • Those two vectors are stored at every layer of the network. A large model might have 80 layers; a smaller one might have 32.
  • Each layer has some number of attention heads that maintain Key/Value pairs. Modern models (using Grouped Query Attention) use far fewer heads than older designs, which is why they are cheaper to run.
  • Each vector is made of numbers stored in half-precision (2 bytes each, typically).

Multiply those together — 2 vectors × layers × KV heads × numbers per vector (the head dimension) × bytes per number — and you get the memory footprint per token.

A convenient round teaching figure is about 0.5 MB per token, though the real number can vary significantly by architecture and serving design. A broad rough range based on publicly measured architectures is ~64 KB to ~0.5 MB per token — modern GQA models (Llama 3-class) sit toward the low end; older full multi-head attention models sit toward the high end, near that 0.5 MB figure.

That implies:

  • 200K tokens ≈ ~100 GB at the 0.5 MB-per-token figure
  • 1M tokens ≈ ~500 GB at that figure

Again, those are rough estimates, not universal constants.

The key point is simple: large context windows are a real infrastructure cost. The raw text itself is tiny. What takes memory is the model’s internal mathematical state for handling that text.

To put that in perspective: a 1M token context at even the low end of the range requires around 64 GB of GPU memory — more than any gaming GPU holds, and a large fraction of a high-end server GPU. That is one major reason large-context AI is expensive to provide.
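The multiplication above can be written out directly. All architecture numbers below are illustrative placeholders (Claude's actual configuration is unpublished); they are chosen to land on the chapter's rough range:

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_number: int = 2) -> int:
    """Per-token KV cache: 2 vectors (K and V) x layers x heads x head dim x precision."""
    return 2 * layers * kv_heads * head_dim * bytes_per_number

# Full multi-head attention, 7B-class model (illustrative): 32 layers, 32 KV heads, dim 128
mha = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
# Same shape with Grouped Query Attention (illustrative): only 8 KV heads
gqa = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)

print(f"{mha / 1024:.0f} KB/token")  # 512 KB/token (toward the high end of the range)
print(f"{gqa / 1024:.0f} KB/token")  # 128 KB/token (toward the low end)
print(f"{200_000 * mha / 1024**3:.0f} GB")  # ~98 GB for a 200K-token context at 512 KB/token
```

Cutting KV heads from 32 to 8 cuts the cache cost fourfold, which is exactly why GQA models are so much cheaper to serve at long context.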

Why GPUs? And Are They Even "Graphics" Cards Anymore?

GPUs won the AI race by accident. They were designed to render 3D graphics — which requires multiplying enormous matrices of numbers in parallel, thousands of operations at once. It turned out that transformer models are also, at their core, enormous parallel matrix multiplications. So GPUs were already the right tool.

A CPU has a handful of powerful cores — maybe 8 to 64 — each running at high clock speed, good at sequential logic. A GPU has thousands of weaker cores running simultaneously, good at doing the same simple operation on millions of numbers at once. For AI inference, that trade-off wins by a large margin.

The chips running Claude today are not gaming cards. They are purpose-built server hardware that happens to share an architecture with graphics chips:

| Chip | Memory | Memory bandwidth | Approx. price (per chip) | Who makes it |
| --- | --- | --- | --- | --- |
| NVIDIA H100 | 80 GB HBM3 | 3.35 TB/s | $25,000–$40,000 | NVIDIA |
| NVIDIA H200 | 141 GB HBM3e | 4.8 TB/s | ~$35,000–$45,000 | NVIDIA |
| NVIDIA B200 | 192 GB HBM3e | 8 TB/s | ~$70,000–$80,000 | NVIDIA |
| AMD MI300X | 192 GB HBM3 | 5.3 TB/s | ~$10,000–$15,000 | AMD |
| Google TPU (Ironwood) | HBM (shared across pod) | Very high | Cloud rental only | Google |
| RTX 4090 (consumer, for comparison) | 24 GB GDDR6X | ~1 TB/s | ~$1,600 | NVIDIA |

The key differences from a consumer card are not just price — it is memory capacity (80–192 GB vs 24 GB), memory type (HBM is stacked directly on the chip for much higher bandwidth), and the ability to connect multiple chips together at high speed so they share memory across a server.

Are GPUs Even Required?

No. GPUs dominate today because they were available and proven, not because they are the only or best option. The underlying requirement is just: hardware that can do massive parallel matrix math efficiently.

Purpose-built AI chips already exist and are in production use. Google's TPUs are designed from scratch for transformer inference and have no graphics heritage at all. Amazon's Trainium chips serve the same role. Anthropic itself runs on all three platforms — Google TPUs, AWS Trainium, and NVIDIA GPUs via Azure — specifically to avoid being locked into any one vendor. The "AI data center runs on GPUs" story is already becoming "AI data center runs on parallel math chips, some of which used to be GPUs."

There is an active race to build better chips. NVIDIA releases a new generation roughly every 18 months. Google, AMD, Amazon, and others are investing heavily in alternatives. The economics are enormous: a single H100 costs more than most developers earn in a year, and Anthropic has committed to spending tens of billions on compute infrastructure.

Why This Matters to You

  • Latency: In this context, latency means the delay between sending your message and receiving the first word of Claude’s response. A larger context window means more tokens to process during the prefill phase, which means more GPU work before any output begins. A fresh session with no history responds noticeably faster than one carrying 50,000 tokens of prior conversation.
  • Cost: More tokens = more GPU time = more cost to Anthropic = why larger context plans cost more.
  • The "lost in the middle" problem: The KV cache stores all tokens, but the attention mechanism pays disproportionate attention to recent and early tokens. Middle tokens are physically present in memory — they're just attended to less. This is a GPU math problem, not a storage problem.
  • Local models (Ollama etc.) are limited by your own hardware: When you run a model locally, the KV cache lives in your GPU’s VRAM. A 16 GB card running a 7B model may only support 4K–8K context before running out of memory. The most a serious local AI user would typically spend is an RTX 4090 (24 GB, ~$1,600–2,000) or the newer RTX 5090 (32 GB, ~$2,000–2,500). Stepping up to a prosumer workstation card like the RTX 6000 Ada (48 GB, ~$6,000) buys meaningful headroom. The wildcard is Apple Silicon — an M4 Ultra Mac Studio treats RAM and GPU memory as a single shared pool, giving up to 192 GB of effective VRAM for around $5,000–8,000, which is genuinely competitive for local inference. None of these touch the 80–192 GB that a single data center chip carries, which is why cloud models feel so much more capable.
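That last bullet can be made concrete with back-of-envelope arithmetic. Every number below is illustrative: a 7B-parameter model stored at 16-bit precision weighs roughly 14 GB, and the 512 KB/token KV cost is a full multi-head-attention example, not any specific model or card:

```python
def max_context_tokens(vram_gb: float, model_weights_gb: float,
                       kv_bytes_per_token: int, overhead_gb: float = 1.0) -> int:
    """Tokens of KV cache that fit in VRAM left over after weights and runtime overhead."""
    free_bytes = (vram_gb - model_weights_gb - overhead_gb) * 1024**3
    return max(0, int(free_bytes // kv_bytes_per_token))

# Illustrative: a 7B model at 16-bit (~14 GB of weights) on a 16 GB card,
# with a full-MHA KV cost of 512 KB/token, leaves room for only ~2K tokens:
print(max_context_tokens(16, 14, 512 * 1024))
```

Quantizing the weights or using a GQA model frees dramatically more room for context, which is why local-inference communities obsess over both.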
Chapter 17

The File System Layout

Everything in Claude Code's architecture maps to a file in a predictable location. There is no registry, no database, no opaque binary format. Once you know the directory structure, you can inspect, edit, backup, or debug anything with a text editor.

In all paths below: ~ means your Windows home directory — C:\Users\YourName\. Paths starting with ~/.claude/ are in your user profile. Paths starting with just .claude/ are inside your current project repo. These are two different places.

Your Personal Folder — C:\Users\YourName\.claude\

| Unix notation | Windows path | What it is |
| --- | --- | --- |
| ~/.claude/ | C:\Users\YourName\.claude\ | Root of your personal config |
| ~/.claude.json | C:\Users\YourName\.claude.json | User-level MCP server definitions, OAuth sessions, and preferences (theme, notifications). Note: this is a file in your home directory, not inside the .claude\ folder. |
| ~/.claude/settings.json | C:\Users\YourName\.claude\settings.json | Your personal preferences (tools, model, permissions) |
| ~/.claude/CLAUDE.md | C:\Users\YourName\.claude\CLAUDE.md | Your instructions, applied to every project |
| ~/.claude/rules/ | C:\Users\YourName\.claude\rules\ | User-level rules — topic-specific .md files that apply to all projects |
| ~/.claude/commands/ | C:\Users\YourName\.claude\commands\ | Your personal slash commands, everywhere |
| ~/.claude/skills/ | C:\Users\YourName\.claude\skills\ | Your personal skills, everywhere |
| ~/.claude/agents/ | C:\Users\YourName\.claude\agents\ | User-scoped subagent definitions, available in all projects |
| ~/.claude/projects/ | C:\Users\YourName\.claude\projects\ | Session data for all your projects (auto-managed) |
| ~/.claude/projects/C--repos-MyApp/ | C:\Users\YourName\.claude\projects\C--repos-MyApp\ | Data for the project at C:\repos\MyApp |
| ~/.claude/projects/C--repos-MyApp/memory/ | C:\Users\YourName\.claude\projects\C--repos-MyApp\memory\ | Memory files for that project |

Inside Your Project Repo — C:\repos\MyProject\.claude\

| Unix notation | Windows path (example) | What it is |
| --- | --- | --- |
| .claude/ | C:\repos\MyProject\.claude\ | Project config folder (like .vscode in a repo) |
| .claude/CLAUDE.md | C:\repos\MyProject\.claude\CLAUDE.md | Alternative location for project instructions — works the same as CLAUDE.md at the repo root; use whichever keeps your root cleaner |
| .claude/commands/ | C:\repos\MyProject\.claude\commands\ | Project slash commands, shareable via git |
| .claude/skills/ | C:\repos\MyProject\.claude\skills\ | Project skills, shareable via git |
| .claude/rules/ | C:\repos\MyProject\.claude\rules\ | Project rules — topic-specific .md files, discovered recursively, shareable via git |
| .claude/agents/ | C:\repos\MyProject\.claude\agents\ | Project subagent definitions, shareable via git |
| .claude/settings.json | C:\repos\MyProject\.claude\settings.json | Project settings, can be committed to git |
| .claude/settings.local.json | C:\repos\MyProject\.claude\settings.local.json | Your personal overrides — add to .gitignore |

Repo Root (no subfolder)

| File | Windows path (example) | What it is |
| --- | --- | --- |
| CLAUDE.md | C:\repos\MyProject\CLAUDE.md | Project instructions — committed to git, shared with team. Can also live at .claude/CLAUDE.md inside the repo; both locations work. |
| .mcp.json | C:\repos\MyProject\.mcp.json | Project-scoped MCP server definitions — committable to git so the whole team shares the same MCP setup |
The whole layout follows the principle of "proximity to scope." Personal config lives in your home directory. Project config lives in the project. Shared config is checkable into git. Local overrides are gitignored. It's the exact same layering as appsettings.json / appsettings.Development.json / environment variables in ASP.NET — base, environment-specific, personal.
Like the difference between your personal email signature (always yours, on every email you send), the company email template (shared with everyone on your team), and the project-specific footer you add when emailing clients for a specific account. Three layers: personal, shared, project-specific. They layer on top of each other. That's exactly how Claude Code's config files are organized.
Chapter 18

Scope and Distribution

Every piece of Claude Code configuration has a scope (who it affects) and a distribution method (how it spreads). Understanding this prevents "where do I put this?" confusion.

The Scope Matrix

Remember: ~ = C:\Users\YourName\. Paths with ~/.claude/ are in your user profile. Paths with just .claude/ are inside the repo.

| File / Folder | Scope | In git? |
| --- | --- | --- |
| C:\Users\YourName\.claude.json | Personal + All projects. MCP servers, OAuth, preferences. | Never |
| C:\Users\YourName\.claude\settings.json | Personal + All projects. Your preferences everywhere. | Never |
| C:\Users\YourName\.claude\CLAUDE.md | Personal + All projects. Your instructions everywhere. | Never |
| C:\Users\YourName\.claude\rules\ | Personal + All projects. Your topic-specific rules everywhere. | Never |
| C:\Users\YourName\.claude\commands\ | Personal + All projects. Your commands everywhere. | Never |
| C:\Users\YourName\.claude\agents\ | Personal + All projects. Your subagent definitions everywhere. | Never |
| C:\Users\YourName\.claude\projects\*\memory\ | Personal + Per-project. Your notes for one project. | Never |
| C:\repos\MyProject\CLAUDE.md or .claude\CLAUDE.md | Shared + Per-project. Team instructions. | Yes |
| C:\repos\MyProject\.claude\rules\ | Shared + Per-project. Team rules. | Optional |
| C:\repos\MyProject\.claude\commands\ | Shared + Per-project. Team commands. | Optional |
| C:\repos\MyProject\.claude\agents\ | Shared + Per-project. Team subagent definitions. | Optional |
| C:\repos\MyProject\.claude\settings.json | Shared + Per-project. Team settings. | Optional |
| C:\repos\MyProject\.mcp.json | Shared + Per-project. Team MCP servers. | Optional |
| C:\repos\MyProject\.claude\settings.local.json | Personal + Per-project. Your local overrides. | Never (.gitignore) |

Decision Framework

When you need to configure something, ask two questions:

  1. Who needs this? Just me? Put it in C:\Users\YourName\.claude\. The whole team? Put it in the repo.
  2. What scope? All projects? Your personal .claude\ folder. This project only? The repo's .claude\ folder or CLAUDE.md.
It's the exact same decision tree as: "Do I put this in my global .gitconfig, the repo's .gitconfig, or the repo's .gitattributes?" Or: "Do I put this VS setting in user settings, workspace settings, or .editorconfig?" The principle is universal. Claude Code just applies it to AI configuration.
Ask yourself: "Who should follow this rule?" If it's just you on every project — put it in your global settings. If it's your whole team on this project — put it in the project CLAUDE.md and check it in. If it's sensitive (passwords, API keys) — put it in your local override file that never gets shared. Same three questions a good HR manager asks before writing a policy: is this personal, team-wide, or project-specific?

.gitignore Conventions

Typically, .gitignore excludes:

  • .claude/settings.local.json — Personal overrides
  • Your personal C:\Users\YourName\.claude\ folder is outside every repo, so it's inherently never in git

What CAN be checked in: CLAUDE.md, .claude/CLAUDE.md, .claude/rules/, .claude/commands/, .claude/skills/, .claude/agents/, .claude/settings.json, .mcp.json
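If your repo does not already ignore the local override file, a minimal .gitignore entry looks like this (a sketch to merge into your existing ignore file):

```
# Claude Code: personal per-project overrides stay out of git
.claude/settings.local.json
```

Everything else under .claude/ listed above is safe to commit.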

Chapter 19

The Prompt Engineering Trap

You know how to prompt. You've been prompting AI tools for months or years. You can write a prompt that gets Claude to produce good code. So why does any of this architecture matter?

Because prompting is like knowing SQL syntax without understanding indexes, joins, normalization, or query plans. You can write queries — they'll return correct results. But they'll be slow, unmaintainable, and won't scale. The person who understands the architecture writes queries that are fast, maintainable, and scalable, even if the SQL syntax is identical.

You know how to type C# code. But there's a world of difference between someone who types C# and someone who understands SOLID principles, dependency injection, the CLR, garbage collection, and async/await internals. Both write C# code. One writes C# code that works in production. The architecture knowledge is the difference between a junior developer who can code and a senior developer who can engineer.
A child can send an email. A scientist can send an email describing a new cure for a disease — the precise formulation, the trial data, the dosing protocol, the contraindications — to the right colleagues, in the right format, so it gets acted on rather than ignored. Both used the same email button. The difference is not the tool; it is what the person brought to it. AI architecture knowledge is the difference between clicking send and knowing exactly what to say, to whom, and how to make it count.

What Architecture Gives You

  • Consistency: CLAUDE.md ensures every session follows team standards — you don't have to repeat yourself.
  • Enforcement: Hooks guarantee that tests run, linters pass, forbidden patterns are blocked — not just suggested.
  • Efficiency: Skills load context only when needed, preserving your context window for actual work.
  • Persistence: Memory carries knowledge across sessions — Claude doesn't forget what you told it yesterday.
  • Extensibility: MCP servers let Claude interact with your specific tools and systems — databases, issue trackers, APIs.
  • Reusability: Slash commands encode workflows you can invoke with a single keystroke.
  • Team scaling: Checked-in configuration means a new developer gets the full setup by cloning the repo.

The Progression

  1. Level 1: Prompting. You type good prompts and get good results. Works for individual tasks.
  2. Level 2: Configuration. You write CLAUDE.md so you don't repeat prompts. Works for a project.
  3. Level 3: Architecture. You use skills, hooks, memory, commands, and MCP servers as a system. Works for a team and scales across projects.

Most developers are at Level 1. This guide exists to get you to Level 3.

None of this is intuitive if you come from GUI tools. In Visual Studio, you configure things through dialogs and property pages. Here, you configure things through files in dot-directories with Unix-convention names. The concepts are the same (configuration, automation, persistence, extensibility) — only the packaging is different. Once you see through the packaging to the concepts, it clicks. The architecture is just software engineering applied to AI tool configuration.
Chapter 20

Bonus Features

These are real Claude Code features that don't fit neatly into the earlier chapters but are essential for power users.

20.1

Worktrees (Isolated Agent Sandboxes)

First: Two Git Terms Worth Knowing

Working tree — git's name for the folder where your actual files live. When you clone a repo, the folder you get is the working tree. Git distinguishes it from the hidden .git folder (which stores your commit history, branches, and objects) — the working tree is what you edit, the .git folder is git's internal bookkeeping. You always work in the working tree; you never touch .git directly.

Worktree — a git feature that lets you check out the same repository into multiple folders at the same time, each on its own branch. Normally, one repo means one folder with one branch checked out. A worktree creates a second (or third) folder — same git history, different branch, different files on disk — all live simultaneously. For example: C:\repos\MyApp on main and C:\repos\MyApp-experiment on a feature branch, both checked out at once from the same repo.
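Worktrees are plain git, so you can try the mechanics yourself before letting an agent use them. A sketch to run in Git Bash or any POSIX shell; the folder and branch names are illustrative:

```shell
# Demo in a scratch directory so nothing real is touched
cd "$(mktemp -d)"
git init MyApp && cd MyApp
git -c user.name=Demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit"

# Check the same repo out into a second folder, on a new branch
git worktree add ../MyApp-experiment -b experiment

# Both checkouts exist at once, each on its own branch
git worktree list

# Discard the experiment folder; the branch and history survive
git worktree remove ../MyApp-experiment
```

Note that removing the worktree does not delete the experiment branch; you can still inspect or merge it later.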

How Claude Uses This

When Claude spawns a subagent, it normally edits your working tree — the same files you are looking at. Worktrees change that: instead of working in your folder, the agent gets its own separate folder checked out to its own branch. It works there in complete isolation. Your files are never touched. When the agent finishes, you review its branch and decide whether to merge it or discard it.

Like branching in real life. Instead of renovating a room while you're living in it, the contractor builds the renovation in a separate identical room. When it's done, you inspect it. If it's good, you swap. If it's bad, you tear down the copy — your original room was never touched.
Like making a copy of a presentation before trying a radical redesign. You duplicate the file, make all the changes in the copy, show it to the client. If they hate it, you still have the original. If they love it, you replace the original. Worktrees are that "duplicate before experimenting" discipline — automated and guaranteed.

How to use it: when calling the Agent tool, set isolation: "worktree". The agent runs in a temporary git worktree. If it makes no changes, the worktree is cleaned up automatically. If changes are made, you get the worktree path and branch name to review.

When to use it: risky refactors, experimental changes, anything where you want a safety net. The agent can go wild without affecting your working files.

Sandboxing on Windows: Claude Code's OS-level sandboxing relies on platform primitives that are not currently supported on native Windows. On macOS it uses the Seatbelt framework; on Linux and WSL2 it uses bubblewrap. If you are running Claude Code on native Windows (not WSL2), the sandbox layer is not active. If you see unexpected user accounts appear on your Windows machine, that is not Claude Code — check other installed software, virtualization tools, or recent Windows updates.
20.2

Task Tool (Built-in Todo Tracker)

"Task" and "Todo" are used interchangeably here — but "Task" also means something completely different elsewhere in Claude Code. The todo tracker is called the "Task Tool" and its items are called tasks or todos depending on which part of the UI or docs you're reading. Separately, Claude Code has a Task tool that launches subagents — independent Claude processes that do work in parallel. That kind of "task" has nothing to do with a todo item. A subagent task can itself create todo tasks, which makes the overlap genuinely confusing. When you see the word "task," context is the only way to know which meaning is intended.

Claude Code has a built-in task/todo system that persists within a conversation. You can create tasks, mark them in-progress, mark them complete, and track what's left.

Claude can also create tasks on its own when working through complex multi-step work. It uses them to track progress, and you see the task status in the conversation.

Like a shared Trello board between you and your AI, but it lives inside the conversation. No separate app, no context switching. You say "build X" and Claude breaks it into tasks, works through them one by one, and marks each done.
Like a checklist in a project management tool that both you and your assistant can see and update. You assign "write the agenda," "book the room," "send invites" — your assistant works through them and checks them off. You can glance at any time and see exactly what's done and what's left. No "where are we on this?" emails needed.

When to use it: any multi-step task. Instead of hoping Claude remembers all 7 things you asked for, tasks make the list explicit and trackable.

The Catch: Session Tasks Don't Survive the Session

The built-in task system stores everything in session memory only — not on disk, not in any file. When the session ends, all tasks are gone. If you come back tomorrow, Claude has no memory of what was in progress.

A TODO.md file is a common alternative for persistence — but Claude does not pick it up automatically. It is just a regular markdown file. To make it work, add a line to your CLAUDE.md:

Check TODO.md at the start of each session for outstanding tasks. Update it when tasks are completed or new ones are discovered.

With that instruction in place, Claude reads TODO.md on every session and can update it as work progresses. Because it is a plain file in your repo, it gets committed to git and the whole team sees the same list.
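A workable shape for the file (purely illustrative; any markdown checklist Claude can read and update will do):

```markdown
# TODO

## In progress
- [ ] Migrate checkout tax calculation to decimal math

## Up next
- [ ] Add integration tests for the refund flow

## Done (archive and prune regularly)
- [x] Store session tokens in Redis instead of the database
```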

TODO.md loaded via CLAUDE.md costs tokens every session — just like any other content in your context. If the file grows long, you are paying for all of it on every message. Keep it trimmed to current and near-term work only; archive completed items rather than leaving them in the file.

The tradeoff in plain terms: the built-in task tool is free and effortless but evaporates when you close the terminal. A TODO.md file persists and is shareable but costs tokens. Use whichever matches how long your work spans.

20.3

Model Switching Mid-Session (/model)

You don't have to pick one model for an entire session. Type /model to switch between Opus, Sonnet, and Haiku without starting over.

| Model | Strengths | Cost | Use When |
| --- | --- | --- | --- |
| Opus | Most capable, best reasoning | Highest | Architecture decisions, complex refactors, debugging hard problems |
| Sonnet | Balanced speed and capability | Medium | General coding, most tasks |
| Haiku | Fastest, cheapest | Lowest | Simple edits, quick lookups, boilerplate generation |
Like shifting gears. You don't drive in first gear on the highway, and you don't use sixth gear in a parking lot. Use Opus for the hard thinking, Sonnet for general work, Haiku for the quick stuff.
Like choosing which consultant to call. For a major strategic decision, you call the senior partner (expensive, slow, thorough). For a quick question, you call the junior associate (fast, cheap, good enough). You wouldn't pay partner rates to answer "what's the meeting time?" — and you wouldn't send the intern to negotiate a merger. Match the resource to the complexity of the task.

Pro tip: Start a session with Opus for planning and architecture decisions, then switch to Sonnet for implementation, and Haiku for cleanup and formatting.

20.4

/compact (Manual Context Compression)

When your context window fills up, Claude auto-compresses older messages to make room. This is lossy — details get summarized and nuance is lost. /compact lets you trigger this compression manually, on your terms.

How /compact actually works — a sincere answer

Claude summarizes the older parts of the conversation to make room for new material. It is not like pages falling off a stack where the old content is simply gone. It is more like compressing a thick folder of notes into a single summary page — the detail is reduced, but the gist is preserved. The old pages are not wiped; they are condensed.

So after /compact: the early part of your session becomes a dense summary (lower fidelity, still present). The recent part stays intact. Now there is space in the Awareness window to bring in new files, new code, new context. You have not lost everything — you have traded detail for room.

The risk: something important buried in a long tool output from two hours ago may have been dropped in the summary. That is why you can pass a hint — a short phrase telling Claude what must survive the compression at all costs.

Example: You have been debugging a payment processing bug for two hours. You have explored many dead ends. Now you are ready to write the fix, but the context window is getting full. You type:

/compact focus on the payment bug — the root cause is in checkout.js line 142 where the tax calculation overflows

Claude compresses the two hours of exploration history but keeps the critical finding — line 142 — intact. Without the hint, that specific detail might have been summarized away as "explored various files."

Another example: You have been reading through 30 files trying to understand how the login system works. You now want to make changes but need to free up context. You type:

/compact keep the authentication flow — specifically that session tokens are stored in Redis, not the database

The exploration gets compressed, the key architectural fact survives.

What this means for Awareness: your plan determines the maximum Awareness your model can have — 267 pages on Pro, 1,333 pages on Max. But as a session runs, conversation history, file reads, and tool output accumulate and eat into that space. After an hour of coding, you might have consumed 150 of your 267 pages just on history — leaving only 117 pages for the actual work ahead. /compact shrinks the history back down, reclaiming Awareness. Think of it as freeing up screen real estate so the AI can see more of your project and less of its own conversation record. You are not upgrading your plan — you are clearing clutter within the plan you have.

Why you'd want this: if you've done a lot of exploratory reading (file reads, searches) that generated verbose output you no longer need, /compact frees that space while you're still in control of what's important. If you wait for auto-compression, it happens at an unpredictable moment and may compress something you still need.

Like manually defragmenting a hard drive (remember that?). The system will do it eventually, but doing it yourself at the right time gives you better results. Or like clearing your browser tabs before your laptop runs out of RAM — better to choose which tabs to close than to let the OS kill them randomly.
Like cleaning out your inbox before starting a new project phase. You archive the old threads, keep the active ones, and start fresh with clarity. If you don't do it intentionally, the system will eventually do it for you — but it won't know which emails you still needed. /compact is that intentional cleanup, done on your schedule.

You can also pass a summary hint: /compact focus on the authentication refactor tells the compressor what to prioritize keeping.

/compact will likely deactivate any skills you loaded. The skill's content was sitting in conversation history, so compaction can summarize or drop it, and Claude does not automatically re-read the skill file afterward. The next section explains this failure mode and the fixes in detail.

Skills and Compaction — A Known Problem

Compaction can silently drop skills you loaded earlier in the session. This is a documented real-world issue. After compaction, Claude may completely forget that a skill was active — and will not automatically re-read it. It will stop following the skill's instructions without telling you. You will just notice that Claude is no longer behaving as instructed.

What actually happens:

  • You invoke a skill early in a session
  • The session runs long and compaction occurs (manually or automatically)
  • The skill's content gets summarized or dropped in the compression
  • Claude continues the session without the skill's instructions — silently

Does Claude re-load the skill automatically after compaction? No. Confirmed by GitHub issue reports. Claude does not proactively re-read skill files after compaction, even if you have instructions in CLAUDE.md telling it to do so.

What to do about it:

  • Re-invoke the skill after compaction. If you ran /my-skill at the start of the session and then compacted, type /my-skill again.
  • Use your compaction hint to preserve it. /compact keep the react-component skill rules tells the compressor to include the skill's key points in the summary.
  • Move critical rules to CLAUDE.md. CLAUDE.md reloads automatically every session and survives compaction (it's re-loaded from the file, not from conversation history). If a rule is important enough that losing it mid-session would hurt, it belongs in CLAUDE.md, not a skill.
  • Run /compact at logical breakpoints rather than letting auto-compaction trigger mid-task — you control what gets preserved.

See also: Chapter XI — The Context Window for how Awareness is consumed, and IQ vs. Awareness on the Other AI CLIs page for how your plan determines your maximum Awareness.

20.5

Permissions Model

Claude Code asks permission before doing things — editing files, running commands, making web requests. The permission system has tiers:

| Level | What Happens | How to Set |
| --- | --- | --- |
| Ask every time | Claude pauses and asks before each action | Default behavior |
| Allow once | You approve one specific action | Press Enter or 'y' at the prompt |
| Allow for session | All actions of that type are auto-approved for this session | Press 'a' (allow) at the prompt |
| Allowlist (permanent) | Specific tools/commands are always approved | Configure in settings.json |
Like Windows UAC (User Account Control). By default, every elevated action triggers a prompt. You can configure trusted applications to skip the prompt. The allowlist in settings.json is like marking an app as "always run as administrator."
Like the approval levels in an expense policy. Anything under $50 you can approve yourself. Between $50-500 needs your manager. Over $500 needs the VP. You configure the rules once; the system enforces them automatically. Claude's permission tiers work the same way — you decide in advance what requires your sign-off and what can proceed automatically.

The settings.json allowlist lets you pre-approve specific tools and bash command patterns so Claude doesn't interrupt your flow for routine operations.
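As a sketch, an allowlist in .claude/settings.json can look like this. The permissions.allow / permissions.deny structure and the Tool(pattern) matcher style are real Claude Code conventions, but the specific entries here are examples; verify the exact matcher syntax against the current settings documentation:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(npm test:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
```

With this in place, reads and the listed git/npm commands proceed without prompts, while anything matching the deny list is always blocked.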

Bypass Permissions — And Why to Be Careful

Claude Code has a dangerouslySkipPermissions flag that disables all permission prompts entirely. You can invoke it two ways:

```shell
# Command line flag
claude --dangerously-skip-permissions

# Or with a prompt inline
claude --dangerously-skip-permissions -p "your prompt here"
```

It can also be set in settings.json:

```json
{
  "dangerouslySkipPermissions": true
}
```

Either way, every file edit, every bash command, every tool call proceeds without asking. No pauses, no confirmations.

The intended use case is fully automated pipelines — CI environments, scripted agents, headless runs where there is no human at the keyboard to approve anything. In that context it is a reasonable tool.

In interactive use, it is a loaded gun with the safety off.

If you run your terminal with elevated privileges (Run as Administrator on Windows, sudo on Mac/Linux), bypassing permissions means Claude can execute anything on your machine with full admin rights — deleting system files, modifying the registry, killing processes, overwriting configs — without a single confirmation prompt. Claude will not pause to check, because it has been told not to ask. Mistakes at that level are not easily undone. A misunderstood instruction, a hallucinated file path, an overly aggressive cleanup task — any of these becomes immediate and irreversible.

Even without elevated rights, bypassing permissions removes your last line of defense against Claude misunderstanding what you asked. The permission prompts exist precisely to catch the gap between what you said and what Claude heard.

Recommendation: Never use dangerouslySkipPermissions in an interactive session. Never run Claude in an admin terminal unless you have a specific reason to. If you do need to run an automated pipeline with skipped permissions, do it in a sandboxed environment — a container, a VM, a dedicated low-privilege service account — not your main development machine.
20.6

settings.json and settings.local.json

These are Claude Code's configuration files. They control much more than MCP servers:

  • Permissions — which tools and commands are auto-approved
  • Environment variables — passed to Claude's environment
  • Allowed/denied tools — restrict which tools Claude can use
  • Model preferences — default model for new sessions
  • MCP servers — plugin configuration (covered in Chapter X)
  • Theme — light/dark mode preference

Scope:

  • C:\Users\YourName\.claude\settings.json — personal, applies to all projects, never in git
  • C:\repos\MyProject\.claude\settings.json — project-level, can be committed to git for team settings
  • C:\repos\MyProject\.claude\settings.local.json — personal project overrides, NOT committed (like .env.local)
Like the Visual Studio settings hierarchy: Tools > Options (user-level), .editorconfig (repo-level shared), .vs/ directory (repo-level personal). Each layer overrides the one above it.
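A sketch of the layering in practice: the team commits a default and one developer overrides it locally. The "model" key is a real setting; the values are shown as aliases for illustration — use whatever model names your install accepts.

C:\repos\MyProject\.claude\settings.json (committed, the team default):

```json
{ "model": "sonnet" }
```

C:\repos\MyProject\.claude\settings.local.json (git-ignored; your personal override, which wins for you only):

```json
{ "model": "opus" }
```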
20.7

--resume and --continue

You don't have to start a new session every time. You can pick up where you left off:

  • claude --resume — shows a list of recent sessions to choose from
  • claude --resume <session-id> — resumes a specific session by its UUID
  • claude --continue — resumes the most recent session in the current directory

Session IDs are UUIDs (like dd8ab84a-a52e-467b-bcb6-0ad3a44a5db6). You can find them in C:\Users\YourName\.claude\projects\ or by using a session manager.

Like reopening a saved game. --continue is "load last save." --resume <id> is "load specific save file." The conversation history, context, and any in-progress work are all restored.
Like reopening a shared document you were working on with a colleague last week. Everything is still there — your comments, their edits, the tracked changes, the notes in the margin. You don't start over. You continue from where you left off. The session ID is just the document's file name.

How Much Do These Actually Add, Given CLAUDE.md and Compaction?

If your CLAUDE.md is well-maintained, you might wonder what --continue really adds. The distinction is: CLAUDE.md carries instructions; session history carries conversation state. CLAUDE.md tells Claude how to work. Session history tells it what you were doing — "we were mid-way through debugging the auth middleware, you had proposed three approaches, I'd rejected two of them." That context lives in the conversation, not in any file.

--resume adds one more thing: the ability to return to a specific past thread rather than just the most recent one, useful when you've been working on multiple things in parallel.

Compaction erodes the value of both flags. Once a session has been compacted, you are not resuming the original conversation — you are resuming Claude's compressed summary of it. The further back and the more heavily compacted the session, the less you recover. For a short session interrupted mid-task (you closed the terminal, stepped away, came back), --continue is genuinely useful — the conversation is still intact. For a long session that ran through multiple compaction cycles, resuming gives you little more than you would get from a fresh start with good CLAUDE.md notes. The practical takeaway: use --continue for same-day interruptions. For anything older, write good session-end memory notes instead and start fresh.
20.8

Cost Tracking

Every message costs money. Claude Code tracks token usage so you can monitor spend.

  • /cost — shows token usage and estimated cost for the current session

Why Claude Shows Two Prices: "$3/$15"

When you select a model in Claude Code, you see prices like $3/$15 per Mtok. That slash is not a range — it is two separate prices for two separate things:

  • First number ($3) — what you pay per million tokens you send: your prompts, files Claude reads, tool results coming back
  • Second number ($15) — what you pay per million tokens Claude generates: its responses, code it writes, explanations

Output costs 5× more than input because generating text is computationally heavier than reading it. A real session example:

| Model | You send 100K tokens | Claude responds 20K tokens | Session total |
| --- | --- | --- | --- |
| Sonnet ($3/$15) | $0.30 | $0.30 | $0.60 |
| Haiku ($1/$5) | $0.10 | $0.10 | $0.20 |
| Opus ($15/$75) | $1.50 | $1.50 | $3.00 |

On a subscription plan (Pro, Max), you don't pay these rates per session. Your monthly fee covers a message budget. Claude Code shows you the rates as a transparency feature — so you know the cost equivalent of what you're using, even on a flat plan. If you are on the API (pay-per-token), these rates are your actual bill.

Token Types

| Token Type | What It Is | Cost (Sonnet) |
| --- | --- | --- |
| Input tokens | What you send (prompts, file contents, tool results) | $3.00 / 1M |
| Output tokens | What Claude generates (responses, code, tool calls) | $15.00 / 1M |
| Cache write tokens | First time content enters the cache | $3.75 / 1M |
| Cache read tokens | Subsequent reads of cached content (much cheaper) | $0.30 / 1M |

Cache reads are 10x cheaper than fresh input. This is why CLAUDE.md and skills are cost-efficient — they get cached after the first message, so subsequent messages read them from cache at $0.30/1M instead of $3.00/1M.
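The cache arithmetic, worked through in a short script. The rates are the Sonnet numbers from the table; the 10,000-token CLAUDE.md size is an invented example:

```python
# Sonnet rates in dollars per million tokens (from the table above)
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

claude_md_tokens = 10_000  # invented example size

def cost(tokens, rate_per_mtok):
    """Dollar cost of `tokens` tokens billed at `rate_per_mtok` per million."""
    return tokens / 1_000_000 * rate_per_mtok

first_send = cost(claude_md_tokens, CACHE_WRITE)   # cache write on the first message
cached_send = cost(claude_md_tokens, CACHE_READ)   # cache read on every later message

print(f"first send:  ${first_send:.4f}")    # $0.0375
print(f"cached send: ${cached_send:.4f}")   # $0.0030
print(f"cache reads are {round(INPUT / CACHE_READ)}x cheaper than fresh input")
```

After a 50-message session, the cached file has cost pennies instead of being re-billed at the full input rate 50 times.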

Like a metered API. Every call costs something. Input is cheap, output is expensive (5x more). Caching is like a CDN — first request is full price, subsequent requests are pennies.
Like a phone plan with per-minute charges. Receiving a call costs less than making one. And if you call a number you've called before, the rate drops because it's cached in your contacts. You pay attention to your bill. Token costs work the same way — you can see where the money goes and optimize accordingly.
20.9

The CLAUDE.md Inheritance Chain

There isn't just one CLAUDE.md. There's a stack, and they all get loaded and composed:

  1. Your personal global CLAUDE.mdC:\Users\YourName\.claude\CLAUDE.md. Applies to every session on your machine, across all projects. Your universal preferences.
  2. The project CLAUDE.mdC:\repos\MyProject\CLAUDE.md (repo root). Checked into git, shared with your team. Project-specific rules and conventions.
  3. Subdirectory CLAUDE.md files — e.g., C:\repos\MyProject\src\frontend\CLAUDE.md. Loaded when Claude is working in that subdirectory. Useful for monorepos where different directories have different conventions.

They stack — all matching files are loaded, not just the nearest one. Global + repo root + subdirectory all contribute to the instructions Claude sees.

Like CSS specificity. Global styles (user stylesheet) apply everywhere. Page styles (repo CLAUDE.md) override for this project. Inline styles (subdirectory CLAUDE.md) override for this specific area. They cascade — more specific rules win when they conflict, but all layers are present.
Like corporate policy vs team policy vs project guidelines. Company-wide: "All client communications must be approved before sending." Team-level: "Our team uses a specific email template for proposals." Project-level: "On this particular account, always copy the account director." All three are active at once. The most specific one wins when they conflict.
Don't go overboard. Every CLAUDE.md consumes context window tokens on every message. A rough estimate: 800 lines of markdown costs about 12,000 tokens (at ~15 tokens per line). On a 200K Pro window that is roughly 6% of your entire context budget, gone before you type a word. On a 1M Max/Enterprise window it drops to about 1.2% — far less painful, but it still never leaves context for the entire session. Keep each file focused and concise.
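The budget estimate above, as a quick script you can adapt to your own file sizes (the ~15 tokens per markdown line figure is the rough heuristic from the text):

```python
# Rough CLAUDE.md context-budget math (~15 tokens per markdown line)
TOKENS_PER_LINE = 15

def context_share(lines, window_tokens):
    """Fraction of a context window consumed by a file of `lines` lines."""
    return lines * TOKENS_PER_LINE / window_tokens

print(f"800 lines ≈ {800 * TOKENS_PER_LINE:,} tokens")                   # 12,000
print(f"200K Pro window: {context_share(800, 200_000):.1%} consumed")    # 6.0%
print(f"1M Max window:   {context_share(800, 1_000_000):.1%} consumed")  # 1.2%
```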
20.10

Extended Thinking

Sometimes Claude pauses for 10-30 seconds before responding. This isn't lag — it's extended thinking, a mode where Claude does deeper reasoning before generating a response.

What Happens

Claude generates internal "thinking" tokens that you don't see. These are reasoning steps: analyzing the problem, considering approaches, checking constraints. Only the final answer appears in your conversation.

When It Activates

  • Complex architectural decisions
  • Multi-step reasoning problems
  • When Claude needs to reconcile conflicting requirements
  • Large codebase analysis

Cost Implications

Thinking tokens are output tokens — the most expensive kind. A 30-second thinking pause might generate thousands of internal tokens you never see but still pay for. This is why Opus with extended thinking costs significantly more than Haiku for simple tasks.

Like a chess computer thinking ahead. You see the final move, not the millions of positions it evaluated. The evaluation takes time and compute (money), but the result is a much better move than an instant reaction would produce.
Like a consultant who goes quiet for a day before sending their recommendation. You asked for their analysis on Monday, and they didn't respond until Tuesday. That pause wasn't procrastination — they were working through the problem carefully. The extended thinking pause in Claude is the same. The response will be better because it wasn't rushed.

Practical tip: If you're doing simple tasks (rename a variable, add a comment), use Haiku — it doesn't need to think deeply. Save Opus + extended thinking for the problems that actually require deep reasoning.

Pricing & Purchasing

Prices accurate as of March 2026

Honest warning: Anthropic's pricing page is genuinely difficult to understand. It is not your fault. The same concept — how many tokens you consume — gets called "usage," "API usage," "tokens," and "context" depending on which section you are reading. Enterprise pricing is not published. We have decoded it below as plainly as we can. If something still doesn't make sense, that's the page's problem, not yours. See the Decoder Ring in the Glossary tab for a term-by-term translation.
Step 1: How to Get Set Up
  1. Go to claude.ai and create a free account with your email.
  2. Install Claude Code: winget install Anthropic.ClaudeCode or download from claude.ai/download
  3. Run claude in a terminal. It will prompt you to log in — use your claude.ai account.
  4. You're on the free tier. Try it. If you hit limits, upgrade.

You do not need a credit card to start. The free tier is real — it's just rate-limited.

The Plans

Snapshot as of March 2026. Verify at claude.ai/upgrade before relying on fine details.

| Plan | Monthly Cost | Usage Limit | Best For |
| --- | --- | --- | --- |
| Free | $0 | ~10-15 messages per 5-hour window | Trying it out. Not for real work. |
| Pro | $20/month ($17 annual) | ~45 messages per 5-hour window | Developers coding 2-3 hours/day. Light-to-moderate use. |
| Max 5× | $100/month | ~225 messages per 5-hour window | Developers coding 6-8 hours/day. Full workday without hitting walls. |
| Max 20× | $200/month | ~900 messages per 5-hour window | Power users, large projects, all-day heavy coding sessions. |
| Team | $25-30/user/month (standard); $150/user/month (premium with Claude Code) | Similar to Pro per seat | Organizations deploying Claude Code to a team. Centralized billing. |
| Enterprise | Not published (community estimates below) | Custom — API usage billed on top of seat fees at standard per-token rates | Organizations only. Audit logs, SSO, SCIM, RBAC, 500K context window, HIPAA option. NOT for individuals — minimum seat commitment required. |

Enterprise pricing, per community reports and enterprise buyers (Anthropic does not publish it):

  • ~$40–$60/seat/month (standard seats)
  • ~$100–$150/seat/month (premium seats with Claude Code)
  • Minimum ~20 seats, annual contract
  • Small deployment (10–25 users): ~$500–$1,000/month
  • Large org (100+ users): $5,000–$15,000+/month

Plus API usage billed separately on top. Contact Anthropic sales for a real quote.

Important: All Claude surfaces (claude.ai website, Claude Desktop, Claude Code CLI) share the same usage pool. If you burn through messages on the website, you have fewer for coding.

Can an individual buy Enterprise to get unlimited coding? No. Enterprise requires a minimum seat commitment and an annual contract with sales. It's organizational infrastructure — not just more usage. If you code like a madman and want maximum throughput, Max 20× ($200/month) is your ceiling as an individual. Enterprise usage is also billed separately at API rates on top of the seat fee — it's not "unlimited."
Real Questions, Real Numbers

"Our company has 60 people. What's the monthly bill?"
At the community-reported ~$60/seat/month for standard seats: 60 × $60 = $3,600/month in seat fees, plus API usage (token costs) on top. If you're deploying premium seats with Claude Code access (~$150/seat), it's 60 × $150 = $9,000/month before usage. Budget $5,000–$15,000/month all-in for a 60-person shop doing active AI coding. Annual commitment, so expect to negotiate. These are community estimates — Anthropic's actual quote could be higher or lower.

"Our company has 3 people. Can we get Enterprise?"
Probably not at standard terms. Reports suggest a minimum floor of 20 seats. However: you can sometimes buy up — paying for 20 seats even if only 3 people use them. That would cost ~$1,200–$3,000/month for 3 actual users, which is absurd value for money. For a 3-person team, Team plan ($150/user/month for premium Claude Code seats = $450/month) is almost certainly the right answer. You get Claude Code access without the enterprise overhead and minimum commitment.

"What if I'm willing to pay as if we had 70 people just to get Enterprise features?"
Call Anthropic sales and ask directly. Some vendors will take your money. You'd be looking at 70 × $60 = $4,200+/month for features a 3-person team doesn't need (SSO, SCIM, audit logs are organizational plumbing). The honest answer: don't. Max 20× ($200/month) gets you more coding throughput per dollar than any Enterprise arrangement would for a solo or small team.

What Does Enterprise Actually Get You?

Two categories — one genuinely useful to anyone working on large projects, one only useful to organizations managing many people. Don't conflate them.

| Benefit | What it actually means | Useful to individuals? | Useful to organizations? |
| --- | --- | --- | --- |
| More Awareness ("larger context window") | 1M token window (~1,333 pages) instead of 200K (~267 pages). Claude can hold 5× more of your codebase in mind at once. Same model, bigger reading window. This is the real coding benefit. | Yes — for large codebases | Yes — for large codebases |
| Audit logging | Every prompt and response is logged for compliance review | No | Yes — legal/compliance |
| SSO / SAML | Employees log in with their corporate identity — no separate Anthropic accounts | No | Yes — IT management |
| SCIM | Auto-add/remove users when employees join or leave via HR system | No | Yes — HR/IT automation |
| RBAC | Control which employees can use which models and features | No | Yes — access control |
| Custom data retention | Control how long conversation data is stored and where | No | Yes — compliance/legal |
| Priority support | Actual humans to call when something breaks | Maybe | Yes — uptime-critical deployments |
The Awareness benefit is real — but you do NOT need Enterprise to get it. As of March 2026, Max ($100–$200/month) and Enterprise have identical published 1M context windows. This is validated by Anthropic's own March 2026 announcement and confirmed by developer communities.

One caveat: Enterprise customers who negotiate custom contracts may be able to arrange context windows beyond 1M through Anthropic's sales team — but this is not a published, standard feature. If you saw something online claiming Enterprise gets more Awareness, it may have been written before March 2026, when Enterprise had 500K and Max only had 200K. That gap no longer exists at the published tier level.

What Enterprise actually adds is the same Awareness plus governance overhead (SSO, SCIM, audit logs), plus a minimum seat commitment, plus a sales negotiation. If you are an individual developer working on a large codebase, Max 20× ($200/month) is your answer: same context as Enterprise, no sales call, no minimum seats, no contract. Enterprise is for organizations that need compliance controls. Max is for developers who need maximum Awareness.
"Larger context window — access to extended context models" — Decoded

(Anthropic's words, translated into something honest)

| Plan | Awareness (pages) | Awareness (tokens) | Cost/month | How |
| --- | --- | --- | --- | --- |
| Pro | ~267 pages | 200K | $20 | Default |
| Pro (unlocked) | ~1,333 pages | 1M | $20 | Type /extra-usage — not automatic |
| Max 5× | ~1,333 pages | 1M | $100 | Automatic |
| Max 20× | ~1,333 pages | 1M | $200 | Automatic + 20× message budget |
| Team / Enterprise | ~1,333 pages | 1M | $150+/user | Automatic + org governance |

"Extended context model" is not a different AI. It is the same Opus 4.6 or Sonnet 4.6, with the reading window expanded from 267 pages to 1,333 pages. Same IQ. Five times more Awareness. The difference between seeing a chapter and seeing the whole book.

| Real-world reference | Pages | Fits in Pro (267 pages)? | Fits in Max (1,333 pages)? |
| --- | --- | --- | --- |
| A short novel (The Great Gatsby) | ~180 pages | Yes | Yes |
| A typical technical book (Clean Code) | ~460 pages | No — too big | Yes |
| A small codebase (10 files × 200 lines) | ~40 pages | Yes, easily | Yes |
| A medium codebase (100 files × 200 lines) | ~400 pages | No — too big | Yes |
| A large codebase (500 files × 200 lines) | ~2,000 pages | No | No — Claude sees roughly the first two-thirds |
| War and Peace | ~1,225 pages | No | Yes — barely |
| The entire Lord of the Rings trilogy | ~1,500 pages | No | No — 167 pages short |

A medium project — say 100 source files — is already beyond what Pro can hold at once. Max fits it comfortably. For very large codebases, neither plan holds everything; you manage context deliberately with /compact and targeted file reads rather than loading the whole project.

Tokens: Your Fuel (and What "Usage" Actually Means)

Tokens are like gas. You can measure gas two ways: by how far it gets you (miles per gallon) or by what it costs per gallon. Both are useful. Tokens work the same way: you can measure them by what they get you (pages read, files held in context, hours of coding) or by what they cost (dollars per million tokens).

When you run out of tokens in a window, Claude slows down or stops until the window resets (every 5 hours). That's the tank hitting empty.

| Token Type | What It Is | Cost (Sonnet 4.6) |
| --- | --- | --- |
| Input tokens | What you send: your prompt, files Claude reads, tool results | $3.00 per million |
| Output tokens | What Claude generates: responses, code it writes | $15.00 per million |
| Cache write | First time repeated content (CLAUDE.md, skills) is cached | $3.75 per million |
| Cache read | Subsequent reads of cached content — much cheaper | $0.30 per million |
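To make these rates concrete, here is a small Python sketch that estimates a session's API cost from the rates above. The example token counts are invented for illustration.

```python
# Per-million-token rates for Sonnet 4.6, from the table above
RATES = {
    "input": 3.00,        # your prompt, files Claude reads, tool results
    "output": 15.00,      # responses and code Claude generates
    "cache_write": 3.75,  # first time repeated content is cached
    "cache_read": 0.30,   # subsequent reads of cached content
}

def session_cost(tokens: dict) -> float:
    """Estimate the USD cost of a session from token counts by type."""
    return sum(RATES[kind] * count / 1_000_000 for kind, count in tokens.items())

# Hypothetical session: 80K input, 20K output, 5K cached once, 50K re-read from cache
example = {"input": 80_000, "output": 20_000, "cache_write": 5_000, "cache_read": 50_000}
print(f"${session_cost(example):.2f}")  # → $0.57
```

Note how the cache columns earn their keep: re-reading 50K tokens from cache costs $0.015 here, versus $0.15 if they were billed as fresh input.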

Two Ways Claude Charges for Tokens — and Why It Matters

There are two completely different ways to pay for Claude, and they get confused constantly:

Way 1: Subscription (Pro, Max) — You pay a flat monthly fee. Tokens are included inside your message budget. You don't see a token bill. You just hit a wall when you've sent too many messages in a 5-hour window, then wait for it to reset. This is what most people use. Simple.

Way 2: API / Pay-per-token — You get an API key from Anthropic and pay directly for every token consumed, with no subscription. The per-token rates in the table above are what you pay. This is for developers building applications that use Claude — a company baking Claude into their own product, not a developer using Claude as a coding tool. You'd wire your app to Claude's API and get billed for token consumption each month.

Enterprise uses Way 2 on top of a seat fee. The seat fee (~$60–$150/user/month) buys access and organizational features (audit logs, SSO, etc.). But every message your team sends still burns tokens, billed at per-token rates on top of that seat fee. That's what "token consumption billed separately" means in the Enterprise section — it's the same tokens as above, just billed directly instead of wrapped in a message budget.
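A sketch of how those two billing layers combine, using the community-reported seat prices from this page (not official figures) and Sonnet 4.6 API rates. All numbers are illustrative.

```python
SONNET_INPUT_PER_MTOK = 3.00    # $/million input tokens
SONNET_OUTPUT_PER_MTOK = 15.00  # $/million output tokens

def enterprise_monthly_bill(seats: int, seat_fee: float,
                            input_mtok: float, output_mtok: float) -> float:
    """Seat fees plus token usage billed separately at API rates, per the section above."""
    seat_total = seats * seat_fee
    usage_total = input_mtok * SONNET_INPUT_PER_MTOK + output_mtok * SONNET_OUTPUT_PER_MTOK
    return seat_total + usage_total

# 60 premium seats at the community-reported ~$150/seat, plus a hypothetical
# 500M input / 100M output tokens of org-wide usage in the month
print(enterprise_monthly_bill(60, 150, input_mtok=500, output_mtok=100))  # → 12000.0
```

That $12,000 figure lands inside the $5,000–$15,000/month range quoted for a 60-person shop earlier on this page.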

If you're a developer using Claude Code to write code: use a subscription (Pro or Max). You never need to think about API keys or per-token rates. Those are for building products, not for using Claude as a tool.

How Many Hours Can You Code? Real Scenarios.
| Scenario | Daily coding hours | Plan needed | Monthly cost |
| --- | --- | --- | --- |
| Curious experimenter: trying Claude Code for the first time, occasional small tasks | 30 min–1 hour | Free or Pro | $0–$20 |
| Part-time coder: HR professional automating reports, PM writing scripts, UX designer prototyping | 1–2 hours | Pro | $20 |
| Working developer: full-time developer using Claude Code as primary coding tool | 3–5 hours | Pro (will hit limits occasionally) or Max 5× | $20–$100 |
| Heavy developer: all-day Claude Code, large codebases, frequent file reads | 6–8 hours | Max 5× | $100 |
| Power user: running agents, parallel sessions, very large projects | 8+ hours | Max 20× | $200 |
The average developer on Claude Code consumes about $6/day worth of tokens at API rates. That is well above Pro's effective $0.67/day subscription price ($20/month ÷ 30 days), which is why a flat subscription beats pay-per-token for most users. 90% of users stay under $12/day. If you're consistently hitting Pro limits mid-morning, upgrade to Max 5×. If you've never hit a limit, Pro is fine.
IQ vs. Awareness: What You're Actually Buying

Think of it like fidelity in images. A single photograph can be ultra-high-resolution — every pixel tiny, every detail captured. That's high IQ. A movie has thousands of frames, covering far more ground — but each individual frame may be lower resolution than a still photo. That's high Awareness.

You want both — but they are separate dials. When you pay more for AI, you might be buying a sharper photograph (better reasoning per frame), more frames (larger context window), or both. A high-IQ model on a small plan sees 267 pages with brilliant resolution. The same model on a Max plan sees 1,333 pages — five times the movie, same sharpness per frame.

Editorial teaching model — not benchmark scores. IQ numbers are estimated shorthand for relative reasoning quality (Opus 4.6 = 100 baseline), not official measurements. Awareness numbers are calculated from published token limits and are factual. (IQ = editorial estimate; Awareness = official fact.)

Save Money Without Losing Productivity

Life of a Token

You type a message. Claude responds. What actually happened? The answer crosses your keyboard, your network card, the public internet, a data center, dozens of GPU chips, and back again. The same piece of data gets called something different at every stop. This page traces the complete journey.

The Complete Journey
You type a message. Then:

On your machine:
1. Tokenization: text is split into token IDs. "Hello world" → [9906, 1917]. Done locally in milliseconds. [token = text unit, meaning #1]

Across the internet:
2. HTTPS request: token IDs plus conversation history travel as JSON over TLS, a few KB, to Anthropic's load balancer.

Inside the data center (GPU cluster):
3. Route to GPU pool: Opus requests go to the Opus pool, Sonnet to the Sonnet pool. Models run on separate hardware.
4. Prefill phase: ALL input tokens are processed in parallel into the KV cache in GPU VRAM. Fast. This is what you pay $3/M input for. [token = KV cache slot, meaning #3]
5. Generation phase: ONE token at a time. Each new token requires a full forward pass through all model layers, reading the entire KV cache. Cannot be parallelized. $15/M output. [token = billing unit #2 + KV slot #3]

Back to your machine:
6. Streaming response: each token is sent as it is generated. This is why Claude streams word by word. [token = text unit, meaning #1]
7. De-tokenization: token IDs become text on your screen. Words appear.
What the Token Is Called at Each Step
| Stage | What it is called | Where it lives | Size |
| --- | --- | --- | --- |
| On your screen | Characters / text | Your RAM / display | Bytes |
| After tokenization | Token ID (an integer, 0 to ~100K) | Your machine, briefly | 4 bytes per token |
| In transit | JSON payload | Network packets (TLS encrypted) | KB total |
| Being processed (input) | Input token — prefill phase | GPU compute cores, processed in parallel | Batch |
| Stored on GPU | KV cache entry | GPU VRAM on Anthropic's servers | 64 KB – 0.5 MB per token (~64–200 KB for modern GQA models; ~0.5 MB for older MHA models) |
| Being generated (output) | Output token — generation phase | GPU compute cores, one at a time | Sequential |
| Streaming back to you | Streamed token | Network packets | Bytes |
| Reused from prior call | Cache token (prompt cache hit) | GPU VRAM retained between calls | $0.30/M instead of $3/M |
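That 64 KB – 0.5 MB per-token range is not hand-waving: per layer, the model stores one key vector and one value vector for each KV head. Anthropic's architecture numbers are unpublished, so the figures in this sketch are illustrative of typical open models, not Claude's actual dimensions.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Per token: a key AND a value vector, per layer, per KV head (fp16 = 2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Older multi-head attention (MHA): every attention head keeps its own K/V
print(kv_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128) / 1024)  # → 512.0 (KB, ~0.5 MB)
# Grouped-query attention (GQA): a few KV heads shared by many query heads
print(kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128) / 1024)   # → 128.0 (KB)
```

This is why GQA matters commercially: cutting KV heads from 32 to 8 cuts per-session VRAM by 4×, which means 4× more concurrent sessions per GPU.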
Why Output Tokens Cost 5x More Than Input Tokens
◈ The Physics of Why

Input tokens are processed in parallel. When you send 10,000 tokens, the GPU processes all 10,000 simultaneously in the prefill phase. One big matrix multiplication. Fast. Efficient. $3/M on Sonnet.

Output tokens are generated sequentially, one at a time. To generate the next word, the model runs a complete forward pass through every single layer (80+ for a large model), reading the entire KV cache on each pass. It cannot generate word 2 until word 1 is finished, because word 2 depends on word 1. This is called autoregressive generation.

One output token equals one full model forward pass, which equals approximately 5x the GPU compute of processing one input token in a batch. Hence $15/M output vs $3/M input. The 5:1 ratio is not arbitrary. It is physics.

Like the difference between delivering a load of bricks on a truck (all at once, parallel) versus a single bricklayer who must place each brick one at a time, in order. The truck drops the whole load in one trip. The bricklayer cannot place brick 2 until brick 1 is down. The model receives your input like the truck and generates its response like the bricklayer.
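The sequential dependency is easy to see in code. In this toy sketch, next_token is a stand-in for a real forward pass (which it emphatically is not); the structure of the loop is the point.

```python
def next_token(context: list[int]) -> int:
    """Stand-in for a full forward pass over the whole context (the expensive part)."""
    return sum(context) % 100  # a real model runs billions of multiply-adds here

def generate(prompt_tokens: list[int], n: int) -> list[int]:
    context = list(prompt_tokens)   # prefill: the whole prompt is processed at once
    output = []
    for _ in range(n):              # generation: strictly one token at a time
        tok = next_token(context)   # token i+1 needs token i; no parallelism possible
        output.append(tok)
        context.append(tok)         # the new token joins the context (the KV cache)
    return output

print(generate([9906, 1917], 3))  # → [23, 46, 92]
```

No amount of extra hardware removes the data dependency inside that loop; more GPUs only let you run more sessions side by side, not make one session's bricklayer faster.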
How Long Does a Token Stay in GPU VRAM?

The KV cache for your session lives in GPU VRAM for the entire duration of your session. Every token you have sent and every token Claude has generated stays in VRAM as the KV cache. That is how Claude remembers what you discussed 100 messages ago. It is all right there in GPU memory on a server in a data center.

Why Nvidia and Why Data Centers
◈ The Hardware Reality

The KV cache math explains Nvidia's $3 trillion valuation and why AI companies are spending hundreds of billions on data centers.

Why Nvidia specifically: H100 GPUs have 80 GB of HBM (High Bandwidth Memory) each; H200s have 141 GB. At the ~0.5 MB/token figure used in this guide, a 1M-token session can need on the order of 500 GB of KV cache, which spans multiple GPUs connected via NVLink — Nvidia's GPU-to-GPU interconnect running at 900 GB/s. AMD has competing GPU specs but lacks CUDA, the programming model that 15 years of AI frameworks are built on. You cannot easily swap Nvidia for AMD in a running production cluster.

Why data centers: A single Pro-plan Claude session ties up around 100 GB of GPU VRAM for its KV cache at median estimates (range: tens to hundreds of GB). Anthropic serves millions of sessions simultaneously. Millions of sessions times tens-to-hundreds of GB each adds up to hundreds of petabytes of GPU VRAM required. Impossible without massive purpose-built facilities with specialized power, cooling, and GPU interconnect networking. Data centers do double duty: holding the model weights (terabytes per model, always loaded) and the KV caches for all live sessions (constantly changing as sessions start and end).

When you see news about Microsoft spending $80 billion on AI data centers, or Meta ordering 350,000 H100 GPUs, this is why. Every concurrent AI session in the world needs its own slice of GPU VRAM for its KV cache. The world is running out of VRAM.
Why Opus Was Unavailable When Sonnet Was Not

Different models run on different hardware pools. Opus 4.6 is a larger model than Sonnet 4.6. A larger model means more parameters to hold in GPU memory and more compute per token, so each Opus instance needs more hardware and serves fewer concurrent sessions.

Anthropic allocates separate GPU capacity per model. The Opus pool fills up first during peak demand because it is smaller. Fewer Opus GPUs exist because fewer users need it and it costs more per session to serve. When you saw "Opus unavailable," every GPU in the Opus cluster was already holding someone else's KV cache. The cluster was physically full.

Sonnet had capacity because more Sonnet hardware exists and it is more efficiently served. The model you chose directly determined which hardware cluster handled your request, and whether that cluster had room.

Higher-tier plans get priority in the queue when a GPU cluster is under load. Max and Enterprise do not just mean more tokens. They mean you are higher priority for access to the premium model hardware pools when demand exceeds capacity.
Token Scale: Mileage and Cost Chart

These tables answer: how much RAM, how many files, how much money, and how much working time?

Assumptions: 0.5 MB GPU VRAM per token (median). Average source file ≈ 1,500 tokens (300 lines × 5 tokens/line). Blended API cost ≈ $5/M tokens (80% input at $3/M + 20% output at $15/M). Active AI coding session ≈ 50,000 tokens/hour (15–20 exchanges/hour at ~3,000 tokens each).

Editorial note: These tables use means and averages, not precise model-specific values. Numbers are round figures intended to convey magnitude, not precision. Actual values vary significantly by model architecture, usage pattern, and Anthropic's infrastructure choices.

| Tokens | Server GPU VRAM (KV cache) | Equivalent source files | Pages of text | API cost (blended) |
| --- | --- | --- | --- | --- |
| 1 token | 0.5 MB | 1 word | 0.001 pages | $0.000005 |
| 1,000 tokens | 0.5 GB | ~0.7 files | 1.3 pages | $0.005 |
| 10,000 tokens | 5 GB | ~7 files | 13 pages | $0.05 |
| 50,000 tokens | 25 GB | ~33 files | 67 pages | $0.25 |
| 100,000 tokens | 50 GB | ~67 files | 133 pages | $0.50 |
| 200,000 tokens (Pro plan max) | 100 GB | ~133 files | 267 pages | $1.00 |
| 500,000 tokens | 250 GB | ~333 files | 667 pages | $2.50 |
| 1,000,000 tokens (Max/Enterprise) | 500 GB | ~667 files | 1,333 pages | $5.00 |
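Every row above is simple arithmetic from the stated assumptions. A sketch that reproduces the Pro-plan row (all constants come from the assumptions paragraph; none are model-specific facts):

```python
# Assumptions restated from above: 0.5 MB VRAM per token, ~1,500 tokens per
# source file, ~750 tokens per page, ~$5 per million tokens blended
def token_mileage(tokens: int) -> dict:
    return {
        "vram_gb": tokens * 0.5 / 1000,          # 0.5 MB per token, in decimal GB
        "source_files": tokens / 1500,           # ~300-line file at ~5 tokens/line
        "pages": tokens / 750,                   # 200K tokens ≈ 267 pages
        "blended_cost_usd": tokens * 5 / 1_000_000,
    }

row = token_mileage(200_000)  # the Pro plan maximum
print(round(row["vram_gb"]), round(row["source_files"]), round(row["pages"]),
      row["blended_cost_usd"])  # → 100 133 267 1.0
```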
Your Workday Token Budget

Assumption: you use AI heavily all day. Active AI coding = ~50,000 tokens per hour (about 15 exchanges per hour at ~3,000 tokens each — prompt + response). An 8-hour day = ~400,000 tokens consumed.

Editorial note: These tables use means and averages, not precise model-specific values. Numbers are round figures intended to convey magnitude, not precision. Actual values vary significantly by model architecture, usage pattern, and Anthropic's infrastructure choices.

| Token amount | Time at your pace (50K/hr) | What it feels like | API cost | Pro plan ($20/mo) usage |
| --- | --- | --- | --- | --- |
| 10,000 | ~12 minutes | A quick focused task. 4–5 back-and-forth exchanges. | $0.05 | ~4% of daily budget |
| 50,000 | ~1 hour | A solid morning session on one feature. | $0.25 | ~20% of daily budget |
| 100,000 | ~2 hours | Half a working day of active AI coding. | $0.50 | ~40% of daily budget |
| 200,000 | ~4 hours | Pro plan max context. A full half-day session before needing /compact or a new session. | $1.00 | Pro plan limit hit |
| 400,000 | ~8 hours (full day) | Your typical full working day of AI-assisted development. | $2.00 | Requires 2 Pro sessions or Max plan |
| 1,000,000 | ~20 hours (~2.5 days) | Max/Enterprise context limit. A major feature or week-long sprint worth of context. | $5.00 | Requires Max plan |
The math that surprises most people: at your pace (50K tokens/hour), you can fill the entire Pro plan context window (200K tokens) in about 4 hours of active work. That is why heavy users hit limits before lunch. Max 5× ($100/month) raises the ceiling to a 1M-token context, roughly 20 working hours (about 2.5 days) at the same rate. Max 20× ($200/month) covers your full active workday without rationing. These plans are not priced arbitrarily; the pricing reflects real GPU time consumed.
Why This Also Explains Nvidia's Valuation

Your single full workday of AI coding (400K tokens) ties up roughly 200 GB of Anthropic's GPU VRAM for the duration of active sessions. Multiply by millions of developers worldwide doing the same thing simultaneously, and the scale becomes clear.

This is why Microsoft announced $80 billion in AI data center investment for 2026 alone. The numbers make sense when you trace a single developer's workday back to server hardware.

Estimated time: 8-10 minutes | 12 questions
Before you quiz: the word "token" has four different meanings in AI discussions, and only three of them are related.

1. Text unit — A chunk of text the model reads and writes. In English, a rough rule of thumb is about 4 characters or ¾ of a word, though the real number varies by language and formatting. 100 words ≈ about 130 tokens as a rough estimate.
2. Billing unit — The unit AI companies charge for. Input and output tokens are often priced separately, because processing them can impose different costs. On Sonnet: $3/MTok input, $15/MTok output.
3. Inference / KV-cache position — During generation, each token in the active context contributes to memory use inside the model’s attention machinery, commonly described in terms of the KV cache. The memory cost per token depends on model architecture — a rough estimate is ~64 KB to ~0.5 MB per token. This is one reason large context windows are expensive to serve. The raw text itself is tiny; what matters is the model’s internal mathematical representation of that text.
4. Security credential — A string used to prove identity to an API, such as a Personal Access Token (PAT), API key, or JWT. This meaning is completely separate from the other three.

Quiz questions may use "token" in any of these senses. Pay attention to context.
Tokens Quiz

Advanced: Building MCP Servers

This page is for developers who want to build their own MCP servers. If you just want to use existing ones, you don't need this. But if you've ever thought "I wish Claude could access my company's internal API" or "I want Claude to query our private database" — this is how you do that. Building an MCP server is the difference between using Claude as a general-purpose tool and making it a specialist that knows your systems.
Why Would You Build One?
The Contract: What Your Program Must Do

There is no interface file, no IDL, no .h header. The "contract" is the MCP protocol — a set of JSON-RPC 2.0 messages your program must understand. When Claude Code connects to your server:

  1. Claude sends initialize — handshake and capability negotiation. Your program responds with what it supports.
  2. Claude sends tools/list — "what tools do you expose?" Your program returns a list of tool definitions (name, description, JSON Schema for parameters). Claude caches this for the session.
  3. Claude sends tools/call — "call this tool with these arguments." Your program executes the action and returns the result as JSON.

That's the entire protocol for basic tool exposure. Three message types. The rest is your business logic.

The Secret Sauce: Tool Definitions

The quality of your tool definitions determines how well Claude uses your server. Claude reads the description to understand what the tool does and decides when to call it. The JSON Schema tells it what parameters to provide. Get the description wrong and Claude will misuse or ignore your tool.

{
  "name": "create_ticket",
  "description": "Creates a new Jira issue in the specified project. Use when the user wants to file a bug, create a task, or track work in Jira. Returns the new issue key (e.g. PROJ-1234).",
  "inputSchema": {
    "type": "object",
    "properties": {
      "summary": {
        "type": "string",
        "description": "The issue title — one sentence, specific"
      },
      "priority": {
        "type": "string",
        "enum": ["Low", "Medium", "High", "Critical"],
        "description": "Issue priority"
      },
      "project_key": {
        "type": "string",
        "description": "Jira project key (e.g. PROJ, INFRA). Ask the user if unknown."
      }
    },
    "required": ["summary", "project_key"]
  }
}
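Claude constructs arguments from this schema, but your server should validate them anyway before hitting a real API. A minimal stdlib-only sketch for the create_ticket schema above (a production server would more likely use a full JSON Schema validator library):

```python
# The create_ticket inputSchema from above, reduced to what the validator needs
SCHEMA = {
    "required": ["summary", "project_key"],
    "properties": {
        "summary": {"type": "string"},
        "project_key": {"type": "string"},
        "priority": {"type": "string", "enum": ["Low", "Medium", "High", "Critical"]},
    },
}

def validate(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments are acceptable."""
    errors = [f"missing required field: {k}" for k in SCHEMA["required"] if k not in args]
    for key, value in args.items():
        spec = SCHEMA["properties"].get(key)
        if spec is None:
            errors.append(f"unknown field: {key}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors

print(validate({"summary": "Login broken", "project_key": "PROJ"}))  # → []
```

Returning the error list to Claude (rather than crashing) works well in practice: it can read "priority must be one of [...]" and retry with a corrected call.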
Message Sequence: What Actually Flows Between Claude and Your Server
Claude Code
Your MCP Program
──────────────────────────→ initialize { protocolVersion, capabilities }
←────────────────────────── initialize result { serverInfo, capabilities }
— Handshake complete. Claude now asks what you can do. —
──────────────────────────→ tools/list {}
←────────────────────────── tools/list result [ { name, description, inputSchema }, ... ]
— Claude caches your tool list. Session continues. User asks something. —
──────────────────────────→ tools/call { name: "create_ticket", arguments: { summary: "...", project_key: "PROJ" } }
←────────────────────────── tools/call result { content: [ { type: "text", text: "Created PROJ-1234" } ] }
— Claude shows the result to the user. More tool calls may follow. —
The Raw Protocol — Real JSON Examples

Every message is JSON-RPC 2.0 over stdin/stdout. Here is exactly what flows through the pipe:

Claude sends to your program (stdin):

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"claude-code","version":"2.1.75"}}}

{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}

{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"create_ticket","arguments":{"summary":"Login button broken on Safari","project_key":"PROJ","priority":"High"}}}

Your program responds (stdout):

{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{}},"serverInfo":{"name":"my-jira-server","version":"1.0.0"}}}

{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"create_ticket","description":"Creates a Jira issue. Use when user wants to file a bug or track work.","inputSchema":{"type":"object","properties":{"summary":{"type":"string"},"project_key":{"type":"string"},"priority":{"type":"string","enum":["Low","Medium","High","Critical"]}},"required":["summary","project_key"]}}]}}

{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"Created PROJ-1234: Login button broken on Safari\nhttps://yourjira.atlassian.net/browse/PROJ-1234"}]}}

Each message is one line of JSON terminated by a newline. Your program reads lines from stdin, parses the JSON, acts on the method, and writes a response line to stdout. That is the entire transport layer.
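To make that concrete, here is a minimal dispatcher in Python with no SDK at all: hardcoded to a single hello tool, skipping protocol notifications, and with no error handling. It is a sketch of the transport, not a production server.

```python
import json
import sys

def handle(msg: dict) -> dict:
    """Dispatch one JSON-RPC request to its response. Only the three core methods."""
    method = msg["method"]
    if method == "initialize":
        result = {"protocolVersion": "2024-11-05",
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "hello-server", "version": "1.0.0"}}
    elif method == "tools/list":
        result = {"tools": [{"name": "hello",
                             "description": "Says hello to someone",
                             "inputSchema": {"type": "object",
                                             "properties": {"name": {"type": "string"}},
                                             "required": ["name"]}}]}
    elif method == "tools/call":
        who = msg["params"]["arguments"]["name"]
        result = {"content": [{"type": "text", "text": f"Hello, {who}!"}]}
    else:
        return {"jsonrpc": "2.0", "id": msg["id"],
                "error": {"code": -32601, "message": f"unknown method: {method}"}}
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}

def main() -> None:
    for line in sys.stdin:                  # one JSON-RPC message per line
        if not line.strip():
            continue
        msg = json.loads(line)
        if "id" not in msg:                 # a notification; no response expected
            continue
        print(json.dumps(handle(msg)), flush=True)  # stdout carries ONLY protocol JSON

if __name__ == "__main__":
    main()                                  # serves stdin until the client disconnects
```

Real clients send more than these three methods (notifications, ping, shutdown), which is exactly the boilerplate the official SDKs absorb for you.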

The description IS the interface. Claude does not read your source code. It reads your tool description and decides when and how to call the tool based on that text alone. Write descriptions the way you'd write a function's docstring for a junior developer — explicit, specific, including when to use it and what it returns.
Hello World in C# — For the Windows Developer

Since this guide is written for Windows developers, here's the simplest possible MCP server in C# using the official ModelContextProtocol NuGet package. This is a real, working implementation:

// 1. Create a new console app: dotnet new console -n MyMcpServer
// 2. Add the package: dotnet add package ModelContextProtocol
// 3. Add to Program.cs:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ModelContextProtocol.Server;

var builder = Host.CreateApplicationBuilder(args);

// CRITICAL: route console logging to stderr, NOT stdout — stdout is the protocol channel
builder.Logging.AddConsole(options =>
{
    options.LogToStandardErrorThreshold = LogLevel.Trace;
});

builder.Services.AddMcpServer()
    .WithStdioServerTransport()   // reads stdin, writes stdout
    .WithToolsFromAssembly();     // discovers tools via [McpServerTool] attribute

await builder.Build().RunAsync();
// MyTools.cs — define your tools here
using ModelContextProtocol.Server;
using System.ComponentModel;

[McpServerToolType]
public static class MyTools
{
    [McpServerTool, Description("Says hello. Use when user wants a greeting.")]
    public static string SayHello(
        [Description("The name to greet")] string name)
    {
        return $"Hello, {name}! This response came from your C# MCP server.";
    }

    [McpServerTool, Description("Adds two numbers. Use for arithmetic.")]
    public static int Add(
        [Description("First number")] int a,
        [Description("Second number")] int b)
    {
        return a + b;
    }
}
// Configure in Claude Code ~/.claude/settings.json:
{
  "mcpServers": {
    "my-csharp-server": {
      "command": "dotnet",
      "args": ["run", "--project", "C:\\MyMcpServer\\MyMcpServer.csproj"]
    }
  }
}

Restart Claude Code, and Claude will discover SayHello and Add as tools it can call. Ask Claude "say hello to Steve" and it will call your C# method directly.

The logging trap in C#. By default .NET logs to stdout — which corrupts the JSON-RPC protocol stream. Always configure logging to stderr (LogToStandardErrorThreshold) or set log levels to Warning/Error only. This is the #1 issue when first-time C# MCP developers can't get their server to respond.
Implementation: Any Language, Any Executable

Any program in any language that can read stdin and write stdout is a valid MCP server. You are not limited to npm packages. A compiled Rust binary, a Python script, a Go executable, a PowerShell script — all work. Anthropic provides SDKs for TypeScript and Python that handle the protocol boilerplate, but they're optional.

Never write anything to stdout except JSON-RPC responses. Stdout is the protocol channel. A single console.log(), print(), or debug statement corrupts the message stream and breaks your server silently. Claude will hang or report tool errors with no explanation. Write all logs and debug output to a file or stderr.

The minimal Python MCP server is about 30–50 lines:

# Install: pip install mcp
from mcp.server import Server
from mcp.server.stdio import stdio_server
import mcp.types as types

app = Server("my-server")

@app.list_tools()
async def list_tools():
    return [
        types.Tool(
            name="hello",
            description="Says hello to someone",
            inputSchema={
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "hello":
        return [types.TextContent(type="text", text=f"Hello, {arguments['name']}!")]

async def main():
    async with stdio_server() as (read, write):
        await app.run(read, write, app.create_initialization_options())

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
MCP Server vs. CLI Program — Which Should You Build?

This is the right question to ask. You can already give Claude access to any CLI program through its Bash tool. So why write an MCP server at all?

Yes, you can absolutely just write a CLI program. If you write my-tool.exe --create-ticket "Bug in login" --project PROJ, Claude can call that from the Bash tool. It works. Claude is smart enough to figure out parameters from a --help output or your instructions. This is the quickest path to "Claude can use my tool."

| | CLI program (Bash tool) | MCP server |
| --- | --- | --- |
| Setup time | Minutes — write the program, Claude uses it | Hours — implement the protocol, write tool definitions |
| How Claude calls it | By constructing a command-line string | By calling a named tool with structured JSON parameters |
| Parameter handling | Claude guesses flags from --help or your instructions; prone to errors | Strict JSON Schema — Claude always passes correct types |
| Output parsing | Claude reads stdout as text and interprets it | Structured JSON response — no interpretation needed |
| Credentials | Must be in environment variables Claude can see, or hardcoded | Configured in the server — Claude never sees raw credentials |
| Error handling | Claude reads stderr/exit codes and guesses what went wrong | Structured error responses Claude can act on precisely |
| Security | Claude can run ANY bash command — including bad ones if confused | Only exposes defined tools — Claude can't go "off script" |
| Works across AI tools | Yes — any AI can run a CLI command | Yes — any MCP-compatible AI can use the server |
| Best for | Quick experiments, one-off tasks, tools you already have | Production integrations, team tools, credential-safe access, repeated daily use |
Start with CLI, graduate to MCP. The right path for most developers: first, let Claude call your tool via Bash. Get it working. Understand the parameters. Then, if you're using it every day or sharing it with a team, invest the hour to wrap it in an MCP server for the structured interface and credential safety. Don't build MCP first for something you've never tested.

The concrete reasons to choose MCP over CLI:

Cross-Platform Compatibility

MCP is an open standard. The same MCP server works across multiple AI CLI tools — this is one of its biggest strengths. Write once, use everywhere:

| CLI Tool | MCP Support | Config Format | Config Location |
| --- | --- | --- | --- |
| Claude Code | Yes — full | JSON (`mcpServers`) | `~/.claude/settings.json` |
| Gemini CLI | Yes | JSON (`mcpServers`) | `~/.gemini/settings.json` |
| OpenCode | Yes | JSON (`mcp`) | `~/.config/opencode/opencode.json` |
| Copilot CLI | Partial — via extensions | GitHub extension format | GitHub Copilot settings |
| Codex CLI | Partial | TOML (`[mcp_servers]`) | `~/.codex/config.toml` |

The JSON-RPC protocol is identical across all supporting tools. The only difference is the config file format and location. If you build an MCP server for Claude Code, Gemini CLI can usually use it with only a config change.
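As a sketch: a hypothetical server named ticket-server (the name and command path are placeholders) registered for Claude Code. Per the table above, the identical `mcpServers` block drops into `~/.gemini/settings.json` for Gemini CLI unchanged; only Codex CLI needs it translated into a TOML `[mcp_servers]` section.

```json
{
  "mcpServers": {
    "ticket-server": {
      "command": "node",
      "args": ["/path/to/ticket-server.js"]
    }
  }
}
```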

Making a Plugin: JIRA Integration Example

Claude Code has its own plugin system — separate from MCP servers. A plugin is a package that gives Claude sessions custom slash commands, skills, and hooks. It contains markdown files (skills, commands), JSON configs (manifests, hook definitions), scripts, and optionally compiled programs written in any language — C#, Go, Python, Rust, whatever you want. It gets installed into your local Claude Code environment and runs whenever Claude needs it.

Think of it like a VS Code extension, but for Claude Code. You install it once and it gives Claude new capabilities.
Think of a plugin like hiring a specialist contractor for your team. Claude is the general contractor, and the plugin is the plumber — Claude doesn't know how to talk to JIRA, but the plugin does. When Claude needs JIRA data, it calls the specialist, who handles the messy details (API tokens, field IDs, transitions) and hands back a clean result.

What Languages Can You Use?

This surprises most people: plugins are not language-restricted. The plugin itself is mostly markdown and JSON. The tools it tells Claude to run can be written in anything.

A plugin has four parts, each written in different "languages":

| Part | What It Is | Written In |
| --- | --- | --- |
| Plugin structure | Manifests, skills, commands, hooks config | Markdown, JSON, YAML frontmatter |
| Hook scripts | SessionStart, PreToolUse, etc. | Bash, PowerShell, or any shell script |
| Utility code | Optional helpers (e.g., skill discovery) | JavaScript/TypeScript (Node.js is already installed with Claude Code) |
| CLI tools invoked by skills | The programs Claude actually runs | Anything — whatever produces an executable |

That last row is the key. When a skill tells Claude to run a command, Claude doesn't care what language that command was built in. It just runs it locally and reads the output — so the tools your plugin invokes can be written in anything that produces a runnable command.

The plugin system isn't language-specific — it's command-specific. You write skills in markdown that tell Claude which commands to run. Those commands can be written in any language that produces an executable. A C# developer can build a .exe that queries JIRA, a Python developer can write a script that calls the GitHub API, a Go developer can compile a binary that talks to Slack. The skill just says "run this command" and reads the output. Claude doesn't know or care what language produced the binary.
Cross-platform matters. Claude Code runs on Windows, macOS, and Linux. If your plugin will be shared with a team, the CLI tools it invokes need to work on all platforms your team uses. Shell scripts are Unix-native (bash) — on Windows they require Git Bash, WSL, or the polyglot wrapper pattern shown earlier. C# with .NET and Go are naturally cross-platform. Python requires a Python installation. Native .exe files only work on Windows.

Wait — How Does Claude Even Know What a Plugin Can Do?

This is the question most people skip, and it's the key to understanding the whole system. There's no compiled interface, no API contract, no type system. Instead:

  1. At session start, the plugin's SessionStart hook fires and injects a special skill — sometimes called a "meta-skill" — into the conversation context. Despite the fancy name, it's just a regular SKILL.md file whose job is to list every other available skill and when to use each one. Think of it as the table of contents for the plugin. (Note: "meta-skill" is not an official Anthropic term — it's plugin-community jargon. Anthropic's docs just call everything a "skill." The community started calling the auto-injected table-of-contents skill a "meta-skill" to distinguish it, but structurally it's identical to any other skill.)
  2. Claude reads that context the same way it reads any system prompt. It now knows: "I have a jira-workflow skill I should invoke when the user mentions a ticket"
  3. When Claude invokes a skill, the skill's markdown content gets loaded and Claude follows the instructions — "use acli.exe to query this ticket, check the assignee, transition the status"

The "contract" is natural language. The plugin teaches Claude what tools exist and how to use them, the same way a README teaches a human developer. There is no schema, no function signature, no compiled binding. Just instructions that Claude interprets at runtime.

Where Does the Code Actually Run?

This is the other common misconception. The plugin code does not run in Anthropic's data center. Here's the actual flow:

  1. Claude's model runs on Anthropic's GPU servers in a data center — this is where token generation happens
  2. Claude Code runs on your local machine — it sends prompts to the API and receives responses
  3. When Claude decides to run a command (like acli.exe jira workitem view PROJ-12345), that command executes on your machine, not in the cloud
  4. The command's output gets sent back to Claude as context for the next response
It's like a phone call with an expert. The expert (Claude) is in another city, but when they say "open your terminal and run this command," you run it on your computer. The expert never touches your machine directly.

JIRA tokens never leave your machine. When the plugin tells Claude to run acli.exe, that command executes locally using credentials stored in your local acli config. The API token goes from your machine directly to Atlassian's servers. Claude's data center never sees it — Claude only sees the JSON output that comes back.

How Plugins Differ from MCP Servers

| Feature | Plugin | MCP Server |
| --- | --- | --- |
| Provides | Slash commands, skills, hooks, agents | Tools (function calls) |
| Installed via | `/plugin install` | Config in `settings.json` |
| Runs as | Part of Claude Code's process | Separate long-running process |
| Best for | Workflows, methodologies, multi-step processes | Exposing APIs as callable tools |

The Key Insight: Claude Sessions Don't Talk to APIs Directly

This is the most misunderstood part. When an app launches a Claude session to work on a JIRA ticket, Claude doesn't get raw JIRA REST API access. Instead:

  1. The host app manages your JIRA credentials securely (API tokens stored via keytar, never exposed to the renderer or to Claude)
  2. The host app provides a UI to browse tickets and launches Claude sessions with just a ticket key
  3. The plugin handles all JIRA API calls within the session, using its own credential management

The launch looks like this:

// The app passes just the ticket key to Claude, not credentials
const prompt = `/my-plugin:start-work ${ticketKey}`
const args = ['--session-id', sessionId, prompt]

Claude only ever sees the ticket key. The plugin fetches the actual JIRA data using credentials stored separately.

Anatomy of a Plugin

Here's the structure of a production JIRA workflow plugin that integrates ticket lifecycle management into Claude Code:

my-jira-plugin/
├── .claude-plugin/
│   ├── plugin.json          # Plugin metadata & version
│   └── marketplace.json     # Marketplace registration
├── hooks/
│   ├── hooks.json           # SessionStart hook config
│   ├── run-hook.cmd         # Cross-platform polyglot wrapper
│   └── session-start        # Injects skills into every session
├── lib/
│   └── skills-core.js       # Skill discovery & resolution
├── skills/                  # Reusable skills
│   ├── jira-workflow/SKILL.md
│   ├── meta-skill/SKILL.md
│   ├── brainstorming/SKILL.md
│   └── ...
├── commands/                # User-invoked slash commands
│   ├── start-work.md
│   ├── commit.md
│   ├── pr-check.md
│   └── ...
└── agents/                  # Agent templates for subagent dispatch

Step 1: The Plugin Manifest

Every plugin needs a .claude-plugin/plugin.json that declares its identity:

{
  "name": "my-jira-plugin",
  "description": "Development workflow: Jira lifecycle, code review, standards enforcement",
  "version": "1.0.0",
  "author": {
    "name": "Your Team",
    "email": "you@example.com"
  },
  "repository": "https://github.com/your-org/my-jira-plugin",
  "license": "MIT",
  "keywords": ["jira", "tdd", "code-review", "workflows"]
}

And a .claude-plugin/marketplace.json if you want it discoverable by other users:

{
  "name": "my-jira-plugin",
  "description": "Development workflow plugin with Jira integration",
  "owner": {
    "name": "Your Team",
    "email": "you@example.com"
  },
  "plugins": [
    {
      "name": "my-jira-plugin",
      "description": "Jira lifecycle, code review, standards enforcement",
      "version": "1.0.0",
      "source": "./"
    }
  ]
}

Step 2: The SessionStart Hook

Hooks run commands in response to Claude Code events. The hooks.json file tells Claude Code what to execute:

// hooks/hooks.json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "'${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd' session-start",
            "async": false
          }
        ]
      }
    ]
  }
}

The session-start script reads the meta-skill and injects it as context into every new session:

#!/usr/bin/env bash
# hooks/session-start — inject the meta-skill into every session
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
PLUGIN_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Read the meta-skill that tells Claude how to use all other skills
content=$(cat "${PLUGIN_ROOT}/skills/meta-skill/SKILL.md")

# Escape for JSON embedding
escape_for_json() {
    local s="$1"
    s="${s//\\/\\\\}"    # backslashes (must come first)
    s="${s//\"/\\\"}"    # double quotes
    s="${s//$'\r'/}"     # strip carriage returns (Windows-edited files)
    s="${s//$'\n'/\\n}"  # newlines
    s="${s//$'\t'/\\t}"  # tabs
    printf '%s' "$s"
}

escaped=$(escape_for_json "$content")

# Output JSON that Claude Code picks up as session context
cat <<EOF
{
  "hookSpecificOutput": {
    "hookEventName": "SessionStart",
    "additionalContext": "${escaped}"
  }
}
EOF

This is the magic — every time Claude starts a session, it automatically knows about all the skills available in the plugin.

Step 3: Slash Commands

Commands are markdown files in commands/ that define user-invocable actions. Here's a start-work command that kicks off JIRA integration:

# commands/start-work.md
---
description: "Start work on a Jira ticket. Looks up the ticket, checks assignment,
  transitions to In Progress, fetches acceptance criteria, and kicks off brainstorming."
disable-model-invocation: true
---

Invoke the jira-workflow skill to start work on the specified ticket.

Follow this sequence exactly:
1. Look up the ticket (fetch summary, status, acceptance criteria, assignee, sprint)
2. Check assignee — if assigned to someone else, STOP and ask
3. Assign to me if unassigned
4. Transition to In Progress
5. Set sprint to active sprint
6. Present the ticket summary and acceptance criteria to the user
7. Then invoke the brainstorming skill to begin design

Notice: the command itself is just instructions. It delegates the actual work to a skill.

Step 4: Skills (Where the JIRA Logic Lives)

Skills are markdown files in skills/*/SKILL.md with YAML frontmatter. The jira-workflow skill contains the actual JIRA integration logic:

# skills/jira-workflow/SKILL.md
---
name: jira-workflow
description: Use when starting work on a Jira ticket, transitioning ticket
  status, updating ticket fields, or checking ticket assignment
---

# Jira Workflow

## Tools

**Primary: acli** (Atlassian CLI) — more token efficient
```bash
# View a ticket with specific fields
acli.exe jira workitem view PROJ-12345 \
  --fields "summary,status,description,assignee" --json

# Transition ticket status
acli.exe jira workitem transition -k PROJ-12345 \
  -s "In Progress" --yes

# Assign ticket to yourself
acli.exe jira workitem assign -k PROJ-12345 -a @me --yes
```

**Fallback: Jira MCP** — when acli fails or for complex field updates
```
getJiraIssue -> editJiraIssue -> transitionJiraIssue
```

The skill teaches Claude how to interact with JIRA — which CLI tool to use, what fields to fetch, what transitions exist, and what to do when things fail. Claude reads this skill at runtime and follows the instructions.

Step 5: Skill Discovery

The plugin includes a lib/skills-core.js module that handles finding and loading skills at runtime:

// lib/skills-core.js — finds SKILL.md files and extracts their metadata
import fs from 'fs';
import path from 'path';

// Note: a minimal parser — only single-line `name:` / `description:` values
// are read; folded multi-line YAML values are truncated at the first line.
function extractFrontmatter(filePath) {
    const content = fs.readFileSync(filePath, 'utf8');
    const lines = content.split('\n');
    let inFrontmatter = false;
    let name = '', description = '';

    for (const line of lines) {
        if (line.trim() === '---') {
            if (inFrontmatter) break;
            inFrontmatter = true;
            continue;
        }
        if (inFrontmatter) {
            const match = line.match(/^(\w+):\s*(.*)$/);
            if (match) {
                if (match[1] === 'name') name = match[2].trim();
                if (match[1] === 'description') description = match[2].trim();
            }
        }
    }
    return { name, description };
}

function findSkillsInDir(dir, sourceType, maxDepth = 3) {
    const skills = [];
    if (!fs.existsSync(dir)) return skills;

    function recurse(currentDir, depth) {
        if (depth > maxDepth) return;
        for (const entry of fs.readdirSync(currentDir, { withFileTypes: true })) {
            if (entry.isDirectory()) {
                const skillFile = path.join(currentDir, entry.name, 'SKILL.md');
                if (fs.existsSync(skillFile)) {
                    const { name, description } = extractFrontmatter(skillFile);
                    skills.push({ name: name || entry.name, description, sourceType });
                }
                recurse(path.join(currentDir, entry.name), depth + 1);
            }
        }
    }
    recurse(dir, 0);
    return skills;
}

Personal skills in ~/.claude/skills/ shadow plugin skills, so teams can override behavior without forking the plugin.
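That shadowing rule can be sketched as a simple merge — plugin skills first, personal skills overwriting by name. (A sketch; the actual resolution logic in skills-core.js may differ in detail.)

```javascript
// Sketch: personal skills in ~/.claude/skills/ shadow plugin skills by name.
function resolveSkills(pluginSkills, personalSkills) {
  const byName = new Map();
  for (const s of pluginSkills) byName.set(s.name, s);   // defaults from the plugin
  for (const s of personalSkills) byName.set(s.name, s); // personal entries win
  return [...byName.values()];
}

const merged = resolveSkills(
  [{ name: 'jira-workflow', sourceType: 'plugin' }],
  [{ name: 'jira-workflow', sourceType: 'personal' }]
);
console.log(merged[0].sourceType); // personal
```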

Step 6: Cross-Platform Hook Wrapper

Since hooks execute commands, Windows compatibility requires a polyglot wrapper — a single file that works as both a Windows batch file and a bash script:

: << 'CMDBLOCK'
@echo off
REM Windows batch portion — finds bash and delegates
if "%~1"=="" ( echo run-hook.cmd: missing script name >&2 & exit /b 1 )
set "HOOK_DIR=%~dp0"

REM Try Git for Windows bash
if exist "C:\Program Files\Git\bin\bash.exe" (
    "C:\Program Files\Git\bin\bash.exe" "%HOOK_DIR%%~1" %2 %3 %4 %5
    exit /b %ERRORLEVEL%
)
REM Try bash on PATH
where bash >nul 2>nul
if %ERRORLEVEL% equ 0 ( bash "%HOOK_DIR%%~1" %2 %3 %4 %5 & exit /b %ERRORLEVEL% )
exit /b 0
CMDBLOCK

# Unix portion — run the named script directly
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
exec bash "${SCRIPT_DIR}/$1" "${@:2}"

The trick: Windows cmd.exe treats the leading : as a label, falls through into the batch code, and stops at exit /b. Bash treats : as a no-op whose heredoc (<< 'CMDBLOCK') swallows the entire batch portion, so execution resumes at the Unix code below.

The Full Flow

Here's what happens end-to-end when a developer types /start-work PROJ-12345:

  1. SessionStart hook already fired — Claude knows about all available skills
  2. Slash command start-work.md tells Claude to invoke the jira-workflow skill
  3. Skill jira-workflow instructs Claude to:
    • Run acli.exe jira workitem view PROJ-12345 --fields "summary,status,description,assignee" --json
    • Check assignee — stop if assigned to someone else
    • Transition: acli.exe jira workitem transition -k PROJ-12345 -s "In Progress" --yes
    • Set sprint: acli.exe jira workitem edit -k PROJ-12345 --custom "customfield_10007:42" --yes
  4. Claude presents the ticket summary and acceptance criteria to the developer
  5. Next skill (brainstorming) kicks in automatically to design the approach

Credentials never touch Claude's servers. The plugin instructs Claude to run local CLI commands that use locally-stored API tokens.

Installation & Distribution

Users install a plugin with two commands:

# Register the marketplace (one time)
/plugin marketplace add your-org/my-jira-plugin

# Install the plugin
/plugin install my-jira-plugin@my-jira-plugin

The plugin gets downloaded to ~/.claude/plugins/cache/my-jira-plugin/. A host app can manage updates programmatically:

// A host app can keep the plugin updated automatically
await runCommand(['plugin', 'marketplace', 'update', 'my-jira-plugin'])
await runCommand(['plugin', 'update', 'my-jira-plugin@my-jira-plugin'])

Building Your Own JIRA Plugin

To build a plugin that integrates with JIRA (or any external API), follow this pattern:

  1. Create the manifest — .claude-plugin/plugin.json with name, version, and description
  2. Add a SessionStart hook — inject context so Claude knows about your skills on every session
  3. Write skills as markdown — teach Claude which CLI tools to use and how (e.g., acli for JIRA, gh for GitHub)
  4. Create slash commands — entry points that chain skills together into workflows
  5. Handle credentials locally — use tools like acli with saved configs or keytar; never pass tokens through Claude
  6. Publish to the marketplace — add .claude-plugin/marketplace.json so others can install it

The key design principle: skills are instructions, not code. You're teaching Claude a workflow in natural language, backed by CLI tools that handle the actual API calls. Claude never needs raw REST access — it executes local commands that have their own authentication.

A Plugin Is Not a Hook

This is easy to confuse. A plugin is not a hook. A plugin contains hooks, along with skills, commands, and agents. These are four distinct components that live inside a plugin:

| Component | What It Is | When It Runs |
| --- | --- | --- |
| Hook | A command or program triggered by a Claude Code event | Automatically, on events like SessionStart, PreToolUse, Stop |
| Skill | A markdown file (SKILL.md) that teaches Claude a workflow | When Claude or a command invokes it |
| Command | A markdown file that defines a user-typed slash command | When the user types /command-name |
| Agent | A template for spawning a specialized subagent | When Claude dispatches parallel work |
A plugin is like a car. A hook is the ignition — it starts things up automatically. Skills are the engine — they do the actual work. Commands are the steering wheel — the driver controls them directly. Agents are passengers you send on errands. The car is not the ignition, but the car contains the ignition.

When we say "the SessionStart hook fires," we mean: one specific component inside the plugin (the hook) executes a command in response to a Claude Code event. The plugin as a whole is the package that contains that hook, plus all the skills, commands, and agents.

Is a Plugin "Live" or Dormant?

Dormant. Between sessions, the plugin is just files sitting on your local disk at ~/.claude/plugins/cache/. No process is running. Anthropic's servers have zero awareness that your plugin exists. Nothing is consuming memory, CPU, or tokens.

But it wakes up automatically when you start a session. Here's the lifecycle:

  1. You start a Claude Code session — the plugin is dormant files on disk
  2. Claude Code fires the SessionStart event — the plugin's hook runs a script
  3. The hook injects the meta-skill — a markdown document listing all available skills gets added to the conversation context
  4. From this point on, Claude "knows" what the plugin can do — because the meta-skill is part of its context window
  5. When you close the session — the plugin goes back to dormant. Nothing persists in memory.

So the plugin is dormant between sessions, active within sessions, and the transition happens automatically via the SessionStart hook.

Can Claude Use the Plugin Without Being Asked?

Yes — and this surprises most people. Claude can invoke plugin skills on its own, without the user explicitly asking, if the meta-skill instructs it to. Here's a concrete example of how aggressive this can be. A real production plugin's meta-skill contains instructions like:

"If you think there is even a 1% chance a skill might apply, you ABSOLUTELY MUST invoke the skill. This is not negotiable. This is not optional."

With instructions like this in the context, if you say "fix the bug in ticket PROJ-12345," Claude will:

  1. Read its injected context and see it has a jira-workflow skill
  2. Invoke that skill autonomously — without asking you first
  3. Follow the skill's instructions to run acli.exe locally to fetch the ticket

However, slash commands can opt out of this. The disable-model-invocation: true frontmatter flag means only a human can type that command. So /start-work requires you to type it, but the underlying jira-workflow skill that it delegates to can be invoked by Claude freely.

Human-Only vs. Claude-Autonomous: The disable-model-invocation Flag

This is one of the most important design decisions a plugin author makes. Every slash command in a plugin has a choice: can Claude invoke it on its own, or does only a human get to type it?

The disable-model-invocation: true flag in a command's YAML frontmatter means: this command requires a human to type it. Claude cannot decide on its own to run /start-work PROJ-12345. Only you can. But here's the subtlety — that restriction only applies to the command, not the skill it delegates to. The jira-workflow skill that /start-work invokes? Claude can call that skill directly, any time, without asking.

This creates a two-tier permission system:

| Layer | Who Can Invoke | Example | Why |
| --- | --- | --- | --- |
| Command with `disable-model-invocation: true` | Human only | `/start-work PROJ-12345` | Dangerous entry points: transitions a ticket, assigns it to you, starts a whole workflow |
| Command without the flag | Human or Claude | `/commit` | Safe or routine actions Claude might reasonably initiate |
| Skill | Human or Claude (always) | `jira-workflow` | Skills are the building blocks — Claude needs to call them freely to do its work |

Think of it like this: the command is the front door with a lock — only the homeowner (you) has the key. The skill is the toolbox inside the house — once someone is inside (via a command you approved, or via Claude's autonomous judgment based on the meta-skill instructions), they can use any tool freely. This lets plugin authors put guardrails on initiating workflows while still letting Claude work autonomously within them.

Plugins and Token Cost

This is the part nobody talks about. The plugin's "contract" IS token usage. There is no free metadata channel, no side-band, no compiled shortcut. Everything Claude knows about the plugin travels as tokens in the conversation context.

Here's what that means concretely:

| What | When It Costs Tokens | Approximate Size |
| --- | --- | --- |
| Meta-skill | Every single message, for the entire session | ~1,500–2,000 tokens |
| Individual skill (e.g., jira-workflow) | Only when invoked — loaded on demand | ~500–2,000 tokens each |
| Slash command | Only when the user types it | ~100–300 tokens each |
| Command output (JIRA JSON, etc.) | Once generated, stays in context | Varies widely |

The meta-skill is the expensive part because it's always there. Every prompt you send — even "what does this variable mean?" — carries the full meta-skill as context overhead. A plugin with 19 skills needs a bigger meta-skill "menu," which means more baseline token cost on every message.

The full flow, in terms of what consumes tokens:

  1. Session starts → hook fires → meta-skill (~2K tokens) injected into context
  2. Every prompt you send now carries that meta-skill as context overhead
  3. Claude reads it, sees available skills, decides which to invoke
  4. When Claude invokes a skill, that skill's full content loads — more tokens added to context
  5. Claude follows the skill instructions, runs local commands, gets output
  6. Command output (JIRA JSON, git diff, etc.) comes back as yet more context tokens

This is a real design tradeoff: more plugin capabilities = more tokens per message = more cost. A lean plugin with 3 skills has minimal overhead. A comprehensive plugin with 19 skills pays a significant per-message tax just for Claude to know what's available. Plugin authors should think carefully about what goes in the meta-skill versus what gets loaded on demand.
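Back-of-the-envelope math for that per-message tax, using the rough ~4 characters per token heuristic (a heuristic, not a tokenizer — real counts differ), with a hypothetical 2,000-token meta-skill:

```javascript
// Rough estimate: ~4 characters per token (heuristic, not a real tokenizer).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// The meta-skill is paid again as input on every prompt in the session.
function sessionOverhead(metaSkillTokens, promptCount) {
  return metaSkillTokens * promptCount;
}

console.log(sessionOverhead(2000, 50)); // 100000 — a 2K-token "menu" over 50 prompts
```

A hundred thousand input tokens spent purely on re-reading the menu is the cost of a comprehensive plugin over one long working session.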

Imagine you're on a phone call with a consultant, and you're paying by the minute. At the start of every call, you read them a menu of everything they could possibly help you with. The longer that menu is, the more you pay — even if you only end up asking about one thing. That menu is the meta-skill. The actual work they do after hearing the menu is the individual skill invocations.

Confusing Terms

This guide has uncovered a surprising amount of confused terminology, misleading names, and inconsistent definitions in the world of AI coding tools. Some of this is Anthropic's fault. Some is the broader AI industry's fault. Some is just the natural fog that forms when technical people name things for themselves and forget that others have to understand them too.

Here is a complete accounting of everything we found.

| The Confusing Term | What People Think It Means | What It Actually Means | Who's Responsible |
| --- | --- | --- | --- |
| "MCP Server" | A deployed server — something with a port, a network, a cloud deployment | A local program launched as a child process. Listens for JSON-RPC on stdin, writes results to stdout. No port. No network. Stops when your session ends. A tool dispatcher, not a server. | Anthropic — borrowed "server/client" terminology from web architecture even though the implementation is just two programs talking through pipes on your machine |
| "Plugin" vs "MCP Server" | Interchangeable terms for the same thing | Completely different. A Plugin (`/plugin install`) is a bundle containing slash commands, subagents, hooks, and/or MCP configs. An MCP server is one possible ingredient inside a plugin. | Anthropic — two distinct concepts with overlapping informal usage |
| "API Usage" / "Usage" / "Tokens" | Three different things | All the same concept — token consumption — described from different angles on the same pricing page. "Usage" alone means two different things: messages in your plan window, OR tokens consumed through the API. | Anthropic — their pricing page uses four words for one concept, acknowledged as "one of the most confusing product pages in the industry" |
| "Extended Context Model" | A different, more powerful AI model | The same Opus 4.6 or Sonnet 4.6, with a larger reading window enabled (1M instead of 200K tokens). Same brain, more desk space. Not a new model. | Anthropic |
| "Skill" | Something that runs — like an Alexa skill or a Windows service | A markdown document that sits dormant until you invoke it. Does nothing on its own. Not a running process. Rename it mentally to "instruction document" or "playbook." | Anthropic — the word "skill" carries active connotations across other platforms |
| Memory files (cost) | Free persistent storage — a nice bonus feature | Mini-CLAUDE.md files. They load automatically every session and cost input tokens on every prompt — just like CLAUDE.md. They are not free. They accumulate silently. | Anthropic — naming implies cost-free persistence; the token cost is not prominently disclosed |
| CLAUDE.md size | A config file — bigger = more features, no cost | Every token in CLAUDE.md is charged as input tokens on every single message, every session, forever. A 500-line CLAUDE.md might cost 3,000–5,000 tokens per message. Treat it like code — refactor and trim it. | Anthropic — not disclosed prominently; discovered by developers monitoring token costs |
| CLAUDE.md "auto-updates" | Claude automatically records what it learns | It doesn't. Claude reads CLAUDE.md every session but only writes to it when you explicitly ask. If you don't tell Claude to update it, knowledge from a session is lost forever. | Anthropic — the passive loading behavior makes users assume writing is also automatic |
| "Fork" vs "Agent" | Two words for spawning a separate Claude instance | Fork = git branch for a session. Copies full conversation history. Becomes a new permanent independent session. Agent = subprocess. Starts fresh with only its task brief. Terminates when done. You keep working. | AI industry — both "split off" from current context, but are architecturally opposite |
| "Plugin" (informal) | An MCP server — something that connects Claude to external tools | Informally used to mean MCP server by most tutorials and developers. Formally means a bundle that can contain MCP configs plus slash commands, subagents, and hooks. | AI industry — informal usage has diverged from the formal product definition |
| Which "Claude"? | One product | At least six different surfaces: Claude.ai (website), Claude Desktop Chat tab, Claude Desktop Code tab, Claude Code CLI, Claude in VS Code/JetBrains, Claude in Slack, Claude in CI/CD. The Code tab in Claude Desktop IS Claude Code. They're the same engine. | Anthropic — poor surface naming; "Claude Code" appears as both a tab name and a product name |
| "Claude Code" vs "Claude the model" | The same thing — "Claude" | Claude Code is the tool (like Visual Studio). Claude Opus/Sonnet/Haiku are the AI models inside it (like the compiler version). You can switch models mid-session. They are independent. | Anthropic — "Claude" is used for both the platform and the model family |
| "Governance features" (Enterprise) | A capability upgrade — makes Claude smarter or more powerful | Audit logs, SSO, SCIM, RBAC, custom data retention. Organizational compliance plumbing. Makes IT and legal teams happy. Does not make Claude smarter by one IQ point. | Enterprise software industry — "governance" is a feature category, but buyers often conflate it with capability |
| "~/.claude/" notation | Incomprehensible Unix gibberish | On Windows: C:\Users\YourName\.claude\. The tilde means "your home directory." The dot means "config folder." ~/.claude/ (with tilde) = your personal folder. .claude/ (no tilde) = inside your current repo. Two completely different places. | AI industry — Unix conventions applied without translation for Windows users |
| "Repository" / "repo" | A mystical technical concept | A folder with version history tracking turned on. That's it. Your local repo is a folder on your hard drive. The GitHub repo is the same folder on GitHub's server. | Software industry — the word "repository" carries unnecessary gravitas |
| "Pull Request" | Pulling someone else's changes | A request to push your changes for others to review before merging. You're pushing, not pulling. The name is backwards from the user's perspective. | GitHub — a naming decision that has confused developers for 15+ years |
| "IQ" and "Awareness" | One dimension — how good the AI is | Two completely independent dimensions. IQ = reasoning quality (how well it thinks per token). Awareness = context window size (how much it can hold in mind at once, measured in pages). A genius seeing 267 pages beats a mediocre model seeing 1,333 pages on focused tasks — but loses on large codebase reasoning. | AI industry — "model quality" is discussed as a single axis; the context/reasoning split is rarely taught |
| Skills are "contextually loaded" | Claude automatically detects when a skill applies and loads it | False. There is no background skill-matching. Skills sit dormant. You are always the trigger — either via slash command or explicit mention. Older docs described "contextual loading," which was misleading. | Anthropic — earlier documentation implied automatic skill activation |
| Skills "deactivate" cleanly | Skills stay active for a session until you stop them | Skills can be silently dropped by /compact without warning. After compaction, Claude may stop following the skill's instructions with no error message. You won't notice until you see Claude ignoring the skill's rules. | Anthropic — undisclosed side effect of compaction; documented in GitHub issues, not official docs |
| Documentation vs. Memory files | Both "store" knowledge for Claude to use | Memory files: always loaded, cost tokens, Claude-only. Documentation (README.md etc.): costs zero until read, works with any AI, survives tool switching. Memory files are wired to Claude's brain. Documentation is a spare brain that any AI can borrow. | AI industry — the concept of "dormant, platform-agnostic memory" as distinct from "always-on AI memory" is not widely taught |
| Plan Mode (Shift+Tab) vs. plain prompting | Plan Mode is how professionals use Claude; prompting is for beginners | For most people, telling Claude "plan first, wait for approval" in a plain prompt produces the same result. Plan Mode's advantage is hard mechanical enforcement — it can't edit files during planning even if it wants to. The capability is identical; the guardrail is different. | AI industry — product features are marketed as necessary when prompting achieves the same outcome 95% of the time |
| Enterprise plan = more Awareness | Enterprise gets a larger context window than Max | As of March 2026, Max and Enterprise have identical 1M context windows. They equalized when Anthropic shipped 1M context generally. Pre-March 2026, Enterprise had 500K and Max had 200K — articles written then are now outdated. | Anthropic — tier benefits changed but old articles persist; the gap no longer exists at published tier level |
| "Token" — four different meanings, only three related | One word, one meaning | "Token" is one of the most overloaded words in this space, with four context-dependent meanings, only three of them related: (1) Text unit — ~4 characters (rough estimate), the chunk of text the model reads and writes. (2) Billing unit — what you pay for; input and output tokens are often priced separately because processing them imposes different costs ($3/MTok input, $15/MTok output on Sonnet). (3) Inference / KV-cache position — during generation, each token in the active context contributes to memory use in the model's attention machinery; the raw text is tiny, but the model's internal mathematical representation costs roughly 64 KB to 0.5 MB of server GPU memory depending on architecture. (4) Security credential — a string that proves your identity (GitHub Personal Access Token, API key, bearer token), completely separate from the other three. See the Glossary for full definitions of each. | Software industry — "token" is independently overloaded in linguistics, economics, computer security, and AI infrastructure, then all four usages collide in a single AI coding session |
"Context window" / "tokens" — what they physically are Something on your laptop, like RAM or disk storage The same idea travels through six names: Characters → Tokens → HTTPS request → KV cache → Context window → Awareness (pages). It starts on your keyboard and ends in GPU VRAM on Anthropic's servers. Your laptop contributes nothing except sending text over the internet. The math: 1 token ≈ ~0.5 MB of GPU VRAM (rough midpoint; empirically measured range ~64 KB–0.5 MB depending on architecture). 200K tokens ≈ ~100 GB. 1M tokens ≈ ~500 GB at the midpoint. Real number varies significantly by model design. AI industry — each name is technically correct at its layer, but nobody explains the journey. "Tokens" sounds like a billing unit. "Context window" sounds like a software setting. Neither hints that you're reserving terabytes of server GPU memory. → Chapter 15 (advanced box)
"Meta-skill" — is it a skill or not? A different kind of thing from a skill A "meta-skill" is just a regular skill (a SKILL.md file with YAML frontmatter, like any other) that happens to have one special job: listing all the other skills and telling Claude when to use each one. It gets auto-injected at session start via a hook. There is nothing structurally different about it — same file format, same directory layout, same invocation mechanism. The "meta-" prefix makes it sound like a separate concept or a higher-level abstraction. It's not. It's the table of contents for the plugin, written as a skill so Claude can read it like any other instruction. Plugin ecosystem — naming a regular thing with a prefix that implies it's a different kind of thing
"Task" — two unrelated meanings One meaning "Task" is used two completely different ways in Claude Code. (1) Todo item — an entry in the built-in todo tracker (the "Task Tool"), something to be done, marked in-progress, or completed. (2) Subagent process — an independent Claude subprocess launched via the Agent/Task tool to do work in parallel. A subagent (meaning 2) can create todo items (meaning 1), which means you can have a task managing tasks. The word "todo" is sometimes used to mean only the first sense, but "task" is used for both with no consistent distinction in the docs. Anthropic — same word chosen for two separate features with no disambiguation
"Skill" vs "Plugin" vs "Plugin skill" — three overlapping terms A clear hierarchy where each term means something distinct A skill is a single SKILL.md instruction document you write and drop in .claude\skills\. A plugin is an installed package (via /plugin install) that can contain skills, commands, hooks, and agents — a whole workflow system. A plugin skill is a skill that came bundled inside a plugin rather than one you wrote yourself. The confusion: skills and plugin skills are identical on disk (same SKILL.md format, same folder structure) — the only difference is where they live. And the word "plugin" in everyday speech means "a small add-on," which is also what a skill is, making the two concepts sound interchangeable when they aren't. A skill is one document. A plugin is a deployable package that may contain many skills. Anthropic — "skill" used for both a standalone concept and a component inside a plugin, with no visual or structural distinction between them

Count: 27 documented confusions. Attribution: ~13 to Anthropic (purple rows), ~7 to the broader AI/software industry, ~4 to evolving product definitions. This list will grow — the AI tooling space moves faster than its documentation.

Find Your Path

Not sure where to start? Pick the description that sounds most like you.

A developer who uses AI daily but feels like you're only scratching the surface
Start with the Practitioner's Guide — the AI code loop, red-green testing, cross-AI review, and how to keep a codebase from collapsing under its own weight. Then read Chapter 6 (Hooks) and Chapter 4 (Skills) to automate your workflow.
A Windows developer who finds the CLI world unintuitive
Start with Chapter 1 (Introduction) — it translates every concept into Visual Studio, MSBuild, and %APPDATA% terms you already know. Then Chapter 3 (The .claude Directory) which decodes Unix path notation into actual Windows paths.
Brand new to Claude Code and want hands-on experience fast
Go directly to First 15 Minute Session. Do the steps in a real project. Come back and read the Manifesto after — it will make far more sense once you've seen it work.
A project manager, HR professional, or UX designer now writing scripts or automating work
Start with Chapter 1 (Introduction) — specifically the "Which Claude are we talking about?" section and the GUI vs CLI comparison table. Then Chapter 2 (CLAUDE.md) so you understand how to give Claude persistent instructions without re-explaining yourself every session.
Someone evaluating Claude Code for a corporate or enterprise team
Read Chapter 13 (Scope and Distribution) to understand what gets checked into git vs what stays personal. Then the corporate IT friction callout in Chapter 1, and Chapter 10 (Plugins / MCP Servers) for the enterprise security considerations around MCP server installation.
Confused about what "Claude Code" even means — you've heard three different things
You're in the right place. Read Chapter 1 (Introduction) first — specifically "Wait, Which Claude Are We Talking About?" It names every surface (claude.ai, Claude Desktop, Claude Code CLI, VS Code integration, Slack, CI/CD) and scopes this guide to the terminal CLI.
Want to test yourself before reading anything
Take the Quick Quiz right now. See what you know and what you don't. Let the failures tell you which chapters to read.
Comparing Claude Code to Codex, Copilot, Gemini, or Ollama
Go to Other AI CLIs for the full comparison table plus plain-English paragraphs on each tool, the Ollama local-model option, and a list of tools this guide didn't cover.
Estimated time: 15 minutes hands-on

Your First 15 Minute Session with Claude Code

Theory later. Start here. Do these steps in order in a real project directory and you'll understand more in 15 minutes than you will from reading the whole Manifesto first.

Step 1

Open a Terminal in Your Project

Navigate to a project you already know. Don't start with a new or empty folder — you'll get more value immediately if Claude has real code to look at.

cd C:\repos\MyProject

If you're in VS Code, open the integrated terminal with Ctrl+` and you're already in the right directory.

Like briefing a new consultant by walking them through an existing project rather than handing them a blank whiteboard. Context first.
Step 2

Start Claude Code

claude

You'll see a prompt. You're now in a session. Claude can already see your project directory. It hasn't read any files yet — but it's ready to.

If this is your first time, it will ask you to authenticate with your Anthropic account. Follow the prompts.

Step 3

Ask for a Codebase Overview

Type this:

Give me a high-level overview of this codebase. What does it do, how is it structured, and what are the main entry points?

Claude will read your project structure and key files, then give you a summary. Even if you built this codebase yourself, the description from an outside perspective is often useful. For someone new to the project, this is gold.

Like asking a new team member to summarize what they've learned about the project in their first week. A fresh read often surfaces things the original author stopped noticing.
Step 4

Ask Where a Feature Lives

Pick something specific in your project. Ask:

Where is the user authentication handled? Show me the relevant files and the flow.

(Substitute your own feature.) Claude will search the codebase, find the relevant code, and trace the flow for you. This is the moment most people realize the CLI is fundamentally different from the chat box — it's actually looking at your code.

Step 5

Make One Safe Change

Ask Claude to make a small, low-risk change. A good first one:

Find any TODO comments in the codebase and list them with file paths and line numbers.

Or ask it to add a comment, rename a variable in one file, or fix a typo. Watch it use the Edit tool to actually modify the file. Then open the file in your editor and see the change.

This moment — watching Claude edit your file directly — is what changes people's understanding of what a CLI AI tool actually is.

If you're nervous about Claude editing files, you can always undo with Ctrl+Z in your editor or revert with git. You're not locked into anything.
Step 6

Run Your Tests

If your project has tests, ask Claude to run them:

Run the tests and tell me if anything is failing.

Claude will use the Bash tool to run your test command, read the output, and interpret the results. If tests fail, it can diagnose the failures and propose fixes. This is the loop described in the Practitioner's Guide — compressed into one step for your first session.

If you don't have tests yet, ask:

What would a good test suite look like for this project? What's the most important thing to test first?
Step 7

Ask for a Plan Before Making a Real Change

Pick something non-trivial you've been meaning to do in the project. Before Claude touches any files, ask for a plan:

I want to [add feature X / refactor Y / fix bug Z]. Before you start, describe your approach: which files you'll change, in what order, and what risks to watch for.

Or press Shift+Tab to enter Plan Mode, then describe the task. Claude will produce a plan for you to review before anything gets touched.

Reviewing the plan takes 30 seconds and catches misunderstandings before they become edits you have to undo.

Step 8

Learn These Four Commands

Before you end your session, try each of these:

| Command | What it does | When to use it |
| --- | --- | --- |
| /help | Lists all available slash commands and built-in commands | Whenever you're not sure what's available |
| /compact | Compresses conversation history to free context window space | Long sessions, before starting a new major task in the same session |
| /cost | Shows token usage and estimated cost for this session | Whenever you're curious about spend, or before a long expensive task |
| Ctrl+C then claude --continue | Exit and resume the same session later | When you need to stop and come back; keeps all context intact |

--resume (instead of --continue) shows a list of all previous sessions to choose from. Useful when you have multiple projects or sessions running.

What's Next

After Your First Session

You've done the most important thing: you've seen it work on real code. Now the Manifesto chapters will make sense because you have concrete experience to attach them to.

  • Read Chapter II (Sessions) — now you understand what a session actually is
  • Read Chapter III (CLAUDE.md) — now you know why you'd want persistent instructions
  • Run /init — let Claude generate a CLAUDE.md for your project
  • Try the First Session Quiz — test what you just learned while it's fresh
Estimated time: 5-8 minutes | 10 scenario-based questions
First Session Quiz
Estimated time: 15-20 minutes | 40 questions
Quick Quiz
Estimated time: 30-40 minutes | 55 scenario-based questions
Deep Quiz
Estimated reading time: 30-40 minutes

An AI Practitioner's Guide

This is practical advice for building real software with AI coding tools. Not theory. Not hype. This is what works when you're 30,000 lines deep and still adding features.

I

The AI Code Loop

There is a loop. Once you see it, you can't unsee it, and everything gets easier. Here it is:

  1. Instruct. Tell your AI coding tool what to build. Be specific about constraints: testing methodology, code organization, reuse patterns.
  2. Write Tests. AI writes test cases first. They will fail — they are RED — because there is no code yet to make them pass.
  3. Code. The AI writes code. You watch, guide, and course-correct.
  4. Test. Tests run — either in a test framework, or self-contained in the app itself. Every change set triggers validation. Tests should now be GREEN.
  5. Review. A different AI (not the one that wrote the code) reviews it. Copilot reviews Claude's code. Codex reviews Claude's code. A fresh Claude session reviews the existing session's code. Multiple perspectives catch what a single perspective misses.
  6. Fix. Feed the review findings back into the coding AI. It fixes. Tests run again. Repeat until green.
  7. Document. AI updates the changelog, README, product analysis, and CLAUDE.md with what was learned.
  8. Commit. Version number, git tag, check in. You never lose work.

Then you start the loop again for the next feature.

Plan Mode is the "Instruct" step made formal. Press Shift+Tab before coding and Claude describes its entire approach — files it will touch, functions it will write, decisions it will make — before writing a single line. You approve or redirect before anything changes. This is the loop's first gate, enforced by the tool rather than by willpower.
Plans are only as good as how you execute them. Writing a solid plan is step one. Handing it to an AI with the right execution discipline is step two — and most developers skip step two entirely. A structured execution prompt tells the AI to read the actual code before touching anything, find every caller, map the blast radius, and prove its work. Without it, the AI codes from the plan document instead of the repo, and patches the wrong seam. How to Run a Plan →
Session forking is the "different AI reviews it" step without leaving Claude. Press Ctrl+\ to fork your current session — the new session inherits the full conversation history but runs independently. Point the fork at your code and ask it to review. Two Claude perspectives on the same code, with full context, in seconds.
Like a factory production line with quality checkpoints. Raw materials (your prompt) enter at one end. Each station (code, test, review, fix) adds value and catches defects. Nothing ships until it passes every station. The loop isn't bureaucracy — it's what lets you keep the factory running at speed without producing junk.
Like a content approval workflow. A writer drafts, an editor reviews, legal clears it, the brand team checks it, then it publishes. Each gate catches something the previous one missed. You wouldn't skip legal review just because the editor already approved it. The AI code loop is that approval workflow — each step is a different kind of check, and skipping steps is how things go wrong at scale.
Create a slash command called /start-feature in .claude/commands/start-feature.md. The file contains your full loop as a prompt template: "We are starting a new feature. Follow this sequence: (1) write failing tests first, (2) implement the code, (3) run tests, (4) write to PR.md for review, (5) update CLAUDE.md with any hard-won facts. Never skip a step." Every feature session starts with /start-feature and the loop is loaded automatically — no re-typing, no forgetting.
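A minimal sketch of what that command file might contain (the frontmatter and the $ARGUMENTS placeholder follow Claude Code's custom-command conventions; the step wording is yours to adapt):

```markdown
---
description: Kick off a new feature using the full AI code loop
---
We are starting a new feature: $ARGUMENTS

Follow this sequence, never skipping a step:
1. Write failing tests first (RED) and run them to prove they fail.
2. Implement the production code (GREEN).
3. Run the full test suite and report PASS/FAIL.
4. Write a structured summary of all changes to PR.md for external review.
5. Update CLAUDE.md with any hard-won facts learned along the way.
```

Whatever you type after /start-feature lands in the $ARGUMENTS slot, so "/start-feature add CSV export" starts the loop with that feature already named.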
II

Red-Green Testing

Tell your AI to write the test before writing the fix. This is non-negotiable.

  1. RED: Write a test that catches the bug (or validates the new feature). Run it. It should fail. If it passes, the test is wrong — it doesn't actually detect the problem.
  2. GREEN: Write the production code that makes the test pass.
  3. Verify: Run all tests. The new one passes, and nothing else broke.

Why this matters: AI will happily write a test that passes on buggy code. That test is worse than no test — it gives you false confidence. The red phase proves the test actually detects the problem.
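The discipline in miniature, as a hedged Python sketch (the slugify helper is hypothetical):

```python
# RED: the test is written first, against code that does not exist yet.
# Running it at this point fails with a NameError -- proof it detects the gap.
def test_slugify_collapses_spaces():
    assert slugify("Hello  World") == "hello-world"

# GREEN: only now is the production code written to make the test pass.
def slugify(text: str) -> str:
    return "-".join(text.lower().split())

test_slugify_collapses_spaces()  # now passes; run the rest of the suite too
```

The order is the whole point: if slugify had been written first, the test could never prove it actually detects a failure.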

AI will try to fix the test instead of the code. This is the single most common failure mode. When a test fails, AI's instinct is to make the test pass by changing the test's expectations. You must tell it: "The test is correct. The production code has the bug. Fix the production code." Put this rule in your CLAUDE.md. You will need it.
Create a PostToolUse hook that watches for edits to test files. When a test file changes but the corresponding production file does not change in the same operation, the hook prints a warning: "Test file modified without production code change — did AI fix the test instead of the code?" This fires automatically on every edit and catches the failure mode before you even look at the output.
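As a concrete sketch of that hook script (this assumes Claude Code pipes the tool call to the hook as JSON on stdin, with the edited path at tool_input.file_path; the filename heuristic is illustrative and simplified to warn on any test-file edit):

```python
# Sketch of a PostToolUse hook script for catching test-file edits.
import json
import re
import sys

def is_test_file(path: str) -> bool:
    """Heuristic: treat test_*, *_test.*, *.test.*, and *.Tests.* as test files."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return bool(re.search(r"(^test_|_test\.|\.test\.|\.Tests\.)", name))

def main() -> None:
    event = json.load(sys.stdin)
    path = event.get("tool_input", {}).get("file_path", "")
    if is_test_file(path):
        # PostToolUse runs after the edit, so this is a warning, not a block.
        print(f"WARN: test file modified ({path}) -- did the AI fix the test "
              "instead of the production code?", file=sys.stderr)

# Wire this up by calling main() when the file runs as a script, and
# registering the script as a PostToolUse command hook in settings.json.
```

A PreToolUse variant of the same script could return a non-zero exit code to block the edit outright, which is the stricter option for critical files.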

Write Broad Tests, Not Narrow Ones

When you find a bug, don't write a test that checks one specific line. Write a test that scans the entire codebase for the pattern that caused the bug. If a missing null check crashed one function, scan every function for the same missing check. AI makes the same mistake in many places at once — your test should catch all of them.

Claude's built-in Grep tool is the mechanical implementation of this. You don't need to shell out to a script — Claude has a native Grep tool that scans any path with a regex pattern in one call. When you find a bug, tell Claude: "Use the Grep tool to scan the entire codebase for this pattern and list every file that has it." That's the broad sweep, done in seconds, without leaving the session.
III

Code Architecture That Survives AI

AI writes code that works right now. Your job is to ensure it still works after 50 more changes. That requires structure.

Separation of Concerns

Put related code in related directories. Menus go in ui/. Database access goes in data/. Session management goes in session/. This isn't perfectionism — it's how you find things six months from now, and how AI finds things when its context window fills up.

Subdirectory CLAUDE.md files enforce namespace rules automatically. Place a CLAUDE.md inside ui/ that says "this directory contains only rendering and display code — no business logic, no data access." Claude picks it up when working in that directory. The architecture becomes self-documenting and self-enforcing — the rules travel with the code, not just the project root.
Create a PostToolUse hook that checks namespace placement after every file write. A simple script reads the file path and the first function definition — if a function named Show-* or Render-* was written outside ui/, or a function named Get-All*Sessions was written outside session/, the hook warns immediately. Enforcing directory conventions via hook means AI can't quietly put code in the wrong place without you finding out.

Code Reuse (The Central Battle)

AI will write a new variation of a function every time you ask for something similar. You'll end up with three functions that format dates, four that parse JSON, and five that build file paths. This is the #1 maintenance problem with AI-generated code.

Fight it constantly:

  • When you see the same 3 lines in two places, extract a shared function
  • Put shared helpers in a core/ or utils/ namespace
  • Tell your AI explicitly: "Check if a helper already exists before writing new code"
  • Put this rule in CLAUDE.md so it persists across sessions
Like keeping a clean workshop. If you let tools pile up wherever they're last used, you'll buy duplicates because you can't find the one you already have. Organized tools mean you grab the right one every time. AI is the apprentice who buys a new wrench for every job instead of checking the toolbox.
A skill can encode the "check before writing" rule as a mandatory step. Create a SKILL.md called reuse-first that instructs Claude to search for existing helpers before writing any utility function — listing what it found before proceeding. When your meta-skill says to invoke it before implementation work, the check becomes part of the workflow rather than advice Claude can quietly skip.
Like a shared drive that no one organizes. After six months you have seventeen versions of the proposal template because everyone made their own copy instead of finding the existing one. AI has exactly this problem — it will create a new "summarize meeting notes" function every time you ask, rather than reusing the one it wrote last week. Your job is to be the person who says "we already have a template for that."
You can use a Claude Code hook to catch duplication automatically. Create a PostToolUse hook that runs after every file edit, triggering a lightweight duplicate-detection script on the changed file. For example, a hook that runs grep -rn looking for function signatures similar to what was just written, and warns you if a near-duplicate already exists elsewhere in the codebase. This turns "fight it constantly" into "get alerted automatically." Put the rule in CLAUDE.md ("never write a new helper without checking for existing ones"), and back it up with a hook that actually enforces it. Belt and suspenders.

Registry-Driven Design (Avoid If-Else Chains)

When your code needs to handle multiple similar things differently (platforms, providers, formats), don't write:

if (platform == "claude") { doX() }
else if (platform == "gemini") { doY() }
else if (platform == "codex") { doZ() }

Instead, create a registry — a data structure that maps each variant to its behavior:

registry = {
  claude: { handler: doX, color: "blue" },
  gemini: { handler: doY, color: "yellow" },
  codex:  { handler: doZ, color: "magenta" }
}
// Then: registry[platform].handler()

Adding a new platform means adding one registry entry — not hunting through every if-else chain in the codebase. This pattern is worth learning. AI will default to if-else chains every time unless you tell it not to.
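Here is the same idea as a runnable sketch in Python (handler names and return strings are illustrative):

```python
# Each variant maps to its behavior; dispatch is a single lookup, no branching.
def handle_claude() -> str:
    return "ran Claude handler"

def handle_gemini() -> str:
    return "ran Gemini handler"

REGISTRY = {
    "claude": {"handler": handle_claude, "color": "blue"},
    "gemini": {"handler": handle_gemini, "color": "yellow"},
}

def dispatch(platform: str) -> str:
    entry = REGISTRY.get(platform)
    if entry is None:
        # Unknown variants fail in exactly one place, not in scattered else-branches.
        raise ValueError(f"Unknown platform: {platform}")
    return entry["handler"]()

# Adding "codex" later means adding one REGISTRY entry -- no existing code changes.
```

A side benefit: unknown variants fail loudly in one place instead of falling through a forgotten else branch.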

Create a slash command called /check-patterns that instructs Claude to scan the codebase for if-else chains keyed on known variant strings (e.g., platform names, file type strings, provider names). The command prompt can be as simple as: "Search the entire codebase for if/elseif blocks that branch on string comparisons for known variants. List every occurrence and suggest the registry pattern as a replacement." Run it periodically — especially after a heavy coding session — to catch drift before it compounds.

On SOLID and Design Principles

SOLID principles (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion) are useful guidelines, not commandments. Apply them where they reduce complexity:

  • Single Responsibility — always useful. One file, one job. One function, one purpose.
  • Open/Closed — the registry pattern above is this in practice. Extend by adding data, not modifying existing code.
  • Liskov / Interface Segregation / DI — valuable in typed languages with class hierarchies (C#, Java, TypeScript). Overkill in scripting languages (PowerShell, Python, Bash). Don't force interfaces and dependency injection into a 500-line script.

The practical test: "Would this pattern prevent a real bug or make a real change easier?" If yes, use it. If it just adds abstraction for abstraction's sake, skip it.

IV

Cross-AI Code Review

Never let the AI that wrote the code be the only one that reviews it. This is like letting the student grade their own exam.

The Agent tool with isolation: "worktree" automates this step. Dispatch a subagent to review the changed files — it gets an isolated copy of the repo, reads the code fresh with no memory of writing it, and reports findings back. No manual copy-paste to a second tool. No re-explaining context. A genuinely independent perspective, triggered with one tool call.

The Process

  1. Claude Code writes the code
  2. Open a different AI (Copilot, Codex, a fresh Claude session, Gemini) and point it at the code
  3. Ask it to review for: bugs, security issues, missed edge cases, code duplication, naming problems
  4. The reviewer writes findings to a file (PR.md)
  5. Feed PR.md back to the original Claude Code session
  6. Claude fixes the issues using red-green methodology

Two reviewers are better than one. Different AI models catch different things — GPT models notice different patterns than Claude models.

This sounds like a lot of ceremony. It isn't. The review step takes 2-3 minutes. It catches bugs that would take hours to debug later. The cost-benefit ratio is absurdly good.
Create a slash command called /peer-review. The command instructs Claude to: (1) summarize all changes made in this session, (2) write a structured review request to PR.md including changed files, what changed, and what to look for, (3) remind you to open PR.md in a separate AI session for review. When you return with the reviewer's findings, run /fix-pr which reads PR.md and instructs Claude to address each finding using red-green methodology. The whole handoff is encoded in two slash commands.
V

Testing at Scale

As your codebase grows, your test count should grow with it. Hundreds of tests are normal. Thousands are not unusual for a mature project.

Where Tests Live

Wherever makes sense for your project:

  • In a test framework (Jest, pytest, Pester) — if one exists for your language
  • In the app itself — if no framework exists, build validation into your code. A "run tests" menu option that executes all checks and reports PASS/FAIL is perfectly valid
  • As static analysis — tests that scan your source code for known anti-patterns (hardcoded values, missing error handling, duplicate logic)

The output format matters: PASS, FAIL, WARN with explanations. This output can be copied and fed directly back to AI as instructions for what to fix.
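If no framework exists, the in-app runner can be very small. A minimal sketch (the two checks here are illustrative stubs; real ones would inspect your code):

```python
# Minimal self-contained test runner: each check returns (status, message).
def check_no_hardcoded_paths() -> tuple[str, str]:
    return ("PASS", "no hardcoded absolute paths found")

def check_error_handling() -> tuple[str, str]:
    return ("WARN", "2 functions call the API without a try/except")

CHECKS = [check_no_hardcoded_paths, check_error_handling]

def run_all() -> list[str]:
    """Run every check and return 'STATUS name: message' lines --
    output that can be pasted straight back to the AI as fix instructions."""
    lines = []
    for check in CHECKS:
        status, message = check()
        lines.append(f"{status} {check.__name__}: {message}")
    return lines

for line in run_all():
    print(line)
```

The flat STATUS-name-message format is deliberate: it needs no parsing by you or by the AI that fixes the failures.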

Pattern-Level Testing

When you find a bug, ask: "Is this a one-off mistake, or a pattern?" If AI wrote a function with a missing null check, it probably wrote ten functions with missing null checks. Write a test that scans all functions for the pattern, not just the one where you found it.

Lint and Static Analysis

Use external lint tools for your language (ESLint, PSScriptAnalyzer, pylint, etc.). Run them on every change. They catch categories of bugs that neither you nor AI will notice during review.

Create a PostToolUse hook that runs your linter automatically after every file edit. For JavaScript: npx eslint $FILE --max-warnings 0. For PowerShell: Invoke-ScriptAnalyzer $FILE. The output appears in the terminal immediately after each edit. Claude sees the lint output and can fix violations in the next turn — without you having to manually invoke the linter or paste results back. The loop tightens from minutes to seconds.
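Registering such a hook is a small settings change. A sketch of the wiring (assuming the settings.json hooks format; $FILE is a stand-in for the edited path, which a real hook command would read from the JSON Claude Code pipes to it on stdin):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx eslint --max-warnings 0 $FILE" }
        ]
      }
    ]
  }
}
```

The matcher restricts the hook to file-modifying tools, so read-only operations like Grep never trigger a lint run.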
PreToolUse hooks can block a file write before it happens. The real-world card above shows a hook that warns after a write. A PreToolUse hook is stronger — it runs before Claude writes the file and can return a non-zero exit code to abort the operation entirely. The file never gets written until it passes the check. For critical constraints (no credentials in source, no code outside its designated namespace), PreToolUse is the hard stop; PostToolUse is the warning.
VI

Documentation as a Development Tool

AI-generated documentation is not busywork. It's a development tool.

The Files That Matter

  • CLAUDE.md (or your AI's equivalent) — Hard-won facts. Rules. Anti-patterns to avoid. Architecture decisions. This file is read by AI at the start of every session. It is your persistent memory across conversations.
  • CHANGELOG.md — What changed per version. Lets you (and AI) understand the evolution of the codebase.
  • README.md — What the project is, how to build it, how to run it. Useful for onboarding new AI sessions, too.
  • Code comments — Not comments that state the obvious ("increment counter"), but hard-won-fact comments ("Do NOT use KeyAvailable in a processing loop — it blocks on spurious events"). These survive across sessions because they're in the code.
Code comments can also be a liability. Stale comments — ones that described what the code did before the last refactor — mislead your AI exactly as much as they mislead humans. AI reads comments as instructions. If a file header says a function lives here and it was moved three versions ago, AI will generate code in the wrong place. There is a right way to write comments so they stay true across refactors. Code Commenting Practices to Reduce Drift →

CLAUDE.md as Institutional Memory

Every painful debugging session should end with an update to CLAUDE.md. If you spent 2 hours figuring out that a particular API silently returns null on Sundays, write it down. AI will encounter the same trap in a future session. CLAUDE.md prevents your AI from repeating your mistakes.

Like a pilot's checklist built from crash investigations. Every rule in the checklist exists because someone got hurt. CLAUDE.md is your project's crash investigation record. It doesn't just tell AI what to do — it tells AI what NOT to do, and why.
Like a "lessons learned" document at the end of a project. Every PM knows these should be written but nobody reads them. CLAUDE.md is the version that actually gets read — because AI reads it before every session. If your last project burned three days because of a miscommunication about scope, write that rule down. The next AI session won't repeat the mistake.
Don't start from scratch. The Case Studies section includes a complete 13-rule Day-1 architectural template — derived from a real production project — with a wizard that generates a customized CLAUDE.md or AGENTS.md for your project. Fill in your product name, variant dimension, key entity, and runtime, and download a ready-to-use file. The Day-1 Prompt →

Let AI Document Your Work

After completing a feature, tell AI to update the changelog and product analysis. Reading AI's description of what you just built is surprisingly useful — it gives you an outsider's perspective on your own creation. Inconsistencies and gaps become visible when someone else describes your work back to you.

Create a Stop hook that fires when Claude finishes a response. The hook prints: "Session ending — did we learn anything hard-won that should be added to CLAUDE.md or memory?" It's a 2-second reminder that costs nothing and prevents the most common knowledge loss: the debugging insight that gets forgotten because you moved on to the next problem. Pair it with a slash command called /update-docs that instructs Claude to update CHANGELOG.md, README.md, and CLAUDE.md with a summary of what changed in this session.
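A minimal sketch of that Stop hook in .claude/settings.json, assuming the documented hooks schema; the reminder text is the one quoted above:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Session ending — did we learn anything hard-won that should be added to CLAUDE.md or memory?'"
          }
        ]
      }
    ]
  }
}
```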
VII. Source Control Is Your Safety Net

Commit early, commit often, commit with version numbers.

  • Small commits — easier to test, easier to revert, easier to understand in the git log
  • Version tags — every significant milestone gets a version (v1.0, v2.0). You can always go back.
  • Experiment freely — with source control, you can try bold refactors. If they fail, revert. Without source control, you're walking a tightrope with no net.
  • Branch for experiments — try something risky on a branch. If it works, merge. If not, delete the branch. No harm done.
If you're not using source control, start today. Git is free. GitHub is free. The learning curve is real but short. The alternative is losing code to a bad AI edit that you can't undo.
Use worktrees (isolation: "worktree" on the Agent tool) when asking Claude to attempt a risky refactor or experiment. The agent gets its own isolated copy of the repo, does all its work there, and reports back. If it succeeded, you get the branch name to review and merge. If it failed, the worktree is discarded — your working tree was never touched. This is the programmatic equivalent of "try the experiment on a branch." You get full reversal with zero manual branching.
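The same branch-and-worktree safety net can be exercised by hand. A self-contained sketch (assumes git 2.28+ for init -b; the repo and file names are throwaway):

```shell
# Demo: run an experiment in an isolated worktree, then discard it.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -qb main
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "baseline"

git worktree add -q -b experiment "$repo-wt"   # isolated checkout, own branch
cd "$repo-wt"
echo "risky change" > notes.txt
git add notes.txt
git -c user.email=demo@example.com -c user.name=demo commit -q -m "risky refactor"

cd "$repo"                       # the main working tree was never touched
git worktree remove "$repo-wt"   # experiment failed: throw the whole copy away
git branch -qD experiment
git log --oneline                # only the baseline commit remains on main
```

If the experiment had succeeded, you would merge the experiment branch instead of deleting it; either way, main never saw an uncommitted risky change.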
VIII. Managing AI Behavior

AI is a powerful but opinionated collaborator. Here are the behaviors you'll need to manage:

AI Adds Unnecessary Complexity

AI over-engineers by default. It adds error handling for impossible cases, creates abstractions for one-time operations, and builds configurable systems when you asked for a simple function. Tell it explicitly: "Keep it simple. Don't add features I didn't ask for. Don't add error handling for scenarios that can't happen."

AI Duplicates Instead of Reusing

Already discussed, but worth repeating: AI writes new code when existing helpers would do. Put "check for existing helpers before writing new code" in every instruction file.

AI Fixes Tests Instead of Code

When a test fails, AI's first instinct is to change the test expectations to match the (buggy) output. "Never fix a test to make it pass. Fix the production code." Put this in CLAUDE.md in bold.

AI Forgets Across Sessions

Every new session starts fresh. AI doesn't remember your architecture decisions, naming conventions, or past mistakes unless you write them down. CLAUDE.md, memory files, and code comments are how knowledge persists. Without them, you'll re-explain the same things session after session.

Auto memory can capture hard-won facts without you having to ask. Claude Code has an auto memory feature — it writes memory files about things it discovers during the session, automatically, without prompting. If Claude figures out a subtle quirk of your API or why a particular function can't be refactored, it can write that to memory before you think to ask. Pair this with CLAUDE.md rules (what to do) and you cover both directions: the rules you write in, and the knowledge Claude writes out.
Use memory files (~/.claude/projects/*/memory/) for knowledge that doesn't belong in CLAUDE.md but needs to persist. CLAUDE.md is for rules and constraints. Memory files are for richer context: "The authentication module was rewritten in v8.2 because the old session token format didn't comply with the new security policy. The old format is still present in legacy data — do not assume all tokens are the new format." Create a slash command called /remember that instructs Claude to write a new memory file capturing what was just learned. Over time, these files become a project knowledge base that survives indefinitely across sessions.
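A sketch of that /remember command, saved as .claude/commands/remember.md (the description frontmatter field follows the slash-command file format; the wording is illustrative):

```markdown
---
description: Write a memory file capturing what this session just learned
---
Review this session for hard-won knowledge: subtle bugs and their causes,
API quirks, and decisions with the reasons behind them. For each one,
write a new memory file so future sessions know the trap in advance.
```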

AI Makes Consistent Mistakes

When AI makes a mistake in one place, it has probably made the same mistake everywhere. Don't just fix the instance — search the entire codebase for the pattern. Write a test that catches the pattern globally. This is the single most effective quality practice for AI-generated code.

Memory files are where you record the pattern so it never happens again. After finding a systematic mistake, write a memory file: what the mistake was, what triggered it, what the correct pattern is, and which files were affected. Future sessions load that memory and know the trap in advance. The mistake becomes institutional knowledge instead of a recurring surprise.
CLAUDE.md is where you turn the pattern into a rule. Memory files explain context. CLAUDE.md enforces behavior. After a systematic bug, add a rule: "Never use X approach — it causes Y. Always use Z instead." Claude reads CLAUDE.md at the start of every session, so the rule is active before the first line of code is written. The pattern can't recur if the rule is loaded before work begins.
A PostToolUse hook can catch a recurring error class automatically. If a systematic mistake has a detectable signature — a function call, a missing check, a specific code pattern — write a hook that greps for it after every file edit. When the hook fires, it prints the warning immediately in the terminal, before the next prompt. Claude sees the output and can address it in context. The catch goes from "found during review hours later" to "caught in the same turn it was written."
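A sketch of such a hook: the script reads the hook's JSON input on stdin, extracts the edited file path, and greps for a hypothetical banned pattern (KeyAvailable, from the comment example earlier). The file locations and the pattern are assumptions; register the script under PostToolUse in your settings. The demo writes the script to a temp file and exercises it once:

```shell
# Write the hook script (sketch) somewhere, then simulate one invocation.
hook=$(mktemp)
cat > "$hook" <<'EOF'
#!/bin/sh
# Claude Code passes hook input as JSON on stdin; pull out
# tool_input.file_path with a crude sed (fine for one-line JSON).
file=$(sed -n 's/.*"file_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
[ -n "$file" ] && [ -f "$file" ] || exit 0
if grep -n "KeyAvailable" "$file" >/dev/null; then
  echo "WARNING: KeyAvailable found in $file - this blocked us before (see CLAUDE.md)."
fi
EOF
chmod +x "$hook"

# Simulate Claude editing a file that reintroduces the banned pattern:
target=$(mktemp)
echo 'while (Console.KeyAvailable) { }' > "$target"
printf '{"tool_name":"Edit","tool_input":{"file_path":"%s"}}' "$target" | "$hook"
# prints the WARNING line
```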
IX. A Note on the Future

By 2027, you probably won't need most of these practices. AI tools will get better at maintaining context, avoiding duplication, and testing their own output. The tools will internalize the discipline that we currently impose manually.

But it's still 2026. And right now, the difference between a developer who follows these practices and one who doesn't is the difference between a project that grows gracefully and one that collapses under its own weight at version 5.

The loop works. Use it.

This guide was NOT followed in the making of this website! It didn't warrant it. This guide is for serious development, not for throwing together a static webpage with a little JavaScript.

Remote Control from Your Phone

Shipped February 25, 2026. Start a task on your laptop, put it in your bag, and keep full control from your phone. The session runs locally on your machine — your filesystem, MCP servers, tools, and project config all stay available. You just control it remotely.

Like Remote Desktop, but for a single CLI session. Your machine does all the work. Your phone is just a window into it.
Like a shared Google Doc that you started on your laptop and can continue editing from your phone in the back of a cab. The document lives in the cloud (or in this case, on your machine). Your phone is just a second screen. Everything stays in sync automatically.

Requirements

Method 1: From an Existing Session (/rc)

You're already in a Claude Code session and want to hand it off to your phone:

  1. Optional but recommended: type /rename MyTask so you can find the session by name on your phone
  2. Type /rc (short for /remote-control)
  3. A session URL appears in your terminal
  4. Press spacebar to toggle a QR code display
  5. Scan the QR code with your phone — it opens directly in the Claude app or browser
  6. You're now controlling the session from your phone. The conversation stays in sync.

Method 2: Start a New Remote Session

Start a session that's remote-ready from the beginning:

  1. cd to your project directory
  2. Run claude remote-control
  3. The terminal shows a session URL and QR code (spacebar to toggle)
  4. Scan or open the URL on your phone
  5. Start prompting from your phone immediately

Three Ways to Connect from Your Phone

| Method | How | Best For |
|---|---|---|
| QR Code | Scan the QR code shown in the terminal (press spacebar to show it) | Quickest — phone camera to session in 2 seconds |
| Session URL | Copy the URL from the terminal, open in any browser | Connecting from a tablet or another computer |
| Session List | Open claude.ai/code or the Claude app, find the session by name | When you named the session with /rename and want to find it later |

Remote Control sessions show a computer icon with a green dot in the session list when online.

Enable for Every Session (Auto Mode)

If you always want Remote Control available:

  1. Type /config inside any Claude Code session
  2. Set "Enable Remote Control for all sessions" to true
  3. Every future session will automatically be available for remote connection

What Happens Under the Hood

Constraints and Gotchas

The killer use case: kick off a long refactoring task, walk to a meeting, and approve file edits from your phone as Claude works through them. You don't lose 30 minutes of AI productivity just because you had to leave your desk.

Practical Workflow

  1. At your desk: start a Claude Code session, give it a complex task
  2. Type /rc, scan the QR code with your phone
  3. Close your laptop lid (sleep is fine)
  4. From your phone: watch Claude work, approve tool uses, answer questions
  5. Back at your desk: open your laptop — the terminal session is still running, and you continue right where you left off

Other AI Coding CLIs


Snapshot comparison as of March 2026. This space moves fast. Feature parity between tools changes with every release. Re-check vendor docs before relying on fine details. The table below favors concept-level comparisons over brittle feature minutiae — if a specific capability matters to your decision, verify it against the current docs for that tool.

Claude Code isn't the only AI coding CLI. Here's how the major players compare on the features discussed in this guide. All of these are terminal-based coding assistants — not IDE plugins, not chat UIs.

| Feature | Claude Code | Codex CLI | Copilot CLI | OpenCode | Gemini CLI |
|---|---|---|---|---|---|
| Vendor | Anthropic | OpenAI | GitHub / Microsoft | Community (OSS) | Google |
| Default Model(s) | Claude Opus, Sonnet, Haiku | GPT-5.x Codex variants | Claude Haiku, GPT-4.1, GPT-5 Mini | Any (Claude, GPT, etc.) | Gemini 2.5 Pro/Flash, 3 Pro |
| Session Persistence | Yes — .jsonl files | Yes — SQLite | Yes — events.jsonl | Yes — SQLite | Yes — JSON files |
| Resume Sessions | Yes — --resume | Yes — resume | Yes — --resume | Yes — --session | Yes — --resume |
| Fork Sessions | Yes — --fork-session | Yes — fork | No | Yes — --fork | No |
| Project Instructions (CLAUDE.md equivalent) | CLAUDE.md | AGENTS.md | copilot-instructions.md | AGENTS.md | GEMINI.md |
| Skills / Instruction Files | Yes — SKILL.md in .claude/ | No | No | No | No |
| Slash Commands | Yes — .claude/commands/ | No | No | No | No |
| Hooks (Event Triggers) | Yes — pre/post tool use, session start | No | No | No | No |
| MCP Servers (Plugins) | Yes — full MCP support | No | Limited — extensions via GitHub | No | Yes — MCP support |
| Agents / Subagents | Yes — spawns child Claude instances | No | No | No | No |
| Memory (Cross-Session) | Yes — ~/.claude/projects/*/memory/ | No | No | No | No |
| Plan Mode | Yes — structured planning before coding | No | No | No | No |
| File Editing | Yes — Read, Edit, Write tools | Yes — apply-patch model | Yes — file operations | Yes — file operations | Yes — file operations |
| Shell Command Execution | Yes — Bash tool | Yes — sandbox mode | Yes — shell access | Yes — shell access | Yes — shell access |
| Permission Model | Per-action approval or bypass mode | Sandbox + approval tiers; creates restricted Windows user accounts for OS-level isolation on Windows | --yolo (allow-all) mode | Config-based | --approval-mode yolo |
| Cost Model | Per-token (input/output/cache) | Per-token blended rate | Premium requests (subscription) | Per-message (provider-dependent) | Free tier / API key billing |
| Install Method | npm i -g @anthropic-ai/claude-code | npm i -g @openai/codex | npm i -g @github/copilot | npm i -g opencode-ai | npm i -g @google/gemini-cli |
| Individual Cost (monthly) | Free tier; Pro $20/mo; Max 5×: $100/mo; Max 20×: $200/mo | Included in ChatGPT plans: Plus $20/mo, Pro $200/mo (no standalone Codex plan) | Free tier; Copilot Pro $10/mo; Pro+ $39/mo (includes IDE + CLI) | Free (open source); pay API costs directly (bring your own API key) | Free tier; Google AI Pro $20/mo; AI Ultra $250/mo (covers Gemini CLI + other Google AI) |
| Value for coding | High — richest feature set justifies cost | Medium — good if you already pay for ChatGPT | Best value — $10/mo for a full coding CLI | Cheapest — pay only for tokens used; no subscription | Medium — $20/mo bundles many Google AI tools |
About Each CLI
  1. Claude Code (Anthropic) is the subject of this entire guide. It has the deepest architecture of any CLI coding tool — skills, hooks, slash commands, agents, memory, plan mode, and Remote Control. If you invest time in learning one CLI deeply, this is the one.

  2. Codex CLI (OpenAI) is OpenAI's answer to Claude Code. It uses the GPT-5 Codex model variants, stores sessions in SQLite, and supports forking. It runs in a sandbox mode with tiered approval for commands. Its main advantage is the GPT-5 Codex model family, which has strengths in certain code generation tasks; its extension architecture is weaker than Claude Code's.

    Codex uses AGENTS.md the same way Claude uses CLAUDE.md. Same concept: a markdown file loaded at session start that gives the AI persistent instructions. If you have a well-crafted CLAUDE.md, copy it to AGENTS.md and Codex will pick it up. Your architecture rules, coding standards, and anti-patterns carry over with zero extra work.

    Windows users: Codex creates sandboxed Windows user accounts on your machine as part of its security model. You may notice new user accounts (named something like "Codex Sandbox" or similar) appearing in your Windows user list after installing Codex. This is intentional — those restricted accounts run Codex's operations with limited OS permissions so that commands cannot reach beyond what a low-privilege user can do. Do not delete them; removing them breaks Codex's sandboxing. This is specific to Codex's Windows implementation and is not something Claude Code does.

  3. Copilot CLI (GitHub / Microsoft) is backed by Microsoft's investment in GitHub and OpenAI. It offers multiple model choices (Claude Haiku, GPT-4.1, GPT-5 Mini), which is unusual — most CLIs are locked to their vendor's model. It has session persistence and file editing but no forking, no skills, no hooks. Strong choice if your organization is already deep in the GitHub ecosystem.

  4. OpenCode is open-source and model-agnostic — you can point it at Claude, GPT, or any other supported provider. This makes it attractive for organizations that want control over which AI model processes their code, or that need to run everything on-premises. The trade-off is a thinner feature set: no skills, no hooks, no memory. A good choice for teams with strict data residency requirements.

  5. Gemini CLI (Google) brings Google's Gemini models to the command line. Gemini 2.5 Pro has an exceptionally large context window, which helps with large codebases. It supports MCP servers and has session persistence. Weaker extension architecture than Claude Code, but the model itself handles large-scale codebase analysis well. Natural choice if your organization is on Google Cloud.

A Different Category: Local / Self-Hosted AI (Ollama and Peers)

All five CLIs above send your code to a cloud API. Ollama, LM Studio, Jan, and similar tools take a different approach: they run AI models locally on your machine, with no data leaving your network. This is a fundamentally different architecture.

Ollama is the most popular. It downloads open-source models (Llama, Mistral, CodeLlama, DeepSeek, and others) and runs them locally via a REST API on localhost:11434. Other tools (including OpenCode) can point at an Ollama endpoint instead of a cloud API, giving you local inference with a familiar CLI experience.
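For example, a raw request against that endpoint might look like this (a sketch — it assumes Ollama is running locally with a model named llama3 already pulled; the curl line is commented out so the snippet is safe to run without a server):

```shell
# Payload for Ollama's /api/generate endpoint on localhost:11434.
payload='{"model": "llama3", "prompt": "Explain what a git worktree is.", "stream": false}'
echo "$payload"
# With Ollama running, send it:
# curl -s http://localhost:11434/api/generate -d "$payload"
```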

The trade-offs are real: local open-source models trail the frontier cloud models in reasoning quality, their context windows are far smaller, and inference speed depends entirely on your own hardware.

If your organization has strict data policies that prohibit sending code to any external service, the Ollama + OpenCode combination is worth investigating. For most developers without those constraints, cloud-based CLIs currently offer better model quality and richer tooling.

Other CLIs Not Covered Here

The AI coding tool space is moving fast. This guide covers the five most widely used cloud-based CLIs as of early 2026, but the following tools also exist and may be relevant depending on your stack:

  1. Aider — An open-source Python-based coding assistant with strong git integration. Predates Claude Code and has a loyal following. Works with multiple models.
  2. Continue — Primarily a VS Code/JetBrains extension but has CLI capabilities. Model-agnostic.
  3. Cline — VS Code extension with CLI roots. MCP support, model-agnostic.
  4. Cursor — An IDE (fork of VS Code) with deeply integrated AI. Not technically a CLI but competes for the same workflow.
  5. Amazon Q Developer CLI — AWS's coding assistant. Strong if you live in the AWS ecosystem.
  6. Mistral Le Chat — European-based, GDPR-native AI. CLI tooling is early-stage but growing.

This is not an exhaustive list. New tools appear monthly. The architectural concepts in this guide (sessions, hooks, MCP, memory, skills) are Claude Code-specific, but the underlying ideas — persistent context, tool use, cross-session memory — are becoming table stakes across the industry.

Which Should I Use?

The honest answer depends on your situation, not on which tool is theoretically best:

| Your situation | Use this | Why |
|---|---|---|
| Individual developer, no corporate constraints | Claude Code | Deepest architecture, best long-term investment |
| Your org is deep in GitHub / Azure / Microsoft | Claude Code | Still the best coding tool — ecosystem fit is a secondary concern |
| Your org is on Google Cloud | Claude Code | Platform doesn't change which tool codes best |
| Code cannot leave your network (compliance, classified) | OpenCode + Ollama | Fully local, no cloud API calls |
| IT won't approve any cloud AI tool | Ollama + Aider | Open source, fully self-hosted, no vendor |
| You want the simplest possible setup | Claude Code | One install, everything works out of the box |
| Your company already has a standard | Whatever they chose | Consistency beats optimization. Learn the architecture here, apply it there. |
Just ask any of the AI systems which is best for what task — they will tell you.
Claude Code the Tool vs. Claude the Model — They Are Not the Same Thing

This trips people up constantly. Claude Code is the CLI application. The Claude models (Opus, Sonnet, Haiku) are the AI brains that run inside it. They are separate. You can switch models mid-session with /model without restarting anything.

Think of it this way: Claude Code is like Visual Studio. The Claude model is like the C# compiler version. You can change the compiler version without reinstalling Visual Studio.

| Model | Coding ability | Speed | Cost | Use for |
|---|---|---|---|---|
| Claude Opus 4.6 | Excellent — deep reasoning | Slow | Highest | Complex architecture, hard bugs, design decisions |
| Claude Sonnet 4.6 | Very good — well balanced | Fast | Medium | Daily coding work — the recommended default |
| Claude Haiku 4.5 | Adequate for simple tasks | Fastest | Lowest | Quick edits, boilerplate, simple lookups — NOT complex coding |
Not all Claude models are equally good for coding. Haiku is fast and cheap but will struggle with complex multi-file refactors or subtle bugs. Sonnet is the right default for coding. Opus when you need the heavy artillery. Using Haiku for serious coding to save money is like buying cheap tires — you'll pay for it later.

The same separation applies to other CLIs: Copilot CLI lets you choose between Claude Haiku, GPT-4.1, and GPT-5 Mini. OpenCode can point at any model via Ollama or an API. The CLI is the tool; the model is the engine. You can swap engines.

AI Has Two Dimensions: IQ and Awareness

Editorial teaching model — not benchmark scores. IQ numbers are estimated shorthand for relative reasoning quality (Opus 4.6 = 100 baseline), not official measurements. Awareness numbers are calculated from published token limits and are factual.   IQ = editorial estimate   Awareness = official fact

People talk about AI quality as if it's one thing. It isn't. There are two independent dimensions:

Think of it like image fidelity. A single photograph can be ultra-high-resolution — every pixel tiny. That's high IQ. A movie covers far more ground with thousands of frames, but each individual frame may be lower resolution than a still. That's high Awareness. A brilliant photograph of one page of your codebase is less useful than a full movie of the entire project, even if each movie frame is slightly softer.

We measure Awareness in pages — one standard 8.5×11 page of dense text ≈ 500 words ≈ 750 tokens. This gives you an intuitive sense of how much the AI can "see" at once. More Awareness is most valuable on large codebases, long sessions, and cross-file reasoning. For small, narrow tasks — renaming a variable, explaining a single function — extra Awareness may matter less than reasoning quality (IQ).
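The token-to-page conversion is plain arithmetic; a quick sketch:

```shell
# Awareness in pages: tokens / 750, rounded to the nearest page
# (1 page ≈ 500 words ≈ 750 tokens).
pages() { echo $(( ($1 + 375) / 750 )); }
echo "200K context: $(pages 200000) pages"    # Claude Free/Pro -> 267
echo "1M context: $(pages 1000000) pages"     # Claude Max, Gemini -> 1333
```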

| Platform / Plan | Model | Context Window (tokens) | Awareness (pages) | Notes |
|---|---|---|---|---|
| Claude Code — Free / Pro | Opus, Sonnet, Haiku | 200K | ~267 | Standard. Pro users can opt into 1M by typing /extra-usage |
| Claude Code — Max, Team, Enterprise | Opus 4.6, Sonnet 4.6 | 1M | ~1,333 | Automatic 1M context on Max/Team/Enterprise as of March 2026 |
| Claude Code — Enterprise (extended) | Sonnet 4.6 | Up to 1M+ | ~1,333+ | Enterprise negotiates custom context. Verify with Anthropic sales. |
| Codex CLI / ChatGPT Plus | GPT-5 | ~272K | ~363 | Solid but well below Claude Max |
| Copilot CLI | Claude Haiku, GPT-4.1, GPT-5 Mini | Varies by model selected | ~170–267 | Copilot may cap context below the model's native maximum |
| Gemini CLI — Free / Pro | Gemini 2.5 Flash, 2.5 Pro | 1M | ~1,333 | Exceptional context by default — largest in the comparison |
| Gemini CLI — Ultra (coming) | Gemini 2.5 Pro (2M) | 2M | ~2,667 | 2M context version in development — would be the most aware tool available |
| OpenCode + Ollama (local) | Llama, Mistral, DeepSeek, etc. | 4K–128K (varies by model) | 5–170 | Local models have much smaller context windows. Major limitation vs cloud models. |
Awareness matters most on large projects. If your codebase has 50 files averaging 300 lines each, you have roughly 15,000 lines of code. At ~50 lines per page that's 300 pages of code. A 267-page Awareness model (Claude Pro) literally cannot hold your entire codebase in mind at once. A 1,333-page model (Claude Max) can — and will give much better answers about cross-file relationships, because it sees the whole picture.
The "Same Model" Confusion: Why Claude Code Still Wins When Copilot Uses Claude

GitHub Copilot CLI offers Claude Haiku 4.5 as one of its model choices. So if both Claude Code and Copilot can run Claude Haiku — are they the same tool?

No. Same brain, completely different truck.

When Copilot routes to Claude Haiku, you get Haiku's reasoning ability. That part is identical. But Copilot does not give Claude skills, hooks, slash commands, MCP servers, subagents, memory files, or plan mode — the platform architecture that surrounds the model in Claude Code.

The model is the IQ. The platform wrapped around it determines the Awareness and the workflow. A brilliant person handed a blank notepad is less effective than a capable person with a full desk, reference library, filing cabinet, and an organized team of assistants.

Like two contractors with identical professional licenses. One shows up with a fully-equipped truck: every tool, organized, labeled, ready. The other shows up with the license and a hammer. Same credential. Radically different output.

Why Copilot offers Claude Haiku at all: Anthropic licenses the Claude models via API. Microsoft/GitHub pays to route calls through Anthropic's API and bundle it into Copilot. Same model, different packaging, different surrounding toolset, different pricing, different context limits. The model is a commodity; the tooling around it is where the differentiation happens.

Key Takeaway

Claude Code has the richest extension architecture by far — skills, hooks, slash commands, MCP servers, agents, memory, and plan mode are all unique to it. The other CLIs are primarily prompt-and-respond tools with session persistence. If you're investing time learning AI coding tool architecture, Claude Code is where that investment pays off the most.

That said, all five CLIs can edit files, run commands, and resume sessions. For basic coding tasks, the experience is similar across all of them. The differences emerge when you want automation (hooks), extensibility (MCP), reusable workflows (skills/commands), and cross-session knowledge (memory).

Glossary

Every tool, acronym, and term used in this guide — defined in one place. Anthropic’s confusing naming is called out inline where it applies.

TermDefinition
AiderAn open-source, Python-based AI coding assistant that runs in the terminal. Works with multiple AI models. Predates Claude Code and has a loyal following among developers who prefer open-source tools.→ Other AI CLIs
acli / acli.exeThe Atlassian CLI — a command-line tool for interacting with JIRA and other Atlassian products. More token-efficient than calling the JIRA REST API directly. Claude Code plugins use it to look up tickets, transition statuses, assign issues, and update custom fields. The .exe version is the Windows executable. Configured once with your server URL, email, and API token via acli.exe jira --server ... --save-config. → Building a Plugin
Agent / SubagentA separate Claude instance spawned to handle a subtask. Runs in its own context, reports results back. Like a child process.
AllowlistA list of pre-approved tools/commands in settings.json. Claude won't ask permission for allowlisted actions.→ Chapter 15.5
APIApplication Programming Interface. A defined way for programs to talk to each other. Think of it as a drive-through window: you send a request in a standard format, the system processes it, and sends back a response. Claude's API lets software applications send prompts to Claude and receive responses — programmatically, without a human typing.→ Pricing page
API UsageTokens consumed — described from the billing side. Anthropic says “API usage billed separately” to mean you pay per token on top of any seat fee. Not a separate concept from tokens — it IS tokens, counted and charged.→ Pricing page
aptPackage manager for Debian/Ubuntu Linux. apt install nodejs installs Node.js.
Architecture Decision Record (ADR)A short document that records a significant architectural decision: what was decided, why, what alternatives were considered, and what the consequences are. Written before the work starts, not after. The software equivalent of writing down why you made a structural choice before pouring the concrete. Plan Mode in Claude Code is the same discipline — describe your approach before touching files.→ Chapter IX (Plans)
Audit logA record of who did what and when. In enterprise Claude, every prompt and response is logged. Legal and compliance teams use audit logs to investigate incidents or demonstrate regulatory compliance.→ Pricing page
Auto memoryA Claude Code feature where Claude automatically writes memory notes about things it discovers, without being asked. These notes persist across sessions.→ Chapter X
Autoregressive generationHow language models generate output: one token at a time, each token depending on all previous tokens. Cannot be parallelized. This is why output tokens cost 5x more than input tokens — each requires a full forward pass through the entire model.
Awareness (AI)How much information an AI can hold in mind at once. Measured in pages (1 page ≈ 500 words ≈ 750 tokens). A 200K token model has ~267 pages of Awareness. A 1M token model has ~1,333 pages. Distinct from IQ — a smart model with low Awareness misses things it simply cannot see.→ IQ/Awareness chart
bashA command-line shell (the program that interprets your typed commands). Default on Mac/Linux. Available on Windows via WSL or Git Bash. Claude Code's Bash tool runs commands through this.
BOMByte Order Mark. An invisible character at the start of a file that declares its encoding. Some tools (like Gemini CLI) reject files with BOMs.
BoilerplateRepetitive, standard code that's needed but not interesting. Templates, imports, config setup. Good candidate for Haiku.
BranchA parallel version of your code where you can try changes without affecting the main version. Like making a copy of a document to draft edits, then deciding whether to keep or discard the changes. main is the primary branch considered the “official” version.
brew (Homebrew)Package manager for macOS. brew install node installs Node.js. Mac's equivalent of apt/chocolatey.
Cache (Token Cache)Claude caches repeated content (CLAUDE.md, skills) after first use. Subsequent reads cost 10x less.
catOutputs file contents to the terminal. cat readme.md prints the file.
cdChange Directory. Navigates to a folder. cd C:\repos\MyProject moves you into that directory.
CDNContent Delivery Network. Servers that cache content close to users for speed. Analogous to how Claude's token cache reduces repeated input costs.
Certificate Authority (CA)An organization that issues digital certificates proving a server is who it claims to be. Corporate networks sometimes use their own internal CA, which can cause CLI tools to reject HTTPS connections until the custom certificate is installed.
CI/CDContinuous Integration / Continuous Deployment. An automated pipeline that builds, tests, and deploys code every time someone pushes a change. “CI/CD pipeline” = the assembly line that takes your code from commit to production automatically.→ Other AI CLIs
CLAUDE.mdA markdown file of instructions that Claude reads at the start of every session. Your project's configuration for AI behavior.→ Chapter III
CLICommand Line Interface. A text-based way to interact with a program. You type commands, it responds with text.→ Introduction
CloneDownloading a copy of a git repository to your local machine. git clone https://github.com/user/repo creates a local copy you can work with.
CodebaseAll the source code files that make up a project or application. “The codebase” = the entire collection of code, not just one file. When Claude Code reads your codebase, it's scanning your project's files.→ Introduction
CommitA saved snapshot of your code at a specific point in time. Every commit has a unique ID and a message describing what changed. Like a save point in a game — you can always go back to any previous commit.
/compactA Claude Code command. Manually compresses conversation history to free context window space. Lets you control when and what gets compressed.→ Chapter 15.4
--continueCLI flag to resume the most recent session in the current directory.
Context WindowThe total token capacity Claude can hold at once. Lives in GPU VRAM on Anthropic's servers (the KV cache), not your laptop. Median cost: ~0.5 MB per token. 200K tokens (Pro) ≈ 100 GB of server VRAM. 1M tokens (Max/Enterprise) ≈ 500 GB. Bigger context windows are real hardware costs, not just software features. Anthropic also calls this “extended context model” when the 1M window is enabled — it's the same model, just more Awareness turned on.→ Chapter 15 (advanced box)
Corporate proxyA network intermediary in enterprise environments that routes and inspects outbound internet traffic. Can require special configuration for CLI tools that make HTTPS requests.→ Pricing page
/costA Claude Code command. Shows token usage and estimated dollar cost for the current session.→ Chapter 15.8
curlTransfers data from/to URLs via command line. Used to test APIs: curl https://api.example.com/users.
CursorAn AI-integrated code editor built as a fork of VS Code. Has AI coding assistance built directly into the editor rather than as a CLI tool. Competes with Claude Code for the same “AI-assisted coding” workflow.→ Other AI CLIs
DependencyA package/library your project requires. Managed by npm (JavaScript), pip (Python), NuGet (.NET).
DIDependency Injection. A design pattern where objects receive their dependencies rather than creating them. Part of SOLID. Overkill in scripts.
DLP (Data Loss Prevention)Software that monitors and blocks sensitive data from leaving a corporate network. Can interfere with CLI tools that make outbound HTTPS requests, even legitimate ones like Claude Code.→ Pricing page
.editorconfigA configuration file that tells code editors how to format code: tab size, line endings, indentation style. Works across different editors so the whole team uses the same formatting rules automatically.
Environment VariableA named value set in the operating system, accessible by programs. Used for credentials: DATABASE_URL=postgres://...
Extended context modelAnthropic's term for the same Opus or Sonnet model with the 1M context window enabled instead of 200K. Not a different AI. Not a new model. Just more Awareness turned on. Plain term: 1M Awareness (available on Max/Team/Enterprise).
Extended ThinkingInternal reasoning Claude does before responding. Generates invisible “thinking” tokens. Costs more but produces better answers for complex problems.→ Chapter 15.10
FrontmatterA metadata block at the top of a markdown file, delimited by --- lines. Contains key-value pairs like name: and description:. Used in Claude Code skills and slash commands so the system can discover and describe them. Written in YAML format. → Making a Plugin
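Splitting a frontmatter block off from the body takes only a few lines. A minimal sketch in Python, assuming simple `key: value` pairs (the skill name and description below are invented for illustration):

```python
def parse_frontmatter(text):
    """Split a markdown file into (metadata dict, body).

    Assumes simple 'key: value' pairs between the two --- delimiters,
    as in a typical SKILL.md header. Real YAML can be richer than this.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text          # no frontmatter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":                 # closing delimiter
            return meta, "\n".join(lines[i + 1:])
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return {}, text              # opening --- with no closing ---

doc = """---
name: format-check
description: Checks file headers for drift.
---
# Instructions
...
"""
meta, body = parse_frontmatter(doc)
print(meta["name"])   # format-check
```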
Forward passOne complete computation through all layers of a neural network. Each output token requires one full forward pass. For a model with 80+ layers, this is expensive — hence output tokens cost 5x input tokens.
gitThe version control system that tracks changes to files over time. It lets you save snapshots (commits), try experiments on separate branches, and go back to any previous state. git itself is a command-line tool. Products that provide git hosting and collaboration: GitHub (most popular, owned by Microsoft), GitLab (popular in enterprises, can be self-hosted), Bitbucket (popular with Atlassian/Jira shops), Azure DevOps (Microsoft's enterprise git platform), and Gitea/Forgejo (self-hosted open source options). GitHub is the public face of git for most developers.→ Glossary
GitHubA website (github.com) where developers store and share git repositories. Think of it as Google Drive for code, with version history and collaboration tools built in. Free for public projects. Owned by Microsoft.→ Other AI CLIs
.gitignoreA text file in a git repository that lists files and folders git should ignore — never track, never commit. Put credentials, build output, and personal settings here. Like a “do not touch” list for git.
globGlobal (pattern). A file-matching pattern. *.js matches all JavaScript files. src/**/*.ts matches all TypeScript files in src/ recursively.
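Most languages ship glob matching in their standard library; a quick sketch in Python:

```python
from fnmatch import fnmatch

# '*.js' matches any name ending in .js
print(fnmatch("app.js", "*.js"))    # True
print(fnmatch("app.ts", "*.js"))    # False

# For recursive '**' patterns, pathlib understands them:
# Path("src").glob("**/*.ts") yields every .ts file under src/.
```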
Governance (features)Controls that let an organization manage, monitor, and restrict how employees use a tool. In Claude Enterprise: audit logs (who sent what), SSO (employees log in with company accounts), RBAC (which employees can use which features), SCIM (auto-add/remove users when they join or leave). None of it makes Claude smarter — it makes IT and legal teams happy.→ Pricing page
GQA (Grouped Query Attention)A modern AI architecture technique where the model uses far fewer KV heads than total attention heads. Instead of one Key-Value pair per attention head, multiple heads share a single KV head. A model with 64 total heads might have only 8 KV heads — reducing KV cache size by 8x. The main reason modern models are far more memory-efficient than older ones. Claude almost certainly uses GQA.
grepSearches for text patterns in files. grep "function" *.js finds all lines containing “function” in JavaScript files. Claude Code has a built-in Grep tool.
Group PolicyA Windows feature in corporate environments that lets IT centrally control what employees can install and run on their computers. If Group Policy is restricting your machine, you may need IT to whitelist Claude Code before you can install it.→ Pricing page
gRPCGoogle Remote Procedure Call. A high-performance RPC protocol using Protocol Buffers. MCP does NOT use gRPC — it uses JSON-RPC.
GUIGraphical User Interface. A visual interface with windows, buttons, and mouse interaction. What most people think of as “an application.”→ Introduction
Haiku (Claude Haiku)Claude's fastest and cheapest model. Fine for simple edits, quick lookups, and boilerplate. Not recommended for complex coding — it will struggle and produce lower-quality results.→ IQ/Awareness chart
HBM (High Bandwidth Memory)The memory type used in AI GPUs (H100, H200). Stacked directly on the GPU for maximum bandwidth (~3.35 TB/s on H100). Essential because the KV cache must be read on every forward pass — bandwidth is the bottleneck, not compute.
HeadlessRunning software without any visual interface — no windows, no mouse, no screen. Servers run headless. Claude Code can run headless in automated pipelines, executing tasks without a human watching.→ Other AI CLIs
/helpA Claude Code command. Lists all available commands, built-in tools, and current session information.→ Chapter XV
HIPAAHealth Insurance Portability and Accountability Act. US federal law regulating how health data must be protected. If your organization handles patient data, you need HIPAA-compliant tools — meaning the vendor must sign a Business Associate Agreement and meet specific security requirements.→ Pricing page
HookA command or program that executes automatically in response to a Claude Code event (before/after tool use, session start, session stop). Can be any executable — a bash script, a PowerShell script, a Python script, or a compiled binary. Unlike skills (which are instructions Claude reads), hooks actually run code.→ Chapter VII
HTTP / HTTPSHyperText Transfer Protocol (Secure). The protocol web browsers use. HTTPS adds encryption (TLS). APIs, web pages, and remote MCP servers use HTTPS.
IDEIntegrated Development Environment. A code editor with built-in tools (debugger, compiler, etc.). Visual Studio, VS Code, JetBrains products.→ Introduction
/initA Claude Code command. Analyzes your project and generates an initial CLAUDE.md automatically. Run it when starting a new project, or to improve an existing CLAUDE.md.→ Chapter III
IQ (AI)The AI model's reasoning capability — how well it thinks, writes code, solves hard problems. Opus has higher IQ than Haiku. Distinct from Awareness (context window size). Both matter; they measure different things.→ IQ/Awareness chart
JetBrainsA software company that makes professional IDEs: IntelliJ IDEA (Java), Rider (.NET/C#), PyCharm (Python), WebStorm (JavaScript), and others. Known for smart code completion and refactoring tools.→ Other AI CLIs
JIRAA project management and issue tracking tool by Atlassian, widely used in software teams. Developers track bugs, stories, and tasks as "tickets" (e.g., PROJ-12345). Has a REST API and CLI tools (acli) for automation. Claude Code plugins can integrate with JIRA to look up tickets, transition statuses, and update fields. → Making a Plugin
JSONJavaScript Object Notation. A data format: {"name": "Steve", "age": 42}. Used everywhere for configuration files and API communication.
JSON-RPCJSON Remote Procedure Call. A protocol for calling functions on another process using JSON messages. How MCP servers communicate with Claude.
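The wire format is plain JSON with a few required fields. A sketch of one request/response pair in Python (the method name here is illustrative, not a guaranteed MCP method):

```python
import json

# A JSON-RPC 2.0 request: version tag, an id for pairing, a method, params.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
wire = json.dumps(request)  # this string is what gets written to the server's stdin

# The response carries the same id so the caller can match it to the request.
response = json.loads('{"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}')
print(response["id"] == request["id"])  # True
```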
.jsonl (JSON Lines)A file format where each line is a separate, complete JSON object. Claude Code stores session history in .jsonl files — one event per line, making them easy to read one record at a time without loading the whole file.
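Reading a .jsonl stream is just parsing one line at a time — no need to load the whole file. A sketch in Python (the event names are invented for illustration, not Claude Code's actual schema):

```python
import json

raw = '{"event": "session_start", "id": 1}\n{"event": "tool_use", "id": 2}\n'

# Each line is a complete JSON object; blank lines are skipped.
events = [json.loads(line) for line in raw.splitlines() if line.strip()]
print(events[1]["event"])  # tool_use
```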
JWTJSON Web Token. A compact token format for authentication. Sometimes used by MCP servers or APIs for auth.
keytarA Node.js library for securely storing credentials in your operating system's native keychain (Windows Credential Manager, macOS Keychain, Linux Secret Service). Used by apps to store API tokens without exposing them in plain text files. → Making a Plugin
KVKey-Value. In AI: the “Key-Value” cache — a GPU memory structure that stores attention computations for every token in the context window. The K and V stand for Key and Value vectors — internal mathematical representations the model needs to “remember” what it has read. For every token, at every model layer, two vectors are stored. This is what the context window physically occupies on Anthropic's servers.→ Chapter 15 (Context Window)
KV CacheThe GPU memory structure where every token's attention data is stored during a conversation. Grows linearly with context size. Rough estimate: ~64 KB to ~0.5 MB per token depending on architecture (modern GQA models at the low end, older full multi-head attention models toward the high end). 200K tokens ≈ 100 GB. 1M tokens ≈ 500 GB. Bigger context windows are not a software feature — they reflect real infrastructure cost.→ Chapter 15 (Context Window)
KV headsThe number of Key-Value head pairs in a model. In full multi-head attention: KV heads = total heads. In GQA models: KV heads is much smaller (e.g. 8 vs 64 total). KV heads directly determine KV cache size per token. The correct KV cache formula is: 2 × layers × KV_heads × head_dim × bytes — not total heads.
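The formula above is easy to turn into arithmetic. A sketch in Python — the layer/head numbers are illustrative only, since Claude's real architecture is not public:

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """KV cache per token: 2 (K and V) * layers * KV heads * head dim * bytes.

    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value

# Hypothetical 80-layer GQA model with 8 KV heads and head_dim 128 at fp16:
per_token = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
print(per_token // 1024)  # 320 -> ~320 KB/token, inside the 64 KB-0.5 MB range
```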
LatencyThe delay between sending a request and receiving the first response. In AI tools, this usually means the time between submitting your message and seeing the first word appear — sometimes called time-to-first-token (TTFT). A longer context window increases latency because the model must process all prior tokens before it can begin generating output.→ Context Window chapter
Lint / LinterA tool that checks code for style violations, potential bugs, and anti-patterns. ESLint (JavaScript), PSScriptAnalyzer (PowerShell).
LLMLarge Language Model. The AI model that powers Claude, GPT, Gemini, etc. Trained on massive text datasets to understand and generate language and code.
lsLists files in a directory. The Unix equivalent of dir on Windows.
LSPLanguage Server Protocol. A protocol for code intelligence (autocomplete, go-to-definition). Powers VS Code's IntelliSense. MCP is modeled after LSP.
Markdown (.md)A lightweight text formatting language. You write plain text with simple symbols (* for bold, # for headings) and it renders as formatted text. CLAUDE.md, README.md, and SKILL.md files are all Markdown. GitHub displays .md files as formatted pages.
Marketplace (Plugin)A distribution system for Claude Code plugins. A marketplace is a GitHub repository that hosts plugin packages. Users register it with /plugin marketplace add org/repo and then install plugins from it. → Making a Plugin
MCPModel Context Protocol. Anthropic's protocol for connecting external tools (plugins) to Claude. JSON-RPC over stdin/stdout.→ Chapter XI
MCP Server (Plugin)An external process that gives Claude new tools (database queries, API calls). The closest thing to a plugin system.→ Chapter XI
MemoryPersistent notes in ~/.claude/projects/*/memory/. Survives across sessions. Like a developer's project wiki.→ Chapter X
MergeCombining changes from one branch into another. Like accepting tracked changes in a Word document.
Meta-skillA regular skill (SKILL.md file) whose special job is listing all other available skills in a plugin and telling Claude when to use each one. Auto-injected at session start via a hook. Despite the "meta-" prefix, it's structurally identical to any other skill — just a table of contents for the plugin. → Making a Plugin
mkdirMake Directory. Creates a new folder.
/modelSwitches between AI models (Opus, Sonnet, Haiku) mid-session without losing conversation history.
MonorepoA single repository containing multiple projects or teams' code. Requires careful configuration (e.g., subdirectory CLAUDE.md files).
MQA (Multi-Query Attention)The extreme version of GQA where all attention heads share exactly one KV head. Maximum memory efficiency. Slight quality trade-off at very long contexts. Both GQA and MQA exist specifically to shrink the KV cache and make long-context inference affordable.
node / Node.jsJavaScript runtime that runs outside the browser. Required by npm. Most AI CLI tools are Node.js applications.
npmNode Package Manager. Installs JavaScript packages from the npm registry. Used to install Claude Code, Codex, Copilot, Gemini CLIs. Like NuGet for the JavaScript world.
npxRuns an npm package without permanently installing it. npx -y @example/mcp-server downloads and runs the package in one step. Like dotnet tool run.
NVLinkNvidia's GPU-to-GPU interconnect (~900 GB/s on H100s). Required when a session's KV cache exceeds one GPU's VRAM. Allows 12–25 GPUs to share a single massive KV cache. This interconnect advantage, combined with CUDA, is why Nvidia dominates AI infrastructure.
Opus (Claude Opus)Claude's most capable AI model. Best for complex reasoning, architecture decisions, and hard bugs. Slowest and most expensive. Use it when the problem actually needs deep thinking.→ IQ/Awareness chart
OSSOpen Source Software. Software whose source code is publicly available. OpenCode is OSS. Claude Code is not.
Personal Access Token (PAT)A security credential string you generate on a site like GitHub to prove your identity to an API. Starts with ghp_ on GitHub. Has nothing to do with AI text tokens — a PAT is an auth secret, not a billing unit or a unit of text.→ Confusing Terms
Plugin (Claude Code)A program installed into Claude Code that provides slash commands, skills, hooks, and/or agents. Lives on your local disk at ~/.claude/plugins/cache/. Dormant between sessions; wakes up via hooks at session start. Not the same as an MCP server — plugins provide workflows and methodologies; MCP servers expose API tools. → Making a Plugin
Plan / Plan ModeA Claude Code mode activated with Shift+Tab. Claude describes its intended approach before making any changes. You review and approve the plan first. Same discipline as an Architecture Decision Record — describe your approach before touching files.→ Chapter IX
PowerShellMicrosoft's command-line shell and scripting language for Windows. More powerful than the old Command Prompt. The recommended terminal for Windows developers. Like bash, but for Windows.
Prefill phaseThe fast phase where all input tokens are processed in parallel as one batch. What you pay input token prices for ($3/M on Sonnet). Contrast with the generation phase (output tokens), which is slow and sequential.
Premium requestsCopilot's term for using more capable AI models. Not a Claude term — but you'll see it when comparing platforms. Plain term: higher-IQ model requests.
Pull Request (PR)A request to merge your branch into main. Other team members can review the changes before they're accepted. The name is from the maintainer's perspective: you're asking them to pull your changes in. From your side it feels like pushing your work up for review, which is why the name seems backwards.
Push / PullPush = upload your local commits to the remote server (GitHub). Pull = download other people's commits from the server to your machine.
python / pipPython programming language and its package manager. Some MCP servers and tools (like SQLite access) use Python. pip install is Python's version of npm install.
QR CodeQuick Response Code. A 2D barcode scannable by phone cameras. Used by Remote Control to quickly connect your phone to a session.
RBAC (Role-Based Access Control)A security model where what you can do is determined by your role (admin, developer, viewer, etc.) rather than by individual permissions. “Developers can use Claude Code; viewers can only use the chat.” Enterprise feature.→ Pricing page
/rcShort for /remote-control. Enables phone/remote access to the current session.
Red-green testingWrite a test that FAILS first (red), then write code that makes it PASS (green). Proves the test actually detects the problem. A core practice in the Practitioner's Guide.→ Practitioner's Guide
RefactorRestructuring code without changing its behavior. Improving organization, readability, or performance.
regexRegular Expression. A pattern-matching syntax for text. ^\d{3}-\d{4}$ matches phone numbers like 555-1234. Used by Claude's Grep tool.
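The phone-number pattern from this entry, run through Python's regex engine:

```python
import re

pattern = re.compile(r"^\d{3}-\d{4}$")      # exactly 3 digits, a dash, 4 digits
print(bool(pattern.match("555-1234")))      # True
print(bool(pattern.match("55-1234")))       # False: only two leading digits
```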
Registry (in code)A data structure that maps keys to behaviors. Avoids if-else chains. Add a new entry to support a new variant.
Ralph LoopAn autonomous AI agent technique where an AI coding assistant is wrapped in a bash loop that feeds its own output back into itself, running unattended until a goal is complete. Pioneered by Geoffrey Huntley. Most effective when combined with a disciplined execution prompt (RUN_PLAN.md) and strong test coverage. → How to Run a Plan
Remote ControlFeature that lets you control a local Claude Code session from your phone or another device via QR code or URL.→ Remote Control tab
Repository (repo / repos)It is a directory. That's it. A repository is just a folder — on your machine, or on a server like GitHub — that has version history tracking turned on. Your local copy is a directory on your hard drive (e.g., C:\repos\MyProject). Repository = versioned directory. Repo = short for repository. Repos = more than one.
RESTRepresentational State Transfer. A common style for web APIs. Uses HTTP methods (GET, POST, PUT, DELETE) on URLs. Most web services expose REST APIs.
--resumeCLI flag to resume a previous session. claude --resume shows a list; claude --resume <id> resumes a specific one.
RunbookA step-by-step procedure document for completing a specific task. “When the server goes down, follow the runbook.” Skills in Claude Code are essentially runbooks — instructions Claude reads and follows when doing that type of task.→ Chapter V
SAMLSecurity Assertion Markup Language. The technical standard that SSO systems use to pass login credentials between the identity provider and applications. You'll see it mentioned alongside SSO — they go together.→ Pricing page
SandboxAn isolated environment that restricts what a program can do. Prevents untrusted code from accessing files or network.
SCIMSystem for Cross-domain Identity Management. Automatically adds users to applications when they join an organization and removes them when they leave — based on your HR system. Eliminates manual account management. Enterprise feature.→ Pricing page
SDKSoftware Development Kit. A library/package that makes it easier to use an API or protocol. Anthropic provides MCP SDKs for TypeScript and Python.
Seat fee (Enterprise)Anthropic's term for the per-person monthly charge for access to Enterprise features (SSO, audit logs, etc.). Does NOT include token consumption — that's billed on top. Plain term: Access fee (tokens extra).
sedStream editor. Transforms text via commands (find/replace, delete lines). Used in scripts for automated text manipulation.
SessionA conversation with Claude Code. Has a unique ID, persists as a .jsonl file, can be resumed later.→ Chapter II
settings.jsonClaude Code's configuration file. Controls permissions, MCP servers, environment variables, model preferences.→ Chapter 15.6
settings.local.jsonRepo-level settings override that is NOT checked into git. For personal credentials and local-only config.→ Chapter 15.6
ShellThe program that interprets your typed commands. On Unix/macOS it's bash or zsh. On Windows it's Command Prompt (cmd.exe) or PowerShell. When someone says "open a shell," they mean open a terminal window. The terms "shell," "terminal," and "command line" are often used interchangeably in casual conversation, though technically the terminal is the window and the shell is the program running inside it.
Shell scriptA text file containing commands that a command-line interpreter executes. On Unix/macOS, these are bash or sh scripts (no file extension or .sh). On Windows, the equivalent is a batch file (.bat or .cmd) or a PowerShell script (.ps1). Claude Code plugins use shell scripts for hooks. Cross-platform plugins often use a polyglot wrapper — a single file that works as both a batch file and a bash script. → Building a Plugin
SkillA SKILL.md instruction file in .claude/skills/. Loaded into context when relevant. Not a running process — a document that informs behavior.→ Chapter V
Slash CommandA custom command defined as a file in .claude/commands/. Typing /foo loads the prompt from that file.→ Chapter VI
SOLIDSingle responsibility, Open/closed, Liskov substitution, Interface segregation, Dependency inversion. Five design principles for maintainable object-oriented code. Useful guidelines, not commandments.→ Practitioner's Guide
Sonnet (Claude Sonnet)Claude's balanced AI model. Good speed, good capability. The recommended default for daily coding work. Better value than Opus for most tasks.→ IQ/Awareness chart
SOP (Standard Operating Procedure)A documented set of instructions for performing a routine task consistently. Similar to a runbook. When you write a Skill for Claude, you're writing an SOP for it to follow.→ Chapter V
sqlite / SQLiteA lightweight database stored as a single file. Codex and OpenCode store session data in SQLite databases. No server needed.
SSHSecure Shell. Lets you connect to a remote machine's command line over an encrypted connection. How you'd access Claude Code running on a remote server. Also the ssh command-line tool.
SSO (Single Sign-On)A system where employees log into one central identity provider (like Microsoft Active Directory or Okta) and that login grants access to all approved applications. No separate username/password for each app. Enterprise feature.→ Pricing page
Static AnalysisExamining code without running it. Linters, type checkers, and pattern scanners are static analysis tools.
stdin / stdoutStandard Input / Standard Output. The default input and output streams of a process. MCP servers communicate by reading stdin and writing stdout.
sudoRuns a command as administrator (root) on Unix/Mac. Like “Run as Administrator” on Windows.
Task ToolBuilt-in todo/task tracker within a Claude Code conversation. Creates, tracks, and completes tasks.→ Chapter 15.2
TLSTransport Layer Security. Encryption for network traffic. The “S” in HTTPS. Remote Control uses TLS to secure messages between your phone and machine.
tmuxTerminal multiplexer. Lets you run a CLI session, detach from it, and reattach later — even from a different machine. Like keeping a program running after you close the window.
TOMLTom's Obvious Minimal Language. Another config file format. Used by Codex for configuration (config.toml).
TokenThe word token has four different meanings in AI discussions, and only three of them are related:

1. (text unit) A chunk of text the model reads and writes. In English, a rough rule of thumb is about 4 characters or ¾ of a word, though the real number varies by language and formatting. 100 words ≈ about 130 tokens as a rough estimate.

2. (billing unit) The unit AI companies charge for. Input and output tokens are often priced separately, because processing them can impose different costs. On Sonnet: $3/MTok input, $15/MTok output.

3. (inference / KV-cache position) During generation, each token in the active context contributes to memory use inside the model’s attention machinery, commonly described in terms of the KV cache. The memory cost per token depends on model architecture — a rough estimate is ~64 KB to ~0.5 MB per token. This is one reason large context windows are expensive to serve. The raw text itself is tiny; what matters is the model’s internal mathematical representation of that text.

4. (security credential) A string used to prove identity to an API — a Personal Access Token (PAT), API key, or JWT. This meaning is completely separate from the other three.→ Pricing page
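The text-unit rule of thumb above (meaning 1) can be turned into a rough estimator. A sketch in Python — this is only the ~4-characters-per-token heuristic, not a real tokenizer, so use the API's own count for anything that matters:

```python
def estimate_tokens(text):
    """Very rough English-only estimate from the ~4 chars/token rule of thumb.

    Real tokenizers vary by language and formatting.
    """
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```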
TypeScriptA superset of JavaScript that adds static type checking. Files end in .ts instead of .js. Catches type errors at compile time rather than at runtime. Used heavily in modern web development and many Claude Code plugins. Like JavaScript with guardrails.
UACUser Account Control. Windows security feature that prompts before elevated actions. Claude Code's permission model works similarly.→ Chapter 15.5
URI / URLUniform Resource Identifier / Locator. A web address. https://github.com/srives/AIManifesto is a URL.
Usage (Anthropic term)Anthropic's term with two unrelated meanings depending on context: (a) how many messages you've sent in the rolling 5-hour window (subscription limits), OR (b) tokens consumed through the API. Plain terms: Messages (subscription) or Tokens (API).
Usage limits (Anthropic term)Anthropic's term for how many messages you can send in a 5-hour rolling window before Claude slows down or stops. Separate from token pricing. Plain term: Message budget.
UTF-8Unicode Transformation Format (8-bit). The standard text encoding. Supports all languages and emoji. What your files should be saved as.
UUIDUniversally Unique Identifier. A random ID like dd8ab84a-a52e-467b-bcb6-0ad3a44a5db6. Used for session IDs. Practically guaranteed to be unique.
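Generating one takes a single standard-library call in most languages; in Python:

```python
import uuid

session_id = uuid.uuid4()        # random v4 UUID, new every run
print(len(str(session_id)))      # 36: 32 hex digits plus 4 hyphens
```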
Visual StudioMicrosoft's full-featured IDE for Windows. Used primarily for C#, C++, and .NET development. Not the same as VS Code — Visual Studio is the big one, VS Code is the lightweight one.→ Other AI CLIs
VS CodeVisual Studio Code. A free, lightweight code editor by Microsoft. Works on Windows, Mac, Linux. Popular for almost every language. Not the same as Visual Studio — it's smaller and more general-purpose.→ Other AI CLIs
WinGetWindows Package Manager, built into Windows 10/11. Install software with winget install PackageName. Like apt (Linux) or Homebrew (Mac), but for Windows. Use winget install Anthropic.ClaudeCode to install Claude Code.→ Introduction
Working treeGit's name for the folder where your actual files live — the checked-out copy you edit. When you clone a repo, the folder you get is the working tree. Distinct from the hidden .git folder, which stores git's internal history and objects. You work in the working tree; git manages the .git folder.
WorktreeA git feature that lets you check out the same repository into multiple folders simultaneously, each on its own branch. Normally one repo = one folder = one branch at a time. A worktree creates a second (or third) folder from the same repo — same git history, different branch, different files on disk, all live at once. Claude Code uses this to give subagents their own isolated folder to work in, so the agent's changes never touch your files until you choose to merge them.→ Chapter 15.1
WSLWindows Subsystem for Linux. Runs a real Linux environment inside Windows. Many CLI tools work better under WSL. Available via the wsl command.
YAMLYAML Ain't Markup Language. A human-readable data format (like JSON but with indentation instead of braces). Used by Copilot for session config.
=**=---========--------===--=====--==-------== ==*#*==-=====++==-==============-------======= --==-----=+++**++++++##*=*##*%%+=-------===--= ---=-----+#*+=###%%#++++*#%@@@#==-=--==--=---= --==-----=+=====%@@@@@@@%%@@@%*=-----==------= ---====--=-=======*%@@@@@@@@@@#====--========= ----------====+======+%@@@@@@@%+============== ---=---==--==++-=-==+%@@@@@@@@*====+*+=-====== ------=---==++++-====*@@@@@@@@@%+======++===== -----==----=+++======*#@@%@@@@%%+======+====== -=--=------==+++====*%%@@%%%@@%%+============= ----=------==++====+%@@@@@@%%%%%+============= ---=-------==++==*%#%%@@@@@@@@@@%============= -----------==+++#@@@@%%%@@@@@@@@@#=======+++*+ ----------=-=+*%@@@@@@%*+#@@@@%@@@#=======+++= -----=------==%@@@@@@@%**+#@%*+=+%@#=====+##+= *#*+=--------*@@@%@@@@@@%%@@%+====+##*===+##*+ --==#*=----==%@@@@@@@%%@@@@@@*====**%*+===+=== ==--=*#=----=*@@@@@@%%%%@@@@@%+=====+=====+=== =----=*%*=---*%@@@@@@%%%@@@@@%+============+== --==--=+%#=--=#@@@@@@%%#%@@@@*======-========= ---=====+#@%*=+*@@@@@@%%%@@@#+================ ==========+#@@@%@@@@@@%%@@%*================== ---====++===++*#@@@@%%%@@@%*+++=============++
S. Rives
Created March 2026
Confessions →

Code Commenting Practices to Reduce Drift

Comments lie. Not intentionally — they were true when written. But refactors move functions, responsibilities shift, and the comments stay behind describing a structure that no longer exists. The result is code drift: the gap between what comments say the code does and what the code actually does.

AI makes this worse before it makes it better. AI reads comments as instructions. If a file header says "Contains: session launch, fork handling, profile creation" but the refactor moved two of those elsewhere, AI will confidently generate code in the wrong place — because the comment said so.

Why Drift Happens

The drift in a real production project happened for predictable reasons.

The main bug was not "people forgot to update comments." It was "the comment style encouraged drift." When a file header promises to list every function, it will always be wrong. When a file header states what the file owns and what its invariants are, it stays true across refactors.

The Fix: Contract-Oriented File Headers

Replace function inventories with stable contract-style headers. The right format for any file is a short block of stable fields: File, Namespace, Purpose, and optional Notes.

This format is searchable, durable, and meaningful to both humans and AI. It survives refactors because it describes ownership, not contents. Contents change. Ownership changes far less often.

Example: Before and After

Before (inventory style — drifts immediately):

# src/session/launch.ps1
# Contains: New-Session, Fork-Session, Continue-Session, Resolve-Background,
#           Get-GitBranch, Prompt-GitIdentity, Create-WTProfile, Write-PendingMapping

After (contract style — stays true):

# File:      src/session/launch.ps1
# Namespace: session
# Purpose:   Session lifecycle entry points — new, fork, and continue paths.
#            All paths call shared helpers in session/lifecycle.ps1 for artifact
#            creation. No platform logic here — read from PlatformRegistry.
# Notes:     INVARIANT: Every path that creates a WT profile must clean it up
#            in the catch block. Track with $profileCreated flag.

The second version will still be accurate after you move three functions out of the file. The first won't.

Function-Level Comments

Detailed behavior belongs on functions, not file headers. Use your language's native documentation block:

# PowerShell example
function New-Session {
    <#
    .SYNOPSIS
        Starts a new Claude Code session in Windows Terminal.
    .PARAMETER Platform
        Platform key from PlatformRegistry (e.g. 'claude', 'codex').
    .NOTES
        Creates a WT profile and pending mapping. Both are cleaned up
        on failure in the catch block — see $profileCreated flag pattern.
    #>

This way: the file header describes the boundary. The function comment describes the contract. Neither tries to do the other's job.

Rules to Add to CLAUDE.md

Add these to your CLAUDE.md to prevent drift from the start. AI will enforce them with every subsequent code generation:

## Comment Standards

File headers must contain: File, Namespace, Purpose, and optional Notes.
File headers describe ownership and invariants — not a list of every function.
Function inventories in file headers are forbidden unless the file is under 50 lines and stable.

When a refactor changes a file's responsibility, update the file header in the same commit.
When you split a file, both new files get fresh headers that reflect their new, narrower ownership.

Use function .SYNOPSIS blocks (or language-equivalent) for detailed behavior documentation.
File headers describe boundaries. Function blocks describe contracts.

Comments must describe current truth, not historical intent.
If a comment describes something that was true in v3 but isn't in v7, delete it.
History belongs in git log and CHANGELOG.md — not in source code.
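Rules like these can also be enforced mechanically, not just by convention. A minimal sketch in Python — the checker itself is hypothetical; the required field names are the ones from the rules above:

```python
REQUIRED_FIELDS = ("File:", "Namespace:", "Purpose:")  # Notes: is optional

def check_header(source, max_scan_lines=10):
    """Return the required header fields missing from the first few
    lines of a source file's text. Empty list means the header passes."""
    head = "\n".join(source.splitlines()[:max_scan_lines])
    return [f for f in REQUIRED_FIELDS if f not in head]

good = """# File:      src/session/launch.ps1
# Namespace: session
# Purpose:   Session lifecycle entry points.
"""
bad = "# Contains: New-Session, Fork-Session\n"

print(check_header(good))  # []
print(check_header(bad))   # ['File:', 'Namespace:', 'Purpose:']
```

Wire a script like this into a pre-commit hook or CI step and inventory-style headers get caught before they merge.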

This is what code drift actually costs. In one production project, stale headers in the core, session, ui, and wt files contributed to refactor confusion: the AI was generating code in the wrong namespace because the comments said that namespace still owned functions it had not owned since an earlier version. The comment cleanup pass was not glamorous work. But it directly reduced the cost of every subsequent change. → See the full case study

The Short Version

Case Studies

Real projects, real mistakes, real lessons. Each case study documents what actually happened when AI coding tools were used in production — what worked, what collapsed, and what was learned.

One Saturday morning, too many AI consoles open, and a tool that grew into a codebase whose architecture had to be retrofitted after the fact. The cost of skipping engineering on day 1 — documented in commits and dollars.
Building a tablet-to-AI screen design pipeline from scratch — and documenting every architecture decision before writing a line of code.
The engineering terms that will kill you if you are unaware of them. Why boundary violations spread while algorithm bugs stay local, and what they look like when AI writes across them.
The dual-AI planning process — competitive planning, critique, and convergence — plus the execution-control prompt that prevents the seven failure modes that kill AI implementations.

Engineering with AI: From Concept to Product

Building from the ground up with AI in mind — and documenting every step of the way

The Idea

I want to use the occasion of a new product as a case study in AI engineering — as opposed to AI coding — documenting my steps along the way. The product: a tool that lets you draw your UI on a drawing tablet — screens, block diagrams, workflow sketches — and feed those drawings directly into an AI so it can generate screen layout designs. The specific use case was adding screens to an existing product, but the same flow would serve for designing a new product's screens from scratch, with hand-drawn sketches as the AI's input.

The insight that makes this worth a case study is not the product itself. It is the discipline of how it was started. Everything in this story was meant to happen before a single line of code was written (excepting POC work to prove the technology was feasible).

This is how Case Study 1 should have started. The first case study documents what it costs to skip architecture on day 1 — scores of change sets and a sustained retrofit effort. This case study documents what it looks like to not skip it. Read Case Study 1 →

Pitfall Avoided: Turning Concept Into Product Too Fast

AI could already have a working product built by the time you finish writing this document. That is the trap. Resisting it doubles the time to the first exciting demo, and accepting that takes willpower. But it pays off in every subsequent week.

Vibe coding is a strong pull. It is a dopamine hit. It makes you want to have something running tonight. The pattern in Case Study 1 is exactly this — a Saturday morning of pure velocity that built up a debt that took many change-sets to pay down.

If you cannot contain yourself and absolutely have to see the product come to life tonight, at minimum: narrate what the product is to the AI first, and capture the idea in a document. Do that before you write a single line of code. The idea must exist outside your head before it exists in the codebase.

AI Virtue #1: Patience

Patience is not waiting. It is choosing a slower path now to walk a faster path later. The steps below are all "slow" in the sense that none of them produce a running product. They all produce something more durable: a shared understanding between you and the AI of exactly what you are building and why.

The Process, Step by Step

1. GitHub Repo — Early

Don't talk about it. Do it. Before architecture docs, before POC, before any serious thinking — create the repo. Every decision made after this point is recorded somewhere. This is not optional.

2. Narrate the Idea in a Document

Write down what the product is. Not code — prose. Describe it the way you would describe it to a colleague over lunch. In this case: "I want to draw screens on my drawing tablet, photograph or export them, and feed those drawings into Claude so it can produce screen layout designs I can start coding from." That is the idea. Write it down before anything else.

This step is the design process. The act of writing forces you to make decisions you would otherwise defer. Vague ideas become concrete questions. Concrete questions reveal scope. Scope reveals what you actually need to build.

3. POC — Small Vibe Coding to Prove the Idea Works

At this stage, and only at this stage, vibe coding is appropriate. Write the smallest possible amount of code to answer one question: does this actually work? In this case: can Claude receive a tablet drawing and produce a useful screen layout? If yes, proceed. If no, pivot now before you have written a design document for a product that does not work.

The POC is not production code. It is a disposable experiment. Treat it that way.

4. Draw Your Screens and Block Diagrams

Before writing a single technical requirement, draw every screen. This discipline comes from a long-standing principle: designing from the screens drives every other decision. As the screen design iterates, you will discover interfaces. You will find data that needs to move between screens. You will find menus and flows you did not know you needed.

Draw on paper if you have to. Then photograph or export and feed them into the AI. You are not just documenting — you are doing design work in the medium that AI can consume.

5. Write the Full Technical Design Document

This step is critical. Document every layer of the system.

Writing this document is not overhead. It is architecture. Everything that would otherwise be decided ad-hoc during coding is decided here, where decisions are cheap and reversible.

6. Multiple AIs Consume the Design and Compete on Architecture

Feed the design document to more than one AI and ask each for an architecture plan. Then have them go back and forth until you reach a consensus. This is competitive planning: two AIs producing proposals, critiquing each other's work, and converging on something better than either would have produced alone.

A practical note: ask each AI which of the two is better suited to implement a given component, and which is better suited to review it. They will give you honest answers, and those answers are useful.

There is a structured process for this. Competitive planning — where two AIs propose, critique, and converge on a final architecture — is powerful, but only if you also know how to hand that final plan to an AI for disciplined execution. The execution step is where plans most often go wrong.

The tool for this is RUN_PLAN.md — an execution-control prompt you place in your repository alongside the plan. It forces the AI through a mandatory preflight before touching a single file: restate the task, read the actual source (not summaries), find every caller of every function it will change, map the blast radius, identify the test surface, and build a risk map. Only then does it implement — one bounded change at a time — followed by targeted tests, a build, and a hostile self-review. It is the difference between an AI that codes from the plan document and one that codes from the codebase.

The full process — competitive planning, dual-AI critique, merging to PLAN_FINAL.md, executing with RUN_PLAN.md, and Ralph Loops for unattended runs — is documented in one place. How to Run a Plan →
As of March 2026: Codex is winning the implementation round more often than not. When the competitive planning process produces a final architecture plan and it is time to execute, Codex has been the better implementer in the majority of real-world tests. Claude tends to plan better; Codex tends to execute more precisely. This is not a permanent verdict — model capabilities shift quickly — but it is today's experience. When you reach Step 12, seriously consider handing the final plan to Codex rather than Claude.

6b. Extract and Approve Constraints Before Code

Before the AI touches a single line of code, there is a critical gate: constraint extraction and approval.

Have the AI write a .constraint-extraction.md file that extracts every architectural constraint, decision, boundary, and assumption from the final plan. This file is not code. It is the explicit statement of "what must be true for this code to be correct."

You review this file. You look for gaps, missing cases, forgotten boundaries. You ask questions. You push back if constraints are unclear. Only when this file is complete and correct do you approve it explicitly.

Only after approval does code begin. If code appears without approved constraint extraction, the commit is rejected and pointed back to the constraint file. This is not optional.

Why does this matter? Architectural work fails when constraints are implicit. They get embedded in code, discovered in review, and require rework. Making them explicit first prevents the entire class of architectural rework. Constraints extracted and approved before code almost never surface as bugs later.
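As a sketch, a constraint-extraction file can be a plain checklist of explicit statements, each traceable to the plan. The section names and entries below are hypothetical illustrations, not a fixed format:

```markdown
# .constraint-extraction.md (draft, pending human approval)

## Boundaries
- UI code may call the session service; the session service never calls UI.
- Temporary launch data must never be persisted as canonical session state.

## Contracts
- Launching twice with the same parameters returns the same session key.

## Assumptions
- All platform lookups go through the platform registry.

## Open questions
- Who owns cleanup of orphaned profiles on crash?
```

The approval marker is added by the human reviewer, never by the AI.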

7. Bootstrap from Your Prior Work

Have the AI consume your CLAUDE.md and AGENTS.md from previous projects, and update them to become this project's founding documents. This is a critical step. It takes your engineering discoveries from prior work — every rule that came from something that actually broke — and puts this project on a good footing before a line of code is written.

You are not starting from scratch. You are transferring institutional knowledge.

Don't have a prior CLAUDE.md to start from? Case Study 1 includes a 13-rule architectural template derived from a real production project, with a wizard that generates a customized file in seconds. Fill in your project name, key entity, and runtime — it produces a ready-to-use CLAUDE.md or AGENTS.md. The Day-1 CLAUDE.md / AGENTS.md Builder →

8. AI Designs the Directory Structure

Before any code moves anywhere, have the AI propose a directory structure for both docs and code. Then have a second AI critique it. Directory structure is not a detail — it determines how the project grows. A poor structure that gets committed early becomes a migration cost later.

9. Design for Scale: Local vs. Cloud

Have the AI model both paths: a local-server architecture and a cloud-hosted architecture. This is not about picking one today. It is about making sure the design is not accidentally incompatible with the path you will want later. Decisions made here are cheap. The same decisions made at v5 cost retrofits.

10. Move POC Code Into the Proper Structure

Now, and only now, the POC code moves from its throwaway location into the real directory structure. The AI does the move. The move is mechanical — the structure was designed in step 8. Nothing is rewritten yet. The goal is to have working code in the right place before any new features are added.

11. Testing Plan, Tests, and a Test Launcher — Before Features

Have both AIs come to a testing plan together. Then have them write the tests. Then have them write a test launcher so that tests run automatically with every change.

Insist on red-green testing from this point forward. Every step of the way. Every new feature must come with new tests. This is not optional. Case Study 1 documents what it costs when test infrastructure is added after the fact. Do not repeat that.

12. Implement One Small Feature, Test, Check In

Now you are ready to build. Take the smallest complete feature from the master plan. Implement it. Run the tests. Check in. Repeat.

The velocity from here feels slower than pure vibe coding. But it compounds. Features land without breaking previous features. The AI has a shared understanding of the architecture. Additions fit because the structure was designed to accept them.

Execute each feature using RUN_PLAN.md as your constraint. Write a feature-specific plan (e.g., add-user-authentication.md), then hand it to the AI with this command:

Execute add-user-authentication.md using RUN_PLAN.md as your
rule and constraint. Complete Phase 0 preflight first, then
implement one bounded change at a time. Run tests after each change.

After each feature lands, do a hostile code review to catch local bugs, and a tough review using PR_TOUGH.md to catch architectural drift:

Do a hostile review of add-user-authentication work, then a
tough architectural review using PR_TOUGH.md. Output both to
PR_AUTH_FINDINGS.md. Block merge if CRITICAL.

What This Avoids

Case Study 1 is the counter-example. Every step above corresponds to something that was skipped there and had to be paid for in v9:

| Skipped in Case Study 1 | Paid for in v9 | Done upfront here |
| --- | --- | --- |
| No registry-driven architecture | Scattered variant logic across 7 files | Steps 6–8: AI designs architecture before code |
| No canonical helpers | Duplicated logic in every entry path | Step 8: structure designed for shared utilities |
| No atomic writes | Silent data corruption on crash | Step 5: design doc captures all I/O contracts |
| No test infrastructure | False-passing tests, latent bugs | Step 11: testing plan before first feature |
| No directory structure plan | POC code scattered permanently | Steps 8–10: structure designed and enforced first |

The One-Sentence Version

Do all the design work first, let the AI do it with you, and then build — because the AI will build exactly what you designed, and if you did not design it, the AI will invent it.

Code Review Methodology: The Hostile Review

Once your product is being built, features are landing, and tests are passing — how do you know the code is actually correct? Passing tests and successful builds are evidence, not proof. This is where a rigorous, AI-driven code review process becomes essential.

The methodology below is called a hostile code review. The word "hostile" is deliberate. You are not asking the AI to summarize what the code does. You are telling it to assume there are bugs and hunt for them. This is the difference between a review that confirms your expectations and a review that challenges them.

A review that only finds style issues is not a review. If your AI code review comes back with "consider renaming this variable" and nothing else, you have not done a review — you have done a spell check. A real review traces data flow, catches null paths, and finds the bugs that tests missed.

The Structure of a Hostile Review Prompt

A hostile review prompt has five parts, each serving a distinct purpose. Skip any one of them and the AI will fill the gap with generic advice instead of real findings.

1. Mindset. Tell the AI to assume there are bugs. Tell it not to waste time on style, naming, or refactoring unless they hide a real defect. Tell it to be skeptical of passing tests. This sets the tone for the entire review.

2. Scope. Define exactly which slice of the codebase to review. Name the feature, the files, and the boundaries. If you do not scope the review, the AI will wander into unrelated code and dilute its findings with irrelevant observations.

3. Known Incompletes. Tell the AI what is deliberately unfinished. Without this, half the findings will be "this feature is not complete" — which you already know. The rule is: do NOT report these as bugs unless the current code incorrectly pretends they are complete.

4. Trace Paths. Tell the AI exactly what to trace. A code review that reads files in isolation misses wiring bugs. The AI must follow the full path: UI event → handler → HTTP request → controller → service → data layer → response. Every hop. Every assumption.

5. Output Format. Require structured output: findings ordered by severity, each with a file/line reference, a description of the actual failure mode, and an explicit confidence level. Require a separate section for residual risks and test gaps. If there are no findings, require the AI to say exactly "No findings" — no padding, no summaries of what the code does.

Sample Prompt: Hostile Code Review

Below is a real prompt used to review a feature slice in a production codebase. It follows all five parts of the structure above. You can adapt this template to any feature in any codebase.

You are doing a hostile code review of the current [feature] slice.

Mindset:
- Assume there are bugs.
- Hunt for correctness issues, regressions, bad assumptions,
  broken wiring, hidden runtime failures, invalid semantics,
  missing guards, and missing tests.
- Be skeptical of passing tests and successful builds.
  They are evidence, not proof.
- Do not waste time on style, naming, or refactoring
  unless they hide a real defect.
- Do not review unrelated repo churn. Stay on the [feature]
  slice only.

Scope:
Review only the current [feature] work:
- [component 1]
- [component 2]
- [component 3]
- tests added for this slice

Known incomplete by design (do NOT report as bugs unless
the current code incorrectly pretends they are complete):
- [known incomplete item 1]
- [known incomplete item 2]
- [known incomplete item 3]

Files to inspect at minimum:
- [path/to/file1]
- [path/to/file2]
- [path/to/file3]

Context files to compare against:
- [path/to/spec_or_design_doc]
- [path/to/sample_data]

You must manually trace the full path:
1. UI event -> handler -> HTTP request shape
2. Controller action -> service call -> response
3. Service method -> data builder -> output shape
4. Data source -> field mapping -> fallback behavior
5. Tests -> what is covered vs. what is not

You must actively look for:
- endpoint/route mismatches
- wrong HTTP verb or response type assumptions
- null-path runtime exceptions
- invalid output shape relative to spec/samples
- unsafe assumptions about data always existing
- UI errors when server returns error responses
- cases where malformed input slips through
- missing tests for real failure paths

Run these commands:
- [test command for this slice]
- [build command]

If useful, inspect relevant diffs and current file contents,
but do not modify code.

Output format:
1. Findings
- Findings first, ordered by severity.
- Each finding must include:
  - severity: Critical / High / Medium / Low
  - a one-line title
  - exact file and line reference
  - why it is a real bug/risk now
  - the concrete failure mode or reproduction path
- If you are unsure, say so explicitly and explain
  what evidence would confirm it.

2. Residual Risks
- Only risks that are real and current.
- Separate these from confirmed findings.

3. Test Gaps
- Only high-value missing tests for this slice.

Rules:
- If there are no findings, say exactly "No findings."
- Do not pad the answer with summaries of what the code does.
- Do not recommend future roadmap work unless it exposes
  a current defect.
- Do not confuse "unfinished by design" with "broken now."
- Be strict.

Why This Works

This prompt structure works because it eliminates the three failure modes of AI code review: summarizing the code instead of hunting for bugs, wandering out of scope into unrelated churn, and reporting deliberately unfinished work as defects.

Run a hostile review after every feature lands, before you merge. The five minutes it takes to write the prompt will save you the hours it takes to debug the bug it would have caught.

Architectural Code Review with PR_TOUGH.md

The hostile review above catches bugs in individual features. But code can be locally correct and architecturally wrong: canonical truth split across files, boundaries violated, caches treated as authoritative, old patterns preserved under new names.

PR_TOUGH.md is a framework for catching that class of drift. It defines nine categories of architectural violation specific to your documented architecture, and structures reviews to hunt for them systematically. Copy PR_TOUGH.md from the root of this repository into your project, customize the nine categories to match your architecture, and use it to review changes that touch core boundaries.

When to Use PR_TOUGH.md

Use PR_TOUGH.md instead of (or in addition to) hostile review when the change touches core boundaries, when a commit claims a cleanup or migration actually happened, or when the risk is architectural drift rather than a local bug.

Sample Prompts

For a specific feature review:

Do an architectural PR review using PR_TOUGH.md as your constraint.
Review the current state of the [feature-name] work.
Put the output in PR_TOUGH_FINDINGS.md, overwriting if it exists.
Start with findings ordered by severity.

For review immediately after code execution:

You just executed my-feature.md. Now do a tough review using PR_TOUGH.md.
Review [file/directory] for architectural drift against AGENTS.md.
Output to PR_FINDINGS.md.
If CRITICAL, surface it immediately.

For cleanup verification:

Using PR_TOUGH.md, review whether the cleanup claimed in the
commit message actually happened in the code. Check that old
patterns are really gone, not just renamed.
Output to PR_CLEANUP_VERIFICATION.md.

Falling Between the Cracks: The Engineering Terms That Will Kill You

Why boundary violations spread while algorithm bugs stay local, and the five terms that separate muddy architectures from clean ones

The Real Cost of Long-Lived Codebases

Most software pain is not "the loop was wrong" or "the API call failed." Long-lived codebase suffering is almost always boundary pain: code living in the wrong place, state written by the wrong layer, contracts that quietly changed, and old paths that never died.

Algorithm bugs are local. You fix a loop and one feature works. Boundary bugs spread. You fix canonical truth in one place and six other places still have the wrong copy. That is why the pain feels sticky.

Five Terms That Matter More Than You Think

These are not jargon for its own sake. They are labels for the failure modes that keep costing you time. Understand them and you can prevent entire categories of problems. Ignore them and AI will write code that violates each one, and you will spend months cleaning it up.

1. Namespace — Where Code Lives

Definition: The directory or module where a piece of code belongs. Namespace is about ownership and responsibility, not just file location.

Example: src/session/ owns session lifecycle. src/ui/ owns rendering. src/wt/ owns Windows Terminal integration.

The violation: UI code writing canonical session records. WT code mutating session state directly. Session code doing rendering.

Why it matters: When responsibilities leak across namespaces, you cannot change one part without breaking another. A small refactor becomes an excavation.

2. Boundary — Who May Do What

Definition: The rule about what is allowed to cross between namespaces. A boundary says "this side may call that side, but not vice versa" or "data may flow this direction, but not that one."

Example: UI may call session service. Session service must not call UI. Temporary launch data may not become canonical session state.

The violation: Session code calling back into UI. Temporary state persisting as if it were durable. A boundary that exists in docs but not in code.

Why it matters: Boundaries prevent circular dependencies and make it safe to refactor one side without touching the other. A muddy boundary means you cannot refactor either side without risking silent breakage.
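The call-direction rule can be made structural rather than aspirational. A minimal Python sketch, with module and class names invented for illustration: the session layer knows nothing about the UI, and the UI subscribes to session events, so information crosses the boundary upward without the lower layer ever calling the upper one.

```python
from typing import Callable

class SessionService:
    """Lower layer. May be called by UI; must never import or call UI code."""
    def __init__(self) -> None:
        self._listeners: list[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        # The narrow opening that lets data flow upward without the
        # session layer depending on any UI type.
        self._listeners.append(listener)

    def start_session(self, platform: str) -> str:
        key = f"{platform}-001"          # canonical record creation lives here
        for notify in self._listeners:   # notify subscribers; we don't know who they are
            notify(key)
        return key

class Ui:
    """Upper layer. Allowed to call downward into SessionService."""
    def __init__(self, service: SessionService) -> None:
        self.shown: list[str] = []
        service.subscribe(self.render)   # UI opts in; session stays UI-agnostic

    def render(self, session_key: str) -> None:
        self.shown.append(f"session {session_key} started")

service = SessionService()
ui = Ui(service)
service.start_session("claude")
print(ui.shown[0])  # session claude-001 started
```

If the session layer ever needs to `import ui`, the boundary is already broken; the event seam is what keeps the arrow pointing one way.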

3. Contract — What Is Promised at the Boundary

Definition: The explicit agreement about what will happen when you call a function or cross a boundary. Shape of data, behavior on success and failure, invariants that remain true, side effects that will not happen.

Example: Resolve-Launch returns success only after uniqueness is proven. Launching twice with the same parameters returns the same session key. Canonical records cannot contain temporary fields.

The violation: A function that sometimes writes state, sometimes doesn't, depending on context. Data shapes that change between callers. Success that does not guarantee the promised invariant actually holds.

Why it matters: A broken contract means the caller cannot trust the result. If the contract is ambiguous, tests pass but code breaks in production. If contracts are enforced everywhere, you can reason about the system.
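The "same parameters, same session key" promise from the example can be pinned down in a few lines. A hedged Python sketch (function and key format are invented for illustration): the contract is enforced at one boundary function, so every caller gets the same guarantee.

```python
_sessions: dict[tuple[str, str], str] = {}  # (platform, workspace) -> session key

def resolve_launch(platform: str, workspace: str) -> str:
    """Contract: launching twice with the same parameters returns the SAME key,
    and success implies the key is unique for those parameters."""
    params = (platform, workspace)
    if params in _sessions:               # idempotency: no duplicate sessions
        return _sessions[params]
    key = f"{platform}:{workspace}:{len(_sessions) + 1}"
    _sessions[params] = key               # the single place canonical state is written
    return key

a = resolve_launch("claude", "repo1")
b = resolve_launch("claude", "repo1")    # same params, same key, guaranteed
c = resolve_launch("codex", "repo1")
assert a == b and a != c
```

A caller can now reason from the contract alone; it never needs to know whether a session already existed.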

4. Entry Point — Where a Flow Starts

Definition: The specific function or file where a behavior begins. Not just "we handle launches" but "launches begin at Start-ManagedSessionLaunch, and that is the only place where a launch can be initiated."

Example: User launching a session always goes through Start-ManagedSessionLaunch. Platform discovery always goes through Resolve-PlatformSession. Canonical records always created by New-CanonicalRecord.

The violation: Multiple functions that both launch sessions. Three different ways to discover platforms. Canonical records created inline in five places.

Why it matters: If there are multiple entry points, each one becomes a place where assumptions can diverge. One path creates records with field A, another with field B. One path validates, another does not. The more entry points, the more places the contract can break.
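One way to make "the only place a launch can begin" literal rather than conventional: give the canonical record exactly one constructor path that performs the validation every record needs. The names below are hypothetical, not from the case study.

```python
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class SessionRecord:
    platform: str
    key: str
    created_at: float

_KNOWN_PLATFORMS = {"claude", "codex"}

def new_canonical_record(platform: str) -> SessionRecord:
    """The single entry point. Every record passes through the same validation,
    so assumptions cannot diverge across call sites."""
    if platform not in _KNOWN_PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    now = time.time()
    return SessionRecord(platform=platform, key=f"{platform}-{int(now)}", created_at=now)
```

The frozen dataclass means no caller can mutate a record after creation; the only way to get a valid one is through the entry point.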

5. Seam — The Narrow Handoff Where Behavior Can Be Swapped

Definition: The intentionally narrow place where one responsibility hands off to another, or where behavior can be swapped out, tested in isolation, or refactored without touching anything else.

Example: "Platform-targeted launch resolution" is a seam. The overall launch flow is the same for all platforms. Each platform implements its own targeted query behind the same flow. You can swap the platform query logic without touching the flow.

The violation: The handoff between session behavior and WT transport is not cleanly isolated. Some behavior lives in the wrong place. The old code and new code coexist, and you cannot tell which path runs when.

Why it matters: A good seam lets you replace one part without corrupting others. A muddy seam means every change carries the risk of invisible breakage. Clean seams make testing easier (you can test each side independently) and refactoring safer (you know the boundary will hold).
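The platform-targeted seam from the example maps naturally onto an interface. A Python sketch (the Protocol name and methods are this sketch's invention): the launch flow is identical for every platform, and only the narrow query step varies, so each implementation can be swapped or tested in isolation.

```python
from typing import Optional, Protocol

class PlatformQuery(Protocol):
    """The seam: the one narrow handoff each platform implements."""
    def find_session(self, key: str) -> Optional[str]: ...

class ClaudeQuery:
    def find_session(self, key: str) -> Optional[str]:
        return f"claude:{key}"     # stand-in for a real targeted lookup

class CodexQuery:
    def find_session(self, key: str) -> Optional[str]:
        return f"codex:{key}"

def launch_flow(query: PlatformQuery, key: str) -> str:
    """The shared flow. It never changes when a platform is added or swapped."""
    found = query.find_session(key)
    if found is None:
        raise LookupError(key)
    return f"attached to {found}"

print(launch_flow(ClaudeQuery(), "s1"))  # attached to claude:s1
print(launch_flow(CodexQuery(), "s1"))   # attached to codex:s1
```

Testing the flow needs only a stub query object; testing a query needs no flow at all. That independence is what a clean seam buys.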

The Damage Pattern: How These Terms Break Down

In a real codebase with real pain, the failure pattern is almost always this sequence:

Step 1: One concern starts writing another concern's state

UI code creates session records because it is faster than calling the session service. Boundary violation.

Step 2: Temporary state gets treated like canonical state

A launch record that was meant to be transient gets persisted. Now it exists in two places: the canonical location and the temp location. Contract violation.

Step 3: Helper placement blurs ownership

A helper function that should be in namespace A gets placed in namespace B because that is where it is called. Now nobody knows who owns it. Namespace violation.

Step 4: Old and new paths coexist

You try to fix the mess by adding a new path that does it right. But the old path is still there. Now the same behavior has two entry points. Neither is definitive. Both encode different assumptions. Entry point violation.

Step 5: Cleanup logic cannot tell what is safe to delete

You want to retire the old path. But you cannot tell if other code still depends on it. The seam is too muddy. So you leave it. The mess compounds.

This is the cycle that keeps killing long-lived codebases. It is not that the code is badly written. It is that these five boundaries eroded, and now you cannot tell what is safe to change.

How AI Makes This Worse

AI violates these five categories with remarkable consistency, especially when it is unaware of them.

What To Do About It

Before AI writes code: Tell it the terms. Define your namespaces, boundaries, and contracts explicitly in CLAUDE.md. Name your entry points. Identify your seams. The clearer you are, the better the AI respects them.
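As a sketch, the definitions can be short. A hypothetical CLAUDE.md fragment, built from the examples earlier in this section (the namespaces and rules are illustrations, not a template to copy blindly):

```markdown
## Architecture Vocabulary

Namespaces: src/session/ owns session lifecycle. src/ui/ owns rendering.
src/wt/ owns Windows Terminal integration.

Boundaries: UI may call the session service; the session service never calls UI.
Temporary launch data never becomes canonical session state.

Contracts: Resolve-Launch succeeds only after uniqueness is proven.
The same launch parameters always return the same session key.

Entry points: launches begin ONLY at Start-ManagedSessionLaunch.
Canonical records are created ONLY by New-CanonicalRecord.

Seams: platform-targeted resolution is swappable behind the shared launch flow.
```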

During code review: Use these terms in your review. Do not just say "this looks wrong." Say "this violates the boundary between session and UI" or "this creates a second entry point that will diverge." Be specific about which term is broken.

When the AI proposes helper placement: Ask whether it belongs in the namespace where it is defined, or whether it is masquerading as something it is not.

When cleanup is claimed: Verify that old paths are actually gone, not just renamed. Use the five terms as your checklist.

The antidote to boundary chaos is relentless clarity on these five terms. Write them down. Use them in conversation with the AI. Reference them in code review. When you see a violation, name it using these terms, not vague criticism. The moment these boundaries become part of your team's vocabulary, the pain of boundary bugs becomes visible — and preventable.

Future Possibility: Better Than Faster

Today's AI behaves like a very fast implementer, a mediocre architect, and an inconsistent custodian of invariants. It is strong at local tasks: code generation, pattern matching, mechanical refactors, filling in implementation once the shape is clear. It is much weaker at the architectural work: preserving intent across many files, killing old paths instead of extending them, noticing that a "working" shortcut violates a boundary, maintaining discipline when a hack is locally convenient.

The result is useful — faster coding than humans can do by hand. But it also creates a cleanup tax: after implementation, you spend weeks catching boundary violations that were locally convenient at the time, removing old paths that the AI preserved "just in case," and enforcing invariants that the AI ignored when they would have slowed down the code generation.

What Current AI Excels At

- Code generation from a clear specification
- Pattern matching and mechanical refactors
- Filling in implementation once the shape is clear

What Current AI Struggles With

- Preserving intent across many files
- Killing old paths instead of extending them
- Noticing that a "working" shortcut violates a boundary
- Maintaining discipline when a hack is locally convenient

What Would Change Everything

The jump in usefulness would not come from being faster at coding. It would come from being better at architecture:

- Holding invariants without being reminded of them
- Deleting old paths instead of preserving them "just in case"
- Flagging boundary violations instead of committing them when they are locally convenient
- Preserving intent across many files, not just within one

With these shifts, the cleanup tax would drop dramatically. Right now, 30–40% of the work after AI implementation is catching and fixing boundary violations, removing preserved old paths, and enforcing forgotten invariants. If AI actively protected those boundaries instead of just respecting them when told, the post-implementation work would shrink to testing, optimization, and integration. That would be a real jump in usefulness — not just faster coding, but cleaner coding that requires less cleanup.

That is not here yet. But it is the gap to watch for in future systems. When you evaluate a new AI tool, ask not "how fast can it code?" but "how well does it understand and enforce my architecture?" Speed is commoditized. Architectural discipline is still rare.

Retrospective: The Real Skill Is Not Writing Rules

It turns out the problem is not a lack of guardrails and `.md` rules. The real problem is that you yourself need to know these ideas. The rules help, but only if you can recognize when the model is violating the underlying concept. Otherwise, the model can appear compliant while still drifting.

This is the hard truth: the durable skill is not "write more rules." It is being able to spot the violation yourself.

Can you see when code is in the wrong namespace? Can you spot a boundary leaking across layers? Can you tell when temporary state is being treated as canonical authority? Can you recognize when a contract has gone ambiguous? Can you tell which old paths should have died instead of being preserved?

Once you develop that eye, the documentation becomes a sharper tool instead of fragile hope. Instead of hoping the AI internalized the rules, you actively catch violations as they happen and call them by their true name:

- "That is a boundary violation: session code must not call back into UI."
- "That creates a second entry point; it will diverge from the first."
- "That treats temporary launch state as canonical. Revert it."

That feedback is much stronger than handing the AI a rule file and trusting it to internalize the philosophy. It is real time, specific, and tied to the actual code.

The documents matter. They give you a language and a framework. They help you think clearly. But they are not a substitute for developing your own ability to see the patterns. The moment you can recognize a namespace violation or a boundary leak without consulting a rule, the documents stop being your guardrails and become your tools.

That is why these five terms — namespace, boundary, contract, entry point, seam — are worth internalizing. Not because they are clever terminology, but because once you see them, you cannot unsee them. And once you cannot unsee them, you can protect your architecture in real time instead of cleaning up drift afterward.

How to Run a Plan

From competitive planning through disciplined execution — a repeatable process that works

A plan without an execution protocol is a wish list. Most developers write a good plan and then hand it to an AI with "implement this." The AI immediately starts coding from the document instead of the repo, patches the wrong seam, misses callers, and skips validation. The result looks done until it breaks in production. This page documents the full process: how to build a better plan using two AIs, and how to execute it without drifting.
I

Phase 1: Competitive Planning with Two AIs

The insight behind competitive planning is simple: two independent perspectives on the same problem produce a better plan than one. When you describe the same problem to two different AIs, their proposals will diverge on assumptions, trade-offs, and architecture choices — and those divergences tell you exactly where the hard decisions are.

1
Describe the problem independently to both AIs

Write one problem statement — a clear, scoped description of what you need to build or change. Feed it to both AIs in separate sessions with no shared context. You are looking for two independent proposals, not a collaboration. Keep it concrete: what is the goal, what are the constraints, what must not change.

2
Each AI produces a plan

Ask each AI to write a structured implementation plan: phases, files to touch, functions to create or modify, test strategy, risks. The output should be specific enough that another AI could execute it without guessing. Save the results as PLAN_A.md and PLAN_B.md.

3
Each AI critiques the other's plan

Cross-pollinate. Show AI A the plan that AI B produced, and vice versa. Ask each to identify wrong assumptions, missed risks, and where their own approach is stronger — then produce a revised plan incorporating what was right in both. Save as PLAN_A2.md and PLAN_B2.md.

4
Merge into a final plan

Pick one AI to do the merge — the one whose second-round plan was stronger. Hand it both PLAN_A2.md and PLAN_B2.md and ask it to produce PLAN_FINAL.md: a single, definitive plan that incorporates the best elements of both and resolves any remaining conflicts.

Both sessions should read your CLAUDE.md / AGENTS.md first. Your architectural rules and hard-won constraints should inform both plans from the start. A plan that ignores your existing architecture is a rewrite in disguise.
Like two architects reviewing each other's blueprints before construction starts. Neither blueprint is the final design. Both architects come back with a better version after seeing where the other made assumptions they did not. The building that gets built is better than either original proposal.
As of March 2026: Codex is winning the implementation round more often than not. When the competitive planning process produces a final plan and it is time to execute, Codex has been the better implementer in the majority of real-world tests. Claude tends to plan and reason more broadly; Codex tends to execute with more precision and fewer side excursions. This is not a permanent verdict — model capabilities shift quickly — but it is today's experience. When you reach execution, seriously consider handing the final plan to Codex rather than Claude.
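The four steps can be sketched as a small harness script. This is hypothetical, not a tool: AI_A and AI_B stand in for your two CLIs (for example `claude -p` for Claude Code; substitute whatever non-interactive invocation your second tool supports), and they default to `echo` stubs so the sketch runs with nothing installed.

```shell
#!/bin/sh
# Hypothetical harness for the four planning rounds. AI_A / AI_B are
# placeholders for your two CLIs (e.g. AI_A="claude -p"); they default to
# echo stubs so the sketch runs anywhere.
AI_A=${AI_A:-echo}
AI_B=${AI_B:-echo}

round() {  # $1 = AI command, $2 = prompt, $3 = output plan file
  $1 "$2" > "$3"
}

problem=$(cat PROBLEM.md 2>/dev/null || echo "<your one-page problem statement>")
ask="Write a structured implementation plan: phases, files to touch, functions, test strategy, risks."

# Steps 1-2: independent plans from the same problem statement
round "$AI_A" "$problem $ask" PLAN_A.md
round "$AI_B" "$problem $ask" PLAN_B.md

# Step 3: cross-critique, then a revised plan from each side
round "$AI_A" "Your plan: $(cat PLAN_A.md) Rival plan: $(cat PLAN_B.md) Critique the rival, keep what is right in both, output a revised plan." PLAN_A2.md
round "$AI_B" "Your plan: $(cat PLAN_B.md) Rival plan: $(cat PLAN_A.md) Critique the rival, keep what is right in both, output a revised plan." PLAN_B2.md

# Step 4: the stronger planner merges both revisions into the final plan
round "$AI_A" "Merge into one definitive plan, resolving conflicts: $(cat PLAN_A2.md) $(cat PLAN_B2.md)" PLAN_FINAL.md
```

The file names match the process above, so PLAN_FINAL.md comes out ready for the execution phase.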
II

Phase 2: Executing the Plan with Engineering Rigor

Writing a good plan is half the work. The other half is handing it to an AI with an execution protocol that prevents the failure modes that kill AI implementations. Without a protocol, even a great plan gets ruined by an AI that codes from the document instead of the codebase.

The 6 Failure Modes of Unstructured Plan Execution

1 Plan is stale. The plan names a function that moved or changed shape. The AI patches the wrong seam and never notices.
2 Blast radius missed. The plan names the function but not its callers. The AI changes the contract and leaves every caller broken.
3 Helper duplication. The AI creates a new helper instead of using the one that already exists. Now there are two versions of the same logic.
4 Data shape corruption. The AI mutates a data structure by rebuilding a subset of its fields, silently dropping everything else.
5 Rubber-stamp tests. When a test fails, the AI changes the test to match the new (wrong) output instead of fixing the code.
6 No proof. The AI declares "done" without running tests or a build. Nothing was actually validated.

The RUN_PLAN.md template in Chapter III is an execution-control prompt that forces the AI to address each failure mode before writing a single line of code.

How to Use It

1 Copy RUN_PLAN.md into your repository and adapt the project-specific sections (file paths, build commands, architecture boundaries).
2 At the start of your execution session, hand the AI both the plan file and RUN_PLAN.md.
3 Tell the AI: Read RUN_PLAN.md, then execute PLAN_FINAL.md.
4 Require the AI to complete Phase 0 (Preflight) and write its findings before touching any code.
5 Review the preflight. If the plan is stale or the blast radius is larger than expected, adjust before proceeding.

Sample Execution Prompts

Basic plan execution:

Execute my-feature.md using RUN_PLAN.md as your constraint.
Complete the preflight first, then proceed with implementation.

Plan execution with architectural review:

Execute refactor-auth.md using RUN_PLAN.md.
After implementation, do a tough review using PR_TOUGH.md
and output findings to PR_REFACTOR_FINDINGS.md.

High-risk execution with mandatory preflight review:

Execute the plan using RUN_PLAN.md.
Write the full Phase 0 preflight to PREFLIGHT.md
and wait for my approval before touching any code.
Do not proceed without explicit go/no-go from me.
Require a preflight report before implementation on any risky change. For routine work, you can let the AI proceed after a brief Phase 0 summary. For anything that touches shared data, external contracts, or critical paths, require a full written preflight before a single file is edited. The report makes the AI's assumptions visible — and reviewable — before the blast radius is activated.
III

The RUN_PLAN.md Template

Below is a generic RUN_PLAN.md you can copy into your project and adapt. The project-specific sections (file paths, build commands, architecture boundaries) are clearly marked for replacement. Everything else applies universally.

RUN_PLAN.md — Generic Template
# RUN_PLAN.md

Use this prompt when handing an implementation plan to an AI session.
The plan tells it what to change. This document tells it how to execute
with engineering rigor.

This is not a creativity prompt. It is an execution-control prompt.

---

## Execution Prompt

```text
You are executing an implementation plan. Read AGENTS.md (or CLAUDE.md)
first. These files are law and override anything in the plan that
conflicts with them.

Then read the plan file: [PLAN_FILE_PATH]

Do not start coding until you complete the preflight below.

Your job is to make the smallest correct change that:
- still matches the current codebase, not just the plan
- respects the project's architectural boundaries
- preserves data and state coherence
- avoids hidden side effects
- lands with proof through tests and validation

---

### Phase 0: Preflight (No Code Changes)

Before touching any file, do all of the following:

1. Restate the task precisely.
   - Requested outcome
   - Scope (which files, layers, or subsystems)
   - Non-goals (what you will NOT change)
   - Assumptions

2. Read the actual source, not summaries.
   - Read every file the plan names.
   - Confirm the code still matches the plan.
   - If a function moved, changed shape, or was already partially fixed,
     say so before proceeding.

3. Identify the owner and seam.
   - Name the canonical helper or module that owns the behavior.
   - Search before writing. If a helper exists, use it.
   - If no helper exists and the pattern is durable, create it in the
     correct location before using it.

4. Identify the blast radius.
   - For every function you will modify, find every caller.
   - If the contract changes in any way -- parameters, return shape,
     side effects, or behavior -- all callers are in scope for
     validation, even if the plan did not mention them.

5. Identify the test surface.
   - Find existing tests that cover the touched behavior.
   - If no test exists for a new guardrail or transition, plan to add one.
   - If an existing test asserts old behavior that the change replaces,
     plan to update it to the new contract.

6. Build a risk map. Explicitly assess:
   - Architecture and boundary violations
   - Data shape and contract drift
   - Null paths and error propagation
   - Silent failures (skipped items, swallowed errors)
   - Test coverage gaps

7. State the execution order before coding.
   - List the changes in order.
   - Name the file and function for each.
   - If the operator asked for preflight-only, stop here and report.

---

### Phase 1: Implement (One Bounded Change At A Time)

For each item in the execution order:

Before editing:
- Re-read the function you are about to modify.
- Re-read its callers if you are changing its contract.
- Re-read the tests that cover it.

While editing:
- Change the minimum necessary for this item.
- Preserve existing architectural boundaries.
- Do not duplicate helpers or create a second version of an existing
  utility.
- Do not refactor surrounding code unless it is directly required for
  correctness.
- Preserve full object/document shape when mutating data; do not
  rebuild a subset and silently drop fields.

After each item:
- State what changed, in what file and function, and what the
  behavioral difference is.
- If the contract changed, list the callers and confirm they were
  updated or verified.
- Do not move to the next item until this one is internally consistent.

---

### Phase 2: Validate

1. Run the narrowest relevant tests first.
   - Start with the test file or block closest to the changed behavior.
   - Then run the broader test suite for the touched module.

2. Run build validation.
   - [PROJECT-SPECIFIC: insert your build command here]
   - Do not skip build validation because targeted tests passed.

3. Re-check callers and regressions.
   - Grep for every modified function and confirm callers are correct.
   - Confirm no caller is now passing wrong arguments or relying on
     behavior that changed.

4. State manual validation truthfully.
   - If behavior crosses environment-dependent boundaries, state exactly
     what was and was not validated manually.

---

### Phase 3: Self-Review

Before declaring done, review your changes against these defect classes:

1. Architecture drift
   - Did you bypass a helper or cross a boundary?
   - Did you put logic in the wrong layer?

2. Data shape and contract drift
   - Did you create another fallback chain instead of normalizing at
     the boundary?
   - Did you strip fields when mutating an object?

3. Error propagation
   - Can an error escape without proper handling?
   - Did any catch path introduce a secondary failure?

4. Silent failures
   - Did you use continue/skip without warning the caller?
   - Did you return an empty result where an error would be more honest?

5. Test gaps
   - Did you add or change behavior without a test?
   - Did you update tests to rubber-stamp new code instead of asserting
     the real contract?

If you find a real problem in self-review and it is in scope and
low-risk, fix it before reporting. Otherwise call it out explicitly.

---

### Stop Conditions

Stop and surface the issue instead of improvising if:
- The plan no longer matches the code in a material way.
- The correct owner or seam is unclear.
- The change requires guessing about ownership or contracts.
- Validation cannot establish that the change is safe.
- The requested change conflicts with AGENTS.md rules.

When blocked, explain the minimum missing fact or decision needed to
proceed. Do not hide uncertainty behind confident wording.

---

### Completion Report

When done, provide:

1. Understanding
   - Requested outcome, scope, non-goals, assumptions

2. Preflight Findings
   - Current owner/helper/seam
   - Files and callers in scope
   - Main risks identified

3. Execution Notes
   - What changed
   - Any deviations from the original plan and why
   - Any callers updated or verified

4. Validation
   - Exact tests or commands run
   - Build status
   - Manual validation performed
   - What remains unvalidated

5. Residual Risks
   - Only real remaining risks, not generic caution text

6. Closeout
   - Is the requested change done?
   - Is follow-up work needed?
   - Is the work ready for review?

Do not commit. Report completion and wait for explicit commit instruction.
```

---

## Short Prompt

Use this when you want a single-paragraph handoff instead of the full
prompt:

```text
Execute the implementation plan in [PLAN_FILE_PATH] with engineering
rigor. Read AGENTS.md (or CLAUDE.md) first. Before writing any code,
restate the outcome, scope, non-goals, assumptions, owners, seams,
callers, tests, and risks. Read the actual files the plan names and
confirm the plan still matches the code. Search before writing. Use
existing helpers. Preserve architecture and data shape. Implement one
bounded change at a time. Verify callers after contract changes. Run
targeted tests, then build validation. Self-review against the defect
classes. Report changes, validation, and residual risks honestly.
Do not commit without explicit permission.
```

---

## Template You Can Hand to an AI

```text
Task: <one-sentence requested outcome>
Plan file: [PLAN_FILE_PATH]

Read AGENTS.md (or CLAUDE.md) first.

Before coding, produce:
1. Understanding -- outcome, scope, non-goals, assumptions
2. Preflight -- files read, owner/seam, callers, tests, risks
3. Execution Order -- ordered steps, acceptance criteria, validation plan

Then implement conservatively:
- search before writing
- use existing helpers
- preserve architecture and data shape
- add or update tests for changed behavior

After implementation, report:
- what changed, callers verified
- tests and commands run
- build status
- manual validation performed
- what remains unvalidated
- residual risks
```

---

## Engineering Rigor Checklist

Use this to judge whether your plan is strong enough before execution.

| Discipline    | Strong                                              | Weak                                                |
|---------------|-----------------------------------------------------|-----------------------------------------------------|
| Objective     | States what must change AND what must not. Defines acceptance criteria. | Mixes bug fix, refactor, and feature into one blur. |
| Preflight     | AI reads actual files first. Assumes plan may be stale. Forces caller discovery. | Lets AI code from the document without checking the repo. |
| Architecture  | Names owning module/service. Enforces helper-first reuse. | Says "update whatever is needed."                   |
| Data/State    | Names data structures and contracts. Treats shape divergence as a defect. | Assumes data is straightforward.                    |
| Safety        | Maps errors to correct responses. Does not silence failures. | Catches generic errors. Skips items without warning. |
| Validation    | Targeted tests first, then build. Tests error and multi-item paths. | Says "run tests" without naming what proves correctness. |
| Reporting     | Concrete, falsifiable. Records what was and was not validated. | Reports success without evidence.                   |

---

## Unplanned Work

This document works without a plan file. For bug fixes, one-off tasks,
or any work where you don't have a formal plan, substitute the bug
description or task description for [PLAN_FILE_PATH] and skip the
"confirm plan matches code" step -- everything else applies. The
preflight still forces you to find the owner, callers, blast radius,
tests, and data shape before coding. The implementation, validation,
and self-review phases are identical. Use the Template section with:

  Task: fix <description>
  No plan file. Use RUN_PLAN.md methodology.

---

## Why This Exists

AI sessions executing plans fail in repeatable ways:

1. The plan is stale and the AI patches the wrong method.
2. The plan names the function but not the callers -- blast radius missed.
3. The AI creates a new helper instead of using the one that exists.
4. The AI rebuilds a subset of fields and silently drops the rest.
5. The AI updates tests to rubber-stamp new behavior instead of the contract.
6. The AI fixes the symptom but not the root cause.
7. The AI says "done" without proving the change.

Every phase in this template exists to block one or more of those
failure modes.
IV

Using RUN_PLAN.md for Unplanned Work

RUN_PLAN.md is not only for formal implementation plans. It works equally well for bug fixes, one-off tasks, and any work where a plan was never written. The key insight: the value is in the preflight and validation discipline, not in the plan file itself.

Instead of
Execute the plan in PLAN_FINAL.md
Use
Task: fix [description].
No plan file. Use RUN_PLAN.md methodology.

Skip only the "confirm plan matches code" step from Phase 0 — that step requires a plan document to compare against. Everything else is identical: find the owner and seam, map the blast radius, identify the test surface, build a risk map, implement one bounded change at a time, validate, self-review.

The preflight catches more bugs than the fix does. When you force the AI to name the owner, list every caller, and build a risk map before touching a line of code, it routinely discovers that the bug is not where it appeared to be — or that the obvious fix would break a caller that was never mentioned. The plan document is an input to Phase 0, not the point of Phase 0. Without a plan, Phase 0 is still doing its job.
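The caller-listing step does not need special tooling; a recursive grep is enough. A sketch with invented files, showing how the blast radius of a hypothetical `resolve_name` function falls out:

```shell
#!/bin/sh
# Sketch of caller discovery with invented files: before changing the
# contract of a hypothetical resolve_name function, enumerate every file
# that calls it. Those files are the blast radius.
mkdir -p demo/src
printf 'name=$(resolve_name "$id")\n'      > demo/src/display.sh
printf 'resolve_name "$raw" >/dev/null\n'  > demo/src/repair.sh
printf 'echo "no calls here"\n'            > demo/src/other.sh

grep -rl 'resolve_name' demo/src | sort    # every caller, nothing missed
```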
Unplanned work is often riskier than planned work

A formal plan was written with the codebase in mind and has been reviewed. An unplanned bug fix is improvised — the scope is unclear, the blast radius is unknown, and there is pressure to move fast. That is exactly when the preflight step earns its keep.

Create a slash command called /fix in .claude/commands/fix.md with the contents: "We are fixing a bug. Before touching any file: restate the bug, find the owner and seam, list every caller of any function you will change, identify the test surface, and build a risk map. Then implement the smallest correct change. Validate with targeted tests and a build. Self-review for silent failures, data shape drift, and test gaps. Do not commit without explicit instruction." One command loads the full discipline for any unplanned fix — no template to remember, no steps to skip.
Like an emergency maintenance procedure at a power plant. The written procedure exists for planned shutdowns. But when the emergency happens at 2 AM, the crew still follows the checklist — they do not improvise because there is no time. The checklist is fastest when the pressure is highest, because improvisation under pressure is where catastrophic mistakes happen. RUN_PLAN.md is that checklist for unplanned AI work.
V

Ralph Loops: Autonomous Unattended Execution

Everything covered so far assumes a human is in the loop — reviewing the preflight, approving each phase, reading the completion report. A Ralph Loop removes that assumption. It is a technique for wrapping an AI coding assistant in a bash loop that feeds its own output back into itself, running unattended until a goal is complete.

The name comes from Geoffrey Huntley, who pioneered the pattern with Claude Code. The loop runs something like this: feed the AI a PRD (product requirements document) or a plan file, let it work, capture the output, feed it back in as the next prompt, repeat until the task list is exhausted or the AI reports completion. No human sitting there approving each step.

What a Ralph Loop looks like
while ! task_complete; do                           # task_complete: your stop check (placeholder)
  output=$(claude --print "$(cat PROMPT.md)" 2>&1)  # one non-interactive pass
  echo "$output" >> session.log                     # audit trail of every iteration
  update_prompt "$output"                           # fold output back into PROMPT.md (placeholder)
done

The AI's output on each iteration becomes input for the next. The loop drives itself forward until done.

Where Ralph Loops Fit in This Page's Framework

A Ralph Loop is the execution layer taken to its logical extreme. RUN_PLAN.md is the discipline that makes a Ralph Loop safe rather than chaotic. Without it, the loop will happily execute the wrong seam, miss callers, skip tests, and declare completion on each iteration — with no human noticing until the damage accumulates. With it, each iteration begins with a preflight, implements one bounded change, validates, and reports before the loop feeds the next iteration.

The PRD is the plan file. In a Ralph Loop, the product requirements document plays the role of PLAN_FINAL.md. It defines the goal. RUN_PLAN.md defines how each iteration of the loop executes toward that goal. Together they turn an unattended loop into something with engineering discipline baked in.

When to Use a Ralph Loop

Ralph Loops are most effective when:

  • The task is well-defined in a PRD and the acceptance criteria are unambiguous
  • The codebase has strong test coverage — tests are the only feedback mechanism when no human is watching
  • The work decomposes into independent steps that each produce a verifiable, committable result
  • You have a build and test command that exits non-zero on failure, giving the loop a clean stop signal
A Ralph Loop on a poorly tested codebase is a confidence machine, not a delivery machine. It will produce output. The output will look done. Without tests to verify correctness on each iteration, there is no signal distinguishing "working" from "plausible." The loop completes. The code ships. The bugs were always there.
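The stop-signal requirement can be made concrete. Below is a hypothetical Ralph-style loop whose only exits are a green test run or a hard iteration cap; `fake_ai` and `fake_tests` are stubs standing in for your CLI invocation and your real test command, so the sketch runs as-is.

```shell
#!/bin/sh
# Hypothetical Ralph-style loop with two stop conditions: the test command
# going green (exit 0) or a hard iteration cap. AI_CMD and TEST_CMD are
# placeholders: in real use AI_CMD would invoke your CLI (e.g. claude --print)
# and TEST_CMD would be your build-and-test script. The stubs below make the
# sketch self-contained: "tests" go green after three passes of "work".
AI_CMD=${AI_CMD:-fake_ai}
TEST_CMD=${TEST_CMD:-fake_tests}
MAX_ITER=20

fake_ai() { echo "pass over: $1" >> work.log; }
fake_tests() { [ -f work.log ] && [ "$(wc -l < work.log | tr -d ' ')" -ge 3 ]; }

i=0
until $TEST_CMD; do
  i=$((i + 1))
  if [ "$i" -gt "$MAX_ITER" ]; then
    echo "stopping: still red after $MAX_ITER iterations" >&2
    exit 1
  fi
  $AI_CMD "$(cat PROMPT.md 2>/dev/null)"   # feed the current prompt back in each pass
done
echo "green after $i iterations"
```

The cap matters as much as the test: without it, a loop that cannot reach green runs forever.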

The Human's Role in a Ralph Loop

Unattended does not mean unsupervised. The human sets up the loop, writes the PRD with precision, defines the stop condition, and reviews the session log after each run. The loop is an accelerator, not an abdication. Think of it as overnight compilation: you set it going, sleep, and review what it built in the morning — but you designed what it was supposed to build, and you decide whether it actually did.

Like a factory running overnight. The machines run without operators. But the production engineer designed the jigs, set the tolerances, wrote the quality check at the end of the line, and reviews the morning report. The automation executes the discipline the human encoded. A Ralph Loop is that factory. RUN_PLAN.md is the jig.

Embarrassing Case Study

how not to do it, so you can learn how to do it

What Happened

One Saturday morning, coding away, I had too many AI console windows open and couldn't tell which was which. So I decided to put watermarks on the console backgrounds: when I switched between consoles, I'd see the context. The rest of that day went into that program. By the end of it I had v1 and v2 fully coded and was already using it. I shared it around at work, made a GitHub repo for it, and built an installer. This is the story of that program.

When you see v1 through v8 in the graph below, keep in mind: each check-in was a coding session and a version number, and I could do multiple versions in a day. This is a real story.

The tool was built fast. v1 through v8 added platforms, features, and integrations at high velocity: Claude Code, then Codex, then Copilot, then OpenCode, then Gemini, then Kiro. A Jira integration. GitHub PR review. Cost tracking. Windows Terminal profile management. A companion maintenance utility. A build pipeline. IP-protection obfuscation. All of it in roughly 10 weekends of active development.

By v8.3.0 the product worked. It had users. It had real commercial value. It also had the accumulated debt of every shortcut taken in the name of shipping the next feature.

The v9 arc was the reckoning. Not a rewrite — a systematic repair of every pattern that had been done wrong or duplicated across files. It ran from v9.0.0 through v9.9.8, across more than a dozen focused increments, and it cost more lines than all prior development combined.

The Graph That Started This Conversation

All 98 commits aggregated by major version. Each block represents a relative change wave. v8.x excludes the obfuscated build artifact.

| Version | Commits | Relative Change Size |
|---------|---------|----------------------|
| v1.x | 14 | ████ |
| v2 | 13 | |
| v3.x | 7 | |
| v4.x | 2 | |
| v5.x | 7 | █████████ |
| v6.x | 2 | |
| v7.x | 1 | |
| v8.x | 7 | ████████ |
| v9.x | 45 | ████████████████████████████████████████ |

v9 is not close. Scores of change sets across 45 commits, dwarfing all prior eras combined. That bar is the architecture investment made visible.

The Architecture Tax

Project: A multi-platform AI session manager for Windows Terminal
Language: PowerShell 5.1
Timeline: 10 weekends
Commits: 98 total
The question this case study answers: What does it cost when you skip architecture on day 1?

The Specific Patterns That Were Retrofitted

Each v9 task addressed a class of problem that was avoidable on day 1:

Variant logic scattered across files. Every supported platform had its behavior hardcoded in whichever file happened to need it first. Adding a new platform required touching seven files. v9 introduced a central registry: one structure, all variant behavior, zero hardcoding elsewhere.
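The registry pattern is easiest to see in miniature. A sketch in shell with invented entries, where `registry_get` is the only code that knows platform-specific values:

```shell
#!/bin/sh
# Miniature of a variant registry: one table owns every platform-specific
# value; registry_get is the only code that reads it. Entries and fields are
# invented for illustration. Format: name|command|color
REGISTRY='claude|claude|blue
codex|codex|green
copilot|gh copilot|purple'

registry_get() {  # $1 = platform name, $2 = field number (1..3)
  printf '%s\n' "$REGISTRY" | awk -F'|' -v name="$1" -v f="$2" '$1 == name { print $f }'
}

# Adding a platform = one new registry line; no other code changes.
registry_get codex 3   # -> green
```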

Raw I/O everywhere. Owned data files were read and written directly with no atomicity, no backup, no error handling. A crash mid-write meant silent data corruption. v9 introduced canonical read/write helpers that all managed files route through — the only place in the codebase allowed to touch owned data files directly.
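The temp-file-plus-rename pattern behind those helpers is small. A sketch in shell, with `write_data_atomic` as a hypothetical stand-in for the real helper:

```shell
#!/bin/sh
# Sketch of an atomic write: stage the full new content in a temp file, then
# rename over the target. A crash before the rename leaves the old file
# intact; readers never see a half-written file.
write_data_atomic() {  # $1 = target file; new content arrives on stdin
  tmp="$1.tmp.$$"
  cat > "$tmp" || { rm -f "$tmp"; return 1; }   # stage the full content first
  mv -f "$tmp" "$1"                             # the rename is the atomic step
}

printf '{"sessions":[]}\n' | write_data_atomic state.json
```

The rename is only atomic when the temp file lives on the same filesystem as the target, which is why the temp name is derived from the target path.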

External system mutations without transactions. Operations that modified an external system's config read it, mutated it in memory, and wrote it back directly. A failure mid-operation left the external system in a broken state invisible to the user. v9 wrapped all such mutations in a transaction pattern: read, modify in memory, write atomically, validate, restore on failure.
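A sketch of that transaction shape in shell, with illustrative names; `validate_config` stands in for whatever check proves the written state is sane:

```shell
#!/bin/sh
# Sketch of the transaction wrapper for config owned by an external system.
# Shape: backup, write, validate, and restore the backup on any failure, so
# the external system never ends up in a broken state. Names are invented.
validate_config() { grep -q '"profiles"' "$1"; }

mutate_external_config() {  # $1 = config file; mutated content on stdin
  cp "$1" "$1.bak" || return 1      # 1. backup before touching anything
  cat > "$1"                        # 2. write (real code: atomic temp+rename)
  if validate_config "$1"; then
    rm -f "$1.bak"                  # 3a. validated: drop the backup
  else
    mv -f "$1.bak" "$1"             # 3b. failed: restore the original
    return 1
  fi
}

printf '{ "profiles": [] }\n' > settings.json
printf '{ "profiles": [ { "name": "claude" } ] }\n' | mutate_external_config settings.json
```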

Entity identity with no documented precedence. A key entity's display name could be derived from multiple sources, each with its own priority. Each code path had its own inference logic. When they disagreed, the entity showed the wrong name. v9 established a single canonical function with a documented precedence order called by all display, repair, and rename code. The bug had been latent since v1.
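The fix generalizes to any entity with multiple name sources: one function, one documented order, every caller uses it. A sketch with hypothetical sources:

```shell
#!/bin/sh
# Sketch of a single canonical precedence function for entity display names.
# Sources and their order are invented; the point is that every display,
# repair, and rename path calls this one function instead of inferring
# identity on its own.
resolve_display_name() {  # $1 = explicit label, $2 = metadata name, $3 = raw id
  for candidate in "$1" "$2" "$3"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf 'unknown\n'   # final fallback when every source is empty
}

resolve_display_name "" "session-from-metadata" "id-1234"   # -> session-from-metadata
```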

Test infrastructure that tested the wrong code. The test suite was stitched into the same artifact as the production code. Several tests matched by string search in a way that found their own test definitions before the actual production functions. Tests were silently passing while the code they were supposed to protect had real violations. v9 fixed the matching logic, revealing six tests that had been producing false results for months.

Orphaned artifacts from failed operations. When a multi-step operation failed partway through — after some artifacts were created but before the operation completed — the partial artifacts remained permanently. Users would find ghost entries they never created. v9 introduced artifact cleanup tracking: record each creation, clean up everything created so far in the catch block before rethrowing.
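The cleanup-tracking idea in miniature, with invented names: record each artifact as it is created, and on failure remove everything recorded so far before surfacing the error:

```shell
#!/bin/sh
# Sketch of artifact-cleanup tracking: record each artifact as it is created,
# and when the operation fails partway, remove everything recorded so far.
# All names are invented; the pattern is record-then-rollback.
CREATED=""
track() { CREATED="$CREATED $1"; }
cleanup() { for f in $CREATED; do rm -rf "$f"; done; }

create_session() {
  mkdir session.d && track session.d   # artifact 1: a directory
  : > session.d/log                    # contents covered by its cleanup
  false                                # simulate a failure mid-operation
}

if ! create_session; then
  cleanup                              # no ghost entries left behind
  echo "operation failed, artifacts removed"
fi
```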

Duplicate ceremony in every entry path. Multiple code paths each had their own copy of the same 15–20 step sequence. When one copy got a bug fix, the others did not. v9 extracted all shared steps into helper functions called by every entry path.

"A Rewrite Would Have Been Easier"

Looking at the graph, that thought is natural. Here is the direct answer.

What the v9 change sets actually contain: A significant portion are tests. A rewrite needs tests too — written without the knowledge of which edge cases fail in production. Many are planning documents. The largest individual commits are largely refactoring churn: the same lines deleted from one location and added to another. A function that moves between files costs double in the diff. The real net change is zero.

What a rewrite would actually cost: Five Windows Terminal integration surfaces, each with non-obvious quirks. Five platform data formats reverse-engineered from live data. All discovered knowledge. A rewrite starts from zero. Most importantly: a rewrite is done under the same pressure that created the original architecture problems. The impulse to ship features beats the impulse to do it right. That is exactly what happened in v1 through v8.

Where the observation is correct: The v9 arc was more expensive than it needed to be because it happened after the product shipped instead of before. The canonical helpers, registry-driven dispatch, atomic writes, and red-green testing that v9 established would have cost a fraction of what they cost at v9 if they had been the founding architecture. The graph is the receipt. The question it should prompt is not "should we have rewritten?" but "what would we have said on day 1?"

The Actual Cost

At standard software engineering rates ($150–200/hr for a senior developer), the v9 arc represents a substantial, entirely avoidable expense.

The same 12 rules, put in place on day 1, would have taken perhaps 2 hours to write and would have been enforced automatically by the AI assistant with every subsequent code generation. The entire v9 arc — every refactor, every bug it revealed, every test it required — would not have been necessary.

That is the cost of skipping architecture on day 1.

What the Product Looked Like After

At v9.9.8, the codebase passed a zero-violation code review against 29 mandatory architectural rules. The test suite had 640+ tests across 30 files, with named validation bundles for CI/CD integration. Adding a new AI platform required zero changes to rendering, dispatch, WT profile management, or image generation — one registry entry and the platform-specific discovery function. Session lifecycle errors left no orphaned artifacts. Windows Terminal mutations rolled back automatically on failure.

The v9 arc was not a mistake. It built a product that can grow without collapsing. The mistake was not having the architecture from the start.

The Day-1 Prompt

This is a template CLAUDE.md — a complete architectural ruleset you would paste at the start of your project, before the first line of code. Every rule comes from something that actually broke in production. Fill in your project details below and download a ready-to-use file.

Generate Your Day-1 File

Fill in what applies. Leave anything blank to keep the <placeholder> — you can fill it manually later.




↓ Full template for reference — fill the <placeholders> manually if you prefer:

You are building <YourProduct> -- <one sentence description of what it does>.
Before writing any code, the following architectural rules are non-negotiable. Every
decision you make must be consistent with them. When in doubt, ask before deviating.

---

RULE 1: <YOUR VARIANT DIMENSION> REGISTRY IS THE ONLY SOURCE OF TRUTH FOR VARIANT BEHAVIOR

All <variant>-specific values live in a single central registry structure in one file.
No other file may hardcode a <variant> name, identifier, or behavior as a literal.
When you need <variant>-specific behavior, read it from the registry.
When adding a new <variant>, add one registry entry. Zero other files change.

This rule exists because <variants> will be added continuously. If variant logic is
scattered, every new <variant> is a surgery. If it is in the registry, every new
<variant> is a data entry.

---

RULE 2: ALL READS AND WRITES TO OWNED DATA GO THROUGH CANONICAL HELPERS

<YourProduct> owns several data files. Every read goes through Read-DataSafe.
Every write goes through Write-DataAtomic, which writes to a temp file and renames
atomically so a crash mid-write cannot corrupt data.

Never read or write owned data files directly anywhere else in the codebase.

External data owned by other systems may use raw reads, but any writes to external
config must still use an atomic write helper. Document the reason at each raw-read site.

---

RULE 3: ALL MUTATIONS TO <EXTERNAL SYSTEM> GO THROUGH A SERVICE LAYER

Any operation that modifies <external system state> must call a function in the
<system> service layer. No file outside that layer may read or write <external system
state> directly.

The service layer must wrap mutations in a transaction: read current state, make the
change in memory, write atomically, validate the result. On any exception, restore the
backup. This is not optional even for "simple" changes. Silent corruption is invisible
until users hit it.
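A minimal sketch of the transactional shape, in Python. An in-memory dict stands in for the external system state, and the validation rule is a stand-in; a real service layer would persist via an atomic write:

```python
import copy

def mutate_external_state(store, change):
    """Hypothetical service-layer mutation: snapshot current state,
    apply the change in memory, validate, and restore on any failure."""
    backup = copy.deepcopy(store)        # 1. read current state
    try:
        store.update(change)             # 2. make the change in memory
        if not all(isinstance(k, str) for k in store):  # 3. validate result
            raise ValueError("invalid key in external state")
        return store                     # 4. commit (atomic write in real code)
    except Exception:
        store.clear()
        store.update(backup)             # 5. restore the backup, then rethrow
        raise
```

The point of the shape is that no caller can reach step 4 without steps 1, 3, and 5 coming along for free.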

---

RULE 4: <KEY ENTITY> IDENTITY HAS ONE CANONICAL PRECEDENCE ORDER, ENFORCED IN ONE PLACE

<Entity> display name resolution:
  <source A>  >  <source B>  >  <fallback>

These rules are implemented in exactly one function each. All display code, all repair
code, and all rename code calls those functions. No file infers <entity> identity
through its own logic. When this rule is violated, <entity> shows the wrong name or
links to the wrong record.
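A precedence chain like this usually collapses into one small function. A Python sketch with hypothetical field names:

```python
def resolve_display_name(record):
    """The one canonical precedence function:
    source A beats source B beats the generated fallback."""
    return (record.get("source_a_name")
            or record.get("source_b_name")
            or f"unnamed-{record['id']}")
```

All display, repair, and rename code calls this function; the precedence order lives nowhere else.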

---

RULE 5: SOURCE CODE IS ORGANIZED INTO NAMESPACES. EACH NAMESPACE OWNS ITS CONCERNS.

  src/core/        -- global state, persistence helpers, <variant> registry
  src/<domain A>/  -- <domain A> lifecycle operations
  src/<domain B>/  -- <domain B> operations
  src/ui/          -- rendering, display, navigation
  src/integration/ -- external service clients
  src/testing/     -- all test code (never included in production builds)

A file in one namespace must not implement another namespace's logic. Cross-namespace
dependencies create phantom coupling: changes in one area break unrelated areas silently.

If a function's home is not obvious, it belongs in core/ as a shared helper.

---

RULE 6: THE REPO ROOT STAYS CLEAN

The repo root contains only:
  - Primary entrypoints and launchers
  - Primary compiled or stitched artifacts
  - Canonical instruction and index documents (CLAUDE.md, AGENTS.md, README.md, <docs-index>)
  - Major source and support directories

Build machinery belongs in build\ or equivalent.
Launcher wrappers belong in Install\ or equivalent.
Planning docs, one-off notes, and screenshots do not belong at the root.
Temporary files placed at the root during development must be cleaned up before commit.

---

RULE 7: DOCUMENTATION HAS DEDICATED NAMESPACES

docs/ is organized into stable subfolders. Every durable document belongs in exactly one:
  docs/<product-docs>/   -- user-facing docs and release history
  docs/<planning-docs>/  -- backlog, plans, and future ideas
  docs/<reference-docs>/ -- architecture, data models, and reference material
  docs/<process-docs>/   -- AI workflow, review standards, and comment standards
  docs/<assets>/         -- images and media used by docs

Do not place new documents loose in docs/ without a namespace.
Any durable new document must be placed in a namespace and linked from <docs-index>.

---

RULE 8: NEW TOP-LEVEL DIRECTORIES REQUIRE JUSTIFICATION

Create a new top-level directory only when all three are true:
  - The content does not fit any existing namespace
  - It represents a real subsystem or product boundary
  - Top-level placement improves clarity more than nesting would

Default to using an existing namespace. The burden of proof is on the new directory.

---

RULE 9: THE BUILD SYSTEM IS MULTI-PRODUCT FROM DAY ONE

If you ship more than one artifact from this codebase, each has a manifest listing
the source files to include in order.

When you add a source file: update all manifests or document why it belongs to only one.
When you move a function: check every manifest.
When you add a shared helper: include it in all manifests that need it.

Failing to maintain manifests silently breaks one product while the other works.

---

RULE 10: EVERY BUG FIX IS PRECEDED BY A FAILING TEST

Before fixing any bug: write a test that reproduces it and fails. Then fix the bug.
Then confirm the test passes. The test is not optional.

Bugs fixed without tests return. A test that explicitly reproduces a bug is
documentation that the bug was real, proof that the fix is correct, and insurance
that the fix stays correct.

Use string-literal test identifiers, not sequential numbers. Sequential numbers
require renumbering when tests are inserted. String literals survive reordering.
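One way to realize string-literal test identifiers is to key regression tests by description rather than by number. A hypothetical Python sketch (the `normalize_name` helper and the bug descriptions are illustrative):

```python
def normalize_name(s):
    """Toy function under test: collapse whitespace in a display name."""
    return " ".join(s.split())

# Tests keyed by string identifiers: inserting a new entry anywhere
# never forces renumbering the others.
REGRESSION_TESTS = {
    "bug: trailing whitespace kept in display name":
        lambda: normalize_name("Alice  ") == "Alice",
    "bug: internal double spaces not collapsed":
        lambda: normalize_name("A  B") == "A B",
}

def run_all():
    return {name: check() for name, check in REGRESSION_TESTS.items()}
```

A failing entry reports its full description, which doubles as documentation that the bug was real.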

---

RULE 11: EVERY FUNCTION THAT CREATES ARTIFACTS MUST CLEAN THEM UP ON FAILURE

If a function creates any persistent artifact -- a record, a config entry, an external
system object -- and anything goes wrong before the function completes, all created
artifacts must be removed in the catch block before rethrowing.

Track artifact creation with boolean flags:
  artifactCreated = false
  ... create artifact ...
  artifactCreated = true
  ... on exception ...
  if (artifactCreated) { remove artifact }

Artifacts left behind by failed operations accumulate silently and confuse users.
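The flag pattern above, made concrete in Python. The record store and the failing downstream step are stand-ins for whatever persistent artifact and follow-on work the real function has:

```python
def provision(store, key, value, fail=False):
    """Sketch of cleanup-on-failure: track what was created, and
    remove it in the except block before rethrowing."""
    record_created = False
    try:
        store[key] = value            # create the artifact
        record_created = True
        if fail:                      # stand-in for a later step failing
            raise RuntimeError("downstream step failed")
        return True
    except Exception:
        if record_created:
            del store[key]            # remove the artifact before rethrow
        raise
```

The flag matters: if creation itself failed, there is nothing to delete, and blind cleanup would mask the original error.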

---

RULE 12: EXTRACT SHARED HELPERS AT THE SECOND INSTANCE, NOT THE THIRD

The first time you write a pattern inline, that is acceptable. The second time you
write the same pattern in a different file, extract it into a shared helper first.
Do not wait for three instances.

When duplication reaches three or four instances, the fix requires touching every
instance. When caught at two, the fix is cheap.

---

RULE 13: NO EMPTY CATCH BLOCKS. NO SWALLOWED EXCEPTIONS.

Every catch block must either:
  a) Log the exception to a debug/error log, or
  b) Re-throw after performing cleanup, or
  c) Return a structured error result that the caller checks

An empty catch is a bug. Swallowed exceptions cause functions to return success when
they have failed and users to see wrong state with no indication of what went wrong.
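Options (a) and (c) combined, as a Python sketch using the standard `logging` module (`load_config` and its result shape are illustrative):

```python
import logging

log = logging.getLogger("app")

def load_config(read):
    """Every failure path does something sanctioned: log the
    exception and return a structured result the caller must check."""
    try:
        return {"ok": True, "value": read()}
    except Exception as exc:
        log.error("config load failed: %s", exc)   # (a) log it
        return {"ok": False, "error": str(exc)}    # (c) structured result
```

The caller checks `result["ok"]` instead of assuming success, so a failure can never masquerade as a valid value.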

---

RULE 14: RUNTIME COMPATIBILITY IS A HARD CONSTRAINT, NOT AN AFTERTHOUGHT

This tool runs on the user's machine. Specify your minimum runtime and test against it.
Do not use language features or library calls that require a newer runtime than your
stated minimum. New syntax that silently degrades on older runtimes is the hardest
class of bug to diagnose.

---

RULE 15: THE <KEY OPERATION> LIFECYCLE HAS ONE CANONICAL SEQUENCE

Every path that triggers <key operation> follows the same sequence of shared steps:

  1. Resolve <variant> and configuration from registry
  2. Resolve or generate <entity> identity
  3. Create or acquire required resources
  4. Execute the operation
  5. Confirm and record the result
  6. On any exception: release/remove all resources acquired in steps 3-4

These steps are implemented as shared helper functions called by all entry paths.
No entry path owns its own version. When steps are implemented multiple times, each
copy drifts. When one copy gets a fix, the others do not.
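A sketch of a single canonical lifecycle that every entry path calls, in Python. `ResourcePool` and the result shape are hypothetical stand-ins for whatever resources and records the real operation touches:

```python
class ResourcePool:
    """Hypothetical resource manager used by the sketch below."""
    def __init__(self):
        self.live = set()
        self._next = 0
    def acquire(self, config):
        self._next += 1
        self.live.add(self._next)
        return self._next
    def release(self, handle):
        self.live.discard(handle)

def run_lifecycle(variant_id, registry, pool, execute):
    """One canonical sequence shared by every entry path: resolve
    config, acquire resources, execute, and on any exception release
    everything acquired so far."""
    config = registry[variant_id]               # step 1: resolve from registry
    acquired = []
    try:
        handle = pool.acquire(config)           # step 3: acquire resources
        acquired.append(handle)
        result = execute(config, handle)        # step 4: execute
        return {"ok": True, "result": result}   # step 5: confirm and record
    except Exception:
        for h in acquired:                      # step 6: release on failure
            pool.release(h)
        raise
```

Entry paths differ only in how they obtain `variant_id` and what `execute` does; the sequence itself exists in exactly one place.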

---

RULE 16: COMMENTS DESCRIBE CURRENT TRUTH, NOT HISTORY OR INVENTORY

File headers must contain: File, Namespace, Purpose, and optional Notes.
File headers describe ownership and invariants -- not a list of every function.
Function inventories in file headers are forbidden. They go stale immediately and
mislead both humans and AI about what the file actually does.

When a refactor changes a file's responsibility, update the file header in the same
commit. Not later. Not in a cleanup pass. In the same commit.

Use function-level documentation blocks (.SYNOPSIS, docstrings, JSDoc, xmldoc --
whatever your language provides) for detailed behavior. File headers describe
boundaries. Function blocks describe contracts. Neither does the other's job.

Comments must describe current truth, not historical intent. If a comment describes
something that was true in v3 but is not true now, delete it. History belongs in
git log and CHANGELOG -- not in source files.

A stale comment is worse than no comment. It will mislead the next developer -- and
it will mislead your AI, which reads comments as instructions.

---

WHY THESE RULES EXIST

Every rule above addresses a specific class of production bug or refactoring cost
that compounds over time. None are style preferences. Each rule was written because
something broke in production without it.

These rules do not slow development. They slow the first implementation of each
pattern by a few minutes. They prevent the class of incident where a single missing
cleanup call costs two engineering weeks to diagnose six months later.

Hold to them from the first commit.

Tests Are the Exoskeleton

The single most important observation about AI-developed software.

Tests aren't quality hygiene in this context — they're the only thing that gives the AI memory of what the system is supposed to do.

A human developer carries architectural intent in their head. When they change module A, they remember module B depends on it. An AI has no such memory between sessions. It only knows what it can see — and in a large codebase, it can't see everything. Tests are the exoskeleton that holds the shape of the system while the AI works inside it.

Without tests:
  • Every session starts from scratch conceptually
  • Fixes break prior fixes invisibly
  • The AI confidently builds forward on a crumbling foundation
  • "Done" means "compiles and looks right" — not "works correctly"

A codebase at 11% test coverage has no exoskeleton. Each change may be correct in isolation and wrong in the system. There's no way to know.

A production codebase with 800+ tests caught 400+ bugs before they shipped. That isn't just a quality metric; it is proof that the bugs were real and would have reached users. That's 400 production failures that didn't happen.

Tests That Know What the Architecture Is

The corollary worth noting: the best test architecture is itself registry-aware. Static analysis tests scan for hardcoding violations — meaning the tests don't just verify output, they enforce architectural contracts. When the AI tries to shortcut the registry pattern, a test fails. The discipline is self-reinforcing in a way that no amount of code review can replicate.

There is a ceiling a codebase will never reach without foundational test coverage: not just tests, but tests that know what the architecture is supposed to be. Tests that check whether the AI followed the rules, not just whether the output looks right.

From Vibe Coding to Engineering

This website is a personal discovery journey. I was amazed at how much I learned just writing the Advanced: Building a Plugin section, and at how much further it demystified Claude's architecture. I also realized, as I wrote and took the quizzes, that I had been using AI incorrectly. This site was changing my own view of AI.

I admit, I used to think that building apps with Claude Code through vibe coding made me an AI engineer; the proof was in the amazing apps I wrote (one in the Chrome store, plus loads of tools for work). Alas, when I later tried to fix one of those apps, I realized I had been unwittingly doing it wrong. I wasn't coding with Claude correctly, and yet I was getting great results. By the time I reached version 5 of one project, it started collapsing. I am on version 10 now, mostly refactors, and that experience led to this Manifesto. I have spent more time refactoring that app (still broken) than I spent building it. That is why I stepped back and rethought how to avoid the trap. I had to learn how to do this right, and I am still learning.

To call myself an AI Engineer (as I did) was to mistake early prompting success for engineering skill. And I saw non-engineers having the same successes (even greater!) with no engineering discipline. As I wrote this manifesto, though, I realized that real engineering is possible through AI tools, if the hooks, agents, skills, memory, and plugins are used properly. If you are vibe coding with good results, I want you to join me in rethinking the engineering side. I could have saved myself so much time had I followed the patterns learned over my career as a software engineer.

Claude Code is like a loaded gun: you can point it at an animal and have a meal, or you can shoot your own foot. Consider this site to be a gun safety course and weapons training. The quizzes are important, as they reinforce learning. I'll keep vibe coding (I didn't engineer this site, I vibe coded it), but there is a time to engineer.

In all honesty, engineering is not my goal. Immediate and powerful solutions have drawn me to AI (and Claude Code in particular). I can have a great app in 10 hours? Yes! But in that fever-pitched drive to a solution, I had a flaw. The flaw was in my thinking. I thought Claude was something it isn't. That is why I stepped back, studied Claude more closely, and wrote this site. I hope the manifesto does its work on you too. My problem was not bad prompting, but bad thinking, and I can't ignore engineering, because I needed to re-engineer my own thoughts. Ironically, our underlying subject is software that emulates human thinking, and as you correct your thoughts about AI, you are engineering the only biological brain you can change.

For more, I point you to this interactive tutorial on the early history of AI, with videos: github.com/srives/Perceptron

The New Epistemology

A philosophical retrospective on what AI programming is doing to the programmer.

This is not only faster coding. It is a change in how software is known. The dominant object is no longer the line, the file, or even the class. The dominant object is the system shape the human is trying to preserve while an AI writes inside it.

I am building up a mental model of what programming is becoming. The black CLI screen, the text prompt, the running agent inside the repository: these are not incidental surfaces. They change the way the work feels. The terminal is spare, verbal, procedural, and immediate. It is less like arranging objects in an IDE and more like addressing a machine intelligence in its own workshop.

That changes the programmer. The tool affects the thing it is designed to affect, but it also affects the practitioner. A hoe cuts the soil and raises blisters on the hand. LLM/CLI programming cuts through boilerplate and raises blisters in the mind.

At first there is pain. Then the calluses form. The callus is not numbness. It is trained sensitivity.

The Medium Works Back

McLuhan's line was that the medium is the message. The related lesson is that we shape our tools, and then our tools shape us. AI-assisted programming makes that literal. I prompt the model, but the model's habits teach me what I must become more precise about.

When I build now, I often do not see code first. I see blocks of intention: stores, boundaries, contracts, lifecycle flows, entry points, invariants, and seams. Handwritten code can feel small because it is local. The mental object is larger than the line. The line is only one visible trace of the object.

The cognitive leap is from authorship to stewardship. The AI can produce the local material. The human must preserve the system's identity across changes.

From Files to Stores

The word "file" names a physical artifact. The word "store" names a responsibility. A store is durable state with rules: where it lives, who may read it, who may mutate it, how writes are made atomic, how failure rolls back, and what shape must remain true afterward.

That shift matters because AI is often locally helpful and globally careless. It will write the direct file access if the direct file access solves the immediate problem. It will place a helper where it is convenient. It will preserve an old path "for safety." It will let two sources of truth coexist if nobody forces the question of ownership.

The new discipline is to ask, before the code appears: who owns this state? What boundary is being crossed? What is the one canonical path? What invariant must survive the edit?

Seams and Boundaries

A seam is the narrow handoff where one responsibility gives way to another. It is the place where behavior can be swapped, tested, or refactored without corrupting the rest of the system. A good seam preserves future freedom. A muddy seam destroys it.

This is why boundary bugs feel different from algorithm bugs. An algorithm bug is local. A boundary bug spreads. When canonical truth exists in two places, fixing one copy does not fix the system. When three entry points perform the same lifecycle differently, testing one path proves almost nothing about the others.

The vocabulary is part of the control system. Namespace, boundary, contract, entry point, seam, invariant, store: these words are not decoration. They are handles for detecting recurring AI failure classes before they harden into architecture.

The Mental Blisters

| Blister | What Hurts First | The Callus That Forms |
| --- | --- | --- |
| Loss of tactile contact | The AI wrote code I did not personally touch. | Inspection through diffs, tests, logs, contracts, and invariants. |
| Boundary pain | The feature works, but the responsibility landed in the wrong place. | The reflex to ask who owns the behavior and what may cross the boundary. |
| Contract hunger | Ambiguous returns, partial objects, silent failure, and caller guesswork. | Explicit shapes, failure modes, cleanup duties, and canonical precedence. |
| Professional suspicion | The thing appears complete before it has proven itself. | Hostile review, preflight, targeted tests, and refusal to trust plausible output. |
| Memory externalization | The same hard-won lesson is lost between sessions. | CLAUDE.md, AGENTS.md, memory files, plans, skills, hooks, and tests. |
| Token-cost consciousness | Context fills, attention thins, and old detail compacts away. | Compressed rules, durable documents, named patterns, and load-on-demand procedures. |

The New Unit of Thought

| Before | After |
| --- | --- |
| Line / function / file | Owner / boundary / contract / seam / invariant |
| Compiler error / bug | Drift / duplicated truth / muddy ownership / invisible second path |
| Developer remembers intent | Tests, rules files, architecture docs, prompts, hooks, and review protocols preserve intent |
| Write correct code | Shape a system so an AI can safely modify it |

The Blind Potter

I feel like a blind potter. The clay is no longer directly under my hand. It is behind a curtain, and I feel it through delayed signals: diffs, test failures, logs, runtime behavior, and architectural drift. Yet the cup can still be made. The bowl can still be shaped.

The millions and billions of tokens are the pounds of clay wasted while learning where the form is. They are not just cost. They are apprenticeship. The waste teaches touch.

The programmer is not blind because sight is absent. The programmer is blind because the material is mediated. Mastery is learning which signals can be trusted.

The Token Apprenticeship

Tokens are not the real unit of learning, but they are a useful metaphor. The real unit is repeated, painful failure classes.

One brutal project can teach more than millions of pleasant tokens. The fast build gives power. The refactor gives epistemology.

A Practical Discipline

The next leap is not only to model what the AI builds. It is to model how the AI tends to fail.

  1. Name the failure class.
  2. Decide whether it is local or architectural.
  3. Add a rule only if the failure class will recur.
  4. Add a test if the rule can be enforced mechanically.
  5. Add a seam if future change needs protected freedom.
  6. Add memory only for facts the next session must inherit.

This is how the programmer stops being molded unconsciously and starts molding the medium back. The AI writes. The human increasingly defines the laws under which writing is allowed.