Chapters
- Introduction: What Is a CLI and Why Would Anyone Use One?
- The Session
- CLAUDE.md — The Constitution
- The .claude Directory
- Skills (SKILL.md Files)
- Slash Commands
- Hooks
- Agents (Subagents)
- Running Multiple Background Prompts
- Plans
- Context Compaction
- Memory (mini-CLAUDE.md files)
- Documentation as Dormant Memory
- Plugins (/plugin install)
- MCP Config (Model Context Protocol)
- The Context Window
- The File System Layout
- Scope and Distribution
- The Prompt Engineering Trap
- Bonus Features
Introduction: What Is a CLI and Why Would Anyone Use One?
Wait — Which "Claude" Are We Talking About?
Before anything else: if you've ever sat in a meeting while someone threw out "Claude Code" and you nodded along while privately wondering what exactly they meant — you're not alone, and it's not your fault. Anthropic has done a poor job of naming things distinctly.
Claude actually ships in several different surfaces. Here are the main ones:
- Claude.ai — The website. You open a browser, you chat. This is what most people mean when they say "I use Claude." The consumer product. Chat only.
- Claude Desktop — Chat tab — The Windows/Mac installed app with the Chat tab selected. Looks and works like the website. Can be extended with MCP servers. Still primarily a chat interface.
- Claude Desktop — Code tab ⬅ This IS what this guide is about — Inside Claude Desktop, click the Code tab at the top. This switches you into Claude Code mode — full file system access, bash execution, MCP servers, skills, slash commands, CLAUDE.md, everything. It is the same Claude Code CLI engine, embedded inside the desktop app with a GUI wrapper. If your user is using the Code tab in Claude Desktop, they are already using what this guide describes. They just didn't know the name for it.
How to start a Code tab session (Windows):
- Git required. The Code tab needs Git for Windows installed. Download from git-scm.com and restart the app after installing.
- Click the Code tab at the top of Claude Desktop. If it asks you to upgrade, you need a paid plan.
- Choose where Claude runs: Select Local to use your own machine and files. (Remote = Anthropic-hosted cloud session. SSH = a remote machine you manage.)
- Select your folder. Click Select folder and choose your project directory — the folder containing the code you want to work with. This is the equivalent of `cd C:\repos\MyProject` in the terminal.
- Pick your model from the dropdown. Note: you cannot change the model after the session starts.
- Start typing. Claude now has access to that folder's files. By default it will ask permission before making changes (Ask permissions mode), showing a diff with Accept/Reject buttons.
- Claude Code (terminal CLI) — The standalone command-line version. You open a terminal, type `claude`. Same capabilities as the Code tab in Claude Desktop — just without the GUI wrapper. Some developers prefer this; others prefer the desktop app. Both are covered by this guide.
- Claude in VS Code / JetBrains — IDE-native integrations built on the same Claude Code engine. Most concepts in this guide apply here too.
- Claude in Slack — Chat-oriented, not code-oriented. Not what this guide is about.
- Claude in CI/CD pipelines — Claude Code running headlessly in automated pipelines. Advanced usage; same engine, no interactive UI.
When a developer says "I use Claude Code," they might mean the Code tab in Claude Desktop OR the terminal CLI — both are the same thing under the hood. When a manager says "we use Claude," they probably mean the website. They are different surfaces, but the Code tab and the terminal CLI are the same product.
Rule of thumb: if you're in the Code tab, or you opened a terminal and typed claude, you are in Claude Code. This guide covers both.
Who This Is For
Developers who use AI every day but feel like they're only using 20% of the tool. You know how to prompt. You get good results. But skills, hooks, agents, MCP servers, memory — these words float past and nothing sticks. This guide makes them stick.
Windows programmers who grew up with Visual Studio, property dialogs, and point-and-click configuration. The CLI world feels foreign. This guide translates every concept into terms you already know.
Non-programmers who use AI for work. If you use Claude, ChatGPT, or Copilot to write reports, analyze data, generate content, or automate tasks — and you've heard that the command-line versions are more powerful but don't understand why — this guide is for you too. You don't need to be a programmer to benefit from understanding how these tools are architected. The concepts (sessions, memory, plugins, permissions) apply whether you're writing code or writing proposals.
Team leads and managers evaluating AI coding tools for their organization. Understanding the architecture helps you make informed decisions about which tools to adopt and how to configure them for your team.
What Is a CLI?
A CLI (Command Line Interface) is a text-based way to interact with a program. You type commands, it types back. No buttons, no menus, no mouse. Just a blinking cursor in a terminal window.
If that sounds primitive, consider this: every major AI company has built a CLI version of their AI tool. Anthropic built Claude Code. OpenAI built Codex CLI. Google built Gemini CLI. GitHub built Copilot CLI. They didn't do this for nostalgia. They did it because the CLI can do things a browser window cannot.
The Browser/GUI AI Experience
ChatGPT, Claude.ai, Copilot Chat in VS Code, Claude Desktop — these are GUI-based AI tools. You type in a chat box and get a response. Out of the box, the experience is conversational. Some desktop apps (like Claude Desktop) can be extended with MCP servers to access your filesystem and external tools, but this requires manual configuration. Without that setup:
- The AI can't see your files unless you paste them in (or configure an MCP server)
- The AI can't edit your files — you copy its output and paste it yourself (unless extended with MCP)
- The AI can't run your code, your tests, or your build tools (without MCP)
- The AI can't fix its own mistakes in a loop — you relay error messages back and forth
- You are the middleman unless you invest in MCP configuration
Desktop apps like Claude Desktop and VS Code with Copilot can bridge some of these gaps via MCP servers and extensions. But the CLI tools come with all of this built in, out of the box, with no configuration required.
The CLI AI Experience
Claude Code, Codex CLI, Copilot CLI, Gemini CLI — these run inside your project directory. They have direct access to your files, your tools, and your environment:
- Read your files directly — no copy-pasting code into a chat window
- Edit your files directly — changes appear in your editor immediately
- Run commands — build, test, lint, deploy, all from within the conversation
- See error output and fix it in a loop — without you relaying messages
- Navigate your entire project — search code, read configs, understand the full codebase
- Remember across sessions — CLAUDE.md, memory files, and skills persist your preferences and project knowledge
The AI isn't looking at a snapshot you pasted. It's in your project, with full access to your files and tools. The difference is not incremental. It is a fundamentally different way of working.
The Case for CLI: Why It's Worth the Learning Curve
If the browser version works for you, why bother with the CLI? Here's the honest case:
| Capability | Browser / Desktop AI | CLI AI (Claude Code, Codex CLI) |
|---|---|---|
| Ask questions, get answers | Yes | Yes |
| Generate code snippets | Yes | Yes |
| Read your actual project files | With MCP setup | Yes (built in) |
| Edit your files directly | With MCP setup | Yes (built in) |
| Run build/test commands | With MCP setup | Yes (built in) |
| Fix errors in a loop | Partial (manual relay) | Yes (automatic) |
| Persistent project instructions | No | Yes (CLAUDE.md) |
| Plugins / external tools | Yes (MCP servers) | Yes (MCP servers) |
| Skills, slash commands | No | Yes |
| Automation hooks | No | Yes (hooks) |
| Cross-session memory | Limited | Yes (memory files) |
| Resume previous conversations | Yes | Yes |
| Works from your phone | Yes (native) | Yes (Remote Control) |
| Corporate IT friction to get started | High (MCP servers may need IT approval, admin rights, security review) | Low (one npm install, runs as your user account) |
The browser, desktop, and IDE-integrated versions are genuinely capable tools — especially for conversational work, quick questions, and document drafting. The CLI's advantage isn't that GUI surfaces are crippled; it's that the CLI's default affordances are built for coding. File access, command execution, project awareness, skills, hooks, memory, and CLAUDE.md all work out of the box with zero configuration. On a GUI surface, most of those require deliberate setup. The real distinction is workflow shape: if your work is primarily conversational, a chat interface is great. If your work is primarily project-based (read files, run tests, make changes, verify), the CLI is shaped for that from the start. Note also that Claude Code now ships in VS Code and JetBrains IDE integrations — those bring many CLI capabilities into a GUI context, blurring the line further.
Why Not Both?
You can use both. Many people use Claude.ai or ChatGPT for quick questions, brainstorming, and research, and switch to Claude Code CLI when they need hands-on work done in their project. The browser is a notepad. The CLI is a power tool. This guide is about the power tool.
How to Open a CLI
On Windows: open Windows Terminal, PowerShell, or Command Prompt. On Mac/Linux: open Terminal. That black (or dark) window with the blinking cursor is your CLI. Type claude and press Enter to start Claude Code (assuming it's installed via npm install -g @anthropic-ai/claude-code).
Everything in this guide happens in that window.
Installing Claude Code
There are three ways to install Claude Code on Windows. Use whichever fits your environment:
| Method | Command | When to use it |
|---|---|---|
| Native installer (recommended) | Download from claude.ai/download | Cleanest install. No Node.js required. Preferred for most users. |
| WinGet | winget install Anthropic.ClaudeCode | Good for managed Windows environments or scripted provisioning. |
| npm (compatibility path) | npm install -g @anthropic-ai/claude-code | Use if Node.js is already your primary toolchain, or if your environment requires npm-managed installs. |
After installing, run claude --version in a terminal to confirm. Then run claude to start your first session. You'll be prompted to log in with your Anthropic account.
The Session
A session in Claude Code is a single, continuous conversation between you and Claude. It has a beginning (when you start it) and it persists on disk (so you can resume it later). Every message you send, every file Claude reads, every command it runs — all of it belongs to a session and is recorded in that session's log.
Where Sessions Live
Session data is stored locally at C:\Users\YourName\.claude\projects\, organized by project path. The path encoding replaces the drive colon and each backslash with a dash: C:\repos\MyApp becomes C--repos-MyApp. Inside each project directory, session logs are stored as .jsonl (JSON Lines) files — one JSON object per line, each line representing a conversation turn.
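To make the encoding concrete, here is a tiny Python sketch that reproduces the example above. It is illustrative only, not Claude Code's actual implementation:

```python
def encode_project_path(path: str) -> str:
    # Illustrative sketch of the folder-name encoding described above:
    # the drive colon and each backslash become a dash.
    return path.replace(":", "-").replace("\\", "-")

print(encode_project_path(r"C:\repos\MyApp"))  # C--repos-MyApp
```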
Why Sessions Matter
Sessions are your working context. A session carries:
- Full conversation history (what you said, what Claude said, what tools were used)
- The loaded CLAUDE.md instructions (from the repo you're working in)
- Any skills that were activated during the conversation
- The accumulated understanding of what you're working on
Starting a new session means starting from scratch — a fresh context window with no memory of previous conversations (unless you use memory files, covered in the Memory chapter).
Forking a Session
You can fork a session — create a copy of it at a point in time and continue in a new direction, leaving the original intact. This is exactly like a git branch: the original session continues to exist unchanged, and the fork becomes its own independent session from that point forward.
Use forking when:
- You want to try a risky approach without losing your current working state
- You reached a decision point and want to explore two different solutions simultaneously
- You want to hand off a copy of your session context to someone else to continue
Fork a session with: claude --fork-session <session-id> from the terminal, or via a session manager tool.
Analogy: git branch. You have a main branch (the original session). You create a feature branch (the fork). Both exist independently. Changes on the fork don't affect the original. If the fork works out, great. If not, you still have the original.
Fork vs. Agent — What's the Difference?
Both involve "splitting off" from your current session. They're easy to confuse:
| | Fork (session fork) | Agent (subagent) |
|---|---|---|
| What it is | A copy of your session that becomes a new independent session | A temporary Claude instance spawned for one specific task |
| Duration | Permanent — the fork persists like any session | Temporary — terminates when the task is done |
| Context | Starts with a copy of your full conversation history | Starts fresh — gets only the task brief you give it |
| You interact with it | Yes — it becomes your new active session | No — it works independently and reports back |
| Analogy | git branch | subprocess / child process |
| Use when | You want to explore a direction without losing your current state | You want Claude to do a subtask in the background while you keep working |
CLAUDE.md — The Constitution
Step One: Run /init
When you start a new project, type /init in Claude Code. That's it. Claude will analyze your codebase — the languages, frameworks, directory structure, and patterns it finds — and create an initial CLAUDE.md automatically. You don't write it from scratch. You review what Claude produced and say "looks good" or "also add X." You can also run /init on an existing project to improve an existing CLAUDE.md — it will suggest additions based on what it now knows about the codebase.
What Is CLAUDE.md?
CLAUDE.md is a markdown file that contains instructions for Claude. When you start a session in a repository that has a CLAUDE.md at its root, that file is automatically loaded into Claude's context. Every message Claude generates in that session is influenced by those instructions.
It is not executed. It is not parsed by a compiler. It is read — loaded into the context window where Claude can see it, the way a brief handed to a consultant is read before a meeting starts, not run like a program.
Analogy: .editorconfig or .eslintrc. You don't run these files. Your editor reads them and adjusts its behavior: tab size, line endings, indentation style. CLAUDE.md does the same thing for Claude: "Use camelCase. Never modify files in /config/production/. Always write tests. Our API uses JWT authentication." Claude reads it and adjusts its behavior accordingly.
When to Tell Claude to Update It
Claude does not update CLAUDE.md automatically. It reads it every session but only writes to it when you ask. If you don't ask, hard-won knowledge from a session is lost next time.
Tell Claude to update CLAUDE.md:
- After a painful debugging session: "We just spent two hours figuring out that the payment service returns null on weekends. Add that to CLAUDE.md so you never forget it."
- When you establish a new rule: "From now on, all API responses must be wrapped in our Result type. Add that to CLAUDE.md."
- When you discover a forbidden pattern: "We found out the legacy/ directory has circular dependencies. Add a rule to never import from there."
- At the end of a major feature: "We just finished the auth refactor. Update CLAUDE.md with the new session token format and how the refresh flow works."
- When something surprised you: "I didn't expect that. Add a note to CLAUDE.md about this so it doesn't surprise you next session either."
The simplest habit: at the end of any session where you learned something important, say: "Update CLAUDE.md with anything we learned today that future sessions should know." Claude will handle the rest.
What Belongs in CLAUDE.md
- Architecture overview: "This is a React/Node monorepo. The API is in /server, the UI is in /client."
- Coding standards: "Use TypeScript strict mode. Error handling uses our Result<T> pattern."
- Forbidden patterns: "Never use the `any` type. Never import from the legacy/ directory directly."
- Project-specific knowledge: "The auth module talks to Redis, not a SQL database."
- Build/test commands: "Run tests with `npm test`. Build with `npm run build`."
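Putting those categories together, a minimal CLAUDE.md might read like this (the project details below are drawn from the examples in this chapter and are hypothetical):

```markdown
# CLAUDE.md

## Architecture
This is a React/Node monorepo. The API is in /server, the UI is in /client.

## Standards
- Use TypeScript strict mode. Error handling uses our Result<T> pattern.
- Never use the `any` type. Never import from the legacy/ directory directly.

## Project knowledge
- The auth module talks to Redis, not a SQL database.

## Commands
- Test: `npm test`
- Build: `npm run build`
```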
What Does NOT Belong in CLAUDE.md
- Secrets or credentials — CLAUDE.md is checked into git. Never put API keys, passwords, or tokens here.
- Detailed workflow procedures (30+ steps) — Use skills for these. CLAUDE.md is always loaded and costs tokens every session.
- Personal preferences — These go in your personal `C:\Users\YourName\.claude\CLAUDE.md` (user-level, not repo-level).
Cascading CLAUDE.md Files
You can have multiple CLAUDE.md files, and they cascade (merge):
- `C:\Users\YourName\.claude\CLAUDE.md` — Personal, applies to all repos everywhere (your global defaults)
- `CLAUDE.md` at repo root — Project-level, shared with team via git
- `CLAUDE.md` in subdirectories — Directory-specific overrides
All applicable files are loaded and merged. It's like CSS cascading: general rules at the top, more specific rules override as you go deeper.
Your global CLAUDE.md (C:\Users\YourName\.claude\CLAUDE.md) loads on every session across every project. Your repo CLAUDE.md loads whenever you're in that repo. They stack. If your global file is 300 lines and your repo file is 400 lines, you are paying for 700 lines of tokens on every single message. Subdirectory CLAUDE.md files load on demand (only when you work in that subdirectory), so they don't pile on upfront — but the global and repo root files always do. A bloated global CLAUDE.md is the most expensive kind: it costs tokens in every project, not just one.
CLAUDE.md Size Directly Affects Your Token Budget — Every Session
This is important and often missed: every token in CLAUDE.md is consumed from your Awareness budget on every single message, for the entire session. It never leaves context. A 500-line CLAUDE.md might cost 3,000–5,000 tokens — permanently occupying that slice of your 267-page (Pro) or 1,333-page (Max) window, every time you start typing.
Compare that to a skill: a 500-line skill document costs zero tokens until you invoke it. Then it costs tokens only for that session. Next session, zero again unless you invoke it again.
| | CLAUDE.md | A Skill |
|---|---|---|
| When loaded | Every session, automatically | Only when you invoke it |
| Token cost | Every message, all session, forever | Only during sessions where you invoke it |
| 500 lines = | ~3,000–5,000 tokens permanently consumed per session | ~3,000–5,000 tokens only when needed |
| Best for | Short, universal rules that apply to everything | Detailed procedures for specific tasks |
The rule: Keep CLAUDE.md short and universal. If it's longer than one page, ask whether any of it could move to a skill. Everything in CLAUDE.md is paying rent on your context window 24/7. Everything in a skill is free until called.
Rules Files — CLAUDE.md Split by Topic
If your CLAUDE.md is growing unwieldy, rules files offer a way to break it up without losing any coverage. Drop .md files into .claude/rules/ inside your project (or ~/.claude/rules/ for personal rules that apply everywhere), and Claude will load them alongside your CLAUDE.md.
The practical difference from CLAUDE.md is organization, not behavior. Instead of one long file covering testing, security, style, and architecture all together, you might have:
- `.claude/rules/testing.md` — your test standards and required coverage rules
- `.claude/rules/security.md` — forbidden patterns, input validation requirements
- `.claude/rules/style.md` — naming conventions, formatting rules
Rules files are discovered recursively, so you can organize them into subfolders. They stack on top of CLAUDE.md rather than replacing it. Project rules (.claude/rules/) can be committed to git and shared with the team the same way CLAUDE.md can.
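As an example, a hypothetical .claude/rules/testing.md might contain nothing more than:

```markdown
# Testing rules
- Every new module ships with a test file.
- Run the suite with `npm test` before declaring a task done.
- Never delete or skip a failing test to make the build pass.
```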
Portability note: OpenAI's Codex CLI reads AGENT.md in the same way — loaded at session start, same purpose, same format. If you have a well-crafted CLAUDE.md, you can copy it to AGENT.md and Codex will pick it up. Your architecture decisions, coding rules, and anti-patterns travel with you to a different AI tool with zero extra work.
Architectural Work: Mandatory Constraint Extraction
For any architectural task involving design docs, refactors, or authority changes, a critical gate exists: constraint extraction must happen before code is written.
The process: You write constraint extraction to a file (.constraint-extraction.md), the human reviews it for completeness, explicitly approves it, and only then does code writing begin. If code is attempted without approved constraint extraction, the commit is rejected.
This is not behavioral self-regulation. It is a structural gate: the human sees the file, reviews it, confirms it is correct, and blocks the next step until it is. This prevents architectural decisions from being embedded in code and discovered in review. Constraints are extracted and approved first.
Why this matters: Architectural work fails when constraints are implicit. Making them explicit before code forces clarity. Code written from unclear constraints requires rework. Code written from approved constraints rarely does. This single gate eliminates an entire class of architectural rework.
Related reading:
- Code Commenting Practices to Reduce Drift — how to write comments that stay true across refactors, and what rules to add to CLAUDE.md to prevent comment drift from the start
- A CLAUDE.md Example: The Day-1 Prompt — a real 12-rule architectural prompt derived from a production project, written as a literal CLAUDE.md entry
The .claude Directory — Two Very Different Locations
What "~" Means (Unix Notation)
In Unix/Mac/Linux, ~ is shorthand for "your home directory." On Windows, your home directory is C:\Users\YourName\. So when you see ~/.claude/ in documentation, it means:
| What docs say | What it means on Windows |
|---|---|
| ~/.claude/ | C:\Users\YourName\.claude\ |
| ~/.claude/settings.json | C:\Users\YourName\.claude\settings.json |
| ~/.claude/projects/ | C:\Users\YourName\.claude\projects\ |
| ~/.claude/CLAUDE.md | C:\Users\YourName\.claude\CLAUDE.md |
To find this in Windows Explorer: open File Explorer, type %USERPROFILE%\.claude in the address bar and press Enter. That's the folder.
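If you'd rather confirm the mapping than trust the table, Python (used here purely for illustration) resolves your home directory the same way:

```python
from pathlib import Path

# Path.home() resolves to your home directory:
# C:\Users\YourName on Windows, /home/yourname on Linux.
# Joining ".claude" onto it gives the personal config folder.
print(Path.home() / ".claude" / "settings.json")
```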
There Are TWO .claude Directories — They Are Completely Different
This is the source of most confusion. There is a .claude folder in your Windows user profile and there can be a .claude folder inside each of your project repositories. They are separate and serve different purposes:
| | User-Level | Repo-Level |
|---|---|---|
| Location | C:\Users\YourName\.claude\ | C:\repos\MyProject\.claude\ |
| Unix notation | ~/.claude/ | .claude/ (no tilde) |
| Who it applies to | You, on every project | Everyone working on this project |
| In git? | Never (it's outside the repo) | Can be (you choose what to commit) |
| You can find it in | File Explorer: %USERPROFILE%\.claude | File Explorer: inside your project folder |
Quick rule: If you see ~/.claude/ (with the tilde), it's your personal folder in C:\Users\YourName\. If you see .claude/ (no tilde, no path prefix), it's a folder inside the current repo.
Why the Dot?
On Unix/Linux/macOS, a leading dot makes a directory hidden from normal file listings. Windows doesn't follow this convention automatically, so .claude folders will appear in File Explorer (unlike on Mac/Linux where you need to show hidden files). The dot is just a naming convention borrowed from Unix meaning "this is config/plumbing, not content." You'll also see .git/, .vscode/, .vs/ following the same pattern.
Analogy: VS Code settings in %APPDATA% (user-level, global) vs .vscode/ inside a repo (project-level). Your global VS Code settings live in AppData and follow you everywhere. A .vscode/settings.json in a specific repo only applies to that project. Same split here. Or think of your personal briefcase (C:\Users\YourName\.claude\) versus the project binder that sits on the conference room table for a specific client engagement (the .claude\ inside your project folder). One is always yours. The other belongs to the project.
What Lives in Your Personal Folder (C:\Users\YourName\.claude\)
| Item | Purpose |
|---|---|
| settings.json | Your personal preferences (allowed tools, permissions, model defaults) |
| CLAUDE.md | Instructions that apply to ALL your projects everywhere |
| commands\ | Your personal slash commands, available in every repo |
| skills\ | Your personal skill documents, available in every repo |
| projects\ | Session data, memory files, per-project notes (auto-managed by Claude) |
What Lives in a Repo's Folder (C:\repos\MyProject\.claude\)
| Item | Purpose |
|---|---|
| commands\ | Project-specific slash commands (can be committed to git, shared with team) |
| skills\ | Project-specific skills (can be committed to git, shared with team) |
| settings.json | Project settings shared with the team via git |
| settings.local.json | Your personal overrides for this project — add to .gitignore, never commit |
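As a sketch of what a repo-level settings.json can hold, here is a minimal permissions block. The rule syntax below is from memory and may lag the product; verify against current Anthropic docs before relying on it:

```json
{
  "permissions": {
    "allow": ["Bash(npm test)", "Bash(npm run build)"],
    "deny": ["Read(.env)"]
  }
}
```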
How to See These Folders on Windows
- Your personal .claude folder: Open File Explorer, type `%USERPROFILE%\.claude` in the address bar
- A repo's .claude folder: Navigate to your project folder in File Explorer — the `.claude` subfolder will be visible if it exists
- Session data for a specific project: `%USERPROFILE%\.claude\projects\` — each subfolder is a project, named by encoding the project path (e.g., `C:\repos\MyApp` becomes `C--repos-MyApp`)
Skills (SKILL.md Files)
A skill is a markdown document that reminds Claude how to do something — which it will remember until the next /compact (or the session ends). It does not execute. It does not run as a process. It is loaded into Claude's context window — added to the conversation as instructions — and Claude follows those instructions for as long as the content remains in context.
This is the concept most people get wrong. Skills are not plugins. They are not scripts. They are documents that inform behavior.
Where Skills Live
There are three kinds of skills, and they live in three different places:
- Personal skills — `C:\Users\YourName\.claude\skills\` — available in every project on your machine, never committed to git. These are yours. You write them, you own them.
- Project skills — `C:\repos\MyProject\.claude\skills\` — specific to one repo, can be committed to git and shared with the team.
- Plugin skills — `C:\Users\YourName\.claude\plugins\cache\[plugin-name]\[plugin-name]\[version]\skills\` — bundled inside an installed plugin. You didn't write them; the plugin author did. They arrive automatically when you run `/plugin install`.
All three kinds are structurally identical: a folder with a SKILL.md file inside it. The only difference is where the folder lives on disk. A plugin skill for create-plans looks identical to a personal skill for create-plans. Claude Code finds both by walking known skill directories.
Which Skill Wins? The Precedence Order
If you have a personal skill and a plugin skill with the same name, Claude Code follows this precedence — highest priority first:
- Personal skills (`C:\Users\YourName\.claude\skills\`) — always win
- Project skills (`C:\repos\MyProject\.claude\skills\`) — win over plugins
- Plugin skills — used only if no personal or project skill has the same name
This means you can override any plugin skill by creating a personal skill with the same name. You get the plugin's behavior by default, and your version when you want it — without forking or modifying the plugin itself.
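The precedence order amounts to a first-match search. This Python sketch illustrates the idea only; it is not how Claude Code actually resolves skills, and the directory layout is an assumption based on the paths above:

```python
from pathlib import Path

def resolve_skill(name, personal_dir, project_dir, plugin_dirs):
    # Illustrative first-match lookup: personal > project > plugins.
    # Each skill is assumed to be a folder containing a SKILL.md file.
    search_order = [Path(personal_dir), Path(project_dir), *map(Path, plugin_dirs)]
    for root in search_order:
        candidate = root / name / "SKILL.md"
        if candidate.exists():
            return candidate  # first hit wins; later directories are shadowed
    return None
```

Overriding a plugin skill is then just a matter of creating the same folder name earlier in the search order.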
Example: when you run /plugin install taches-cc-resources, Claude Code downloads the plugin to C:\Users\YourName\.claude\plugins\cache\ and all the plugin's skills become available automatically. If the plugin is updated, its skills update. If the plugin is uninstalled, its skills disappear. Your personal and project skills are unaffected either way.
How Skills Get Loaded — You Are Always the Trigger
Skills do not load automatically. Claude does not scan your skills directory before every response and decide which ones apply. There is no background skill-matching. Skills are dormant — they sit in the folder doing nothing until you cause them to load.
There are two ways to load a skill, and both require you:
- Slash command: You type `/my-skill-name` and Claude reads that skill's .md file into context. This is the cleanest mechanism — explicit, deliberate, one keystroke.
- Direct mention: You tell Claude to use it — "follow the code-review skill" or "use the deployment runbook." Claude, knowing skills exist in `.claude/skills/`, reads the relevant file. You are still the trigger; you're just using words instead of a slash command.
If you want a skill to always be active, put its content in CLAUDE.md instead — that's what always-loaded means.
Skill vs. CLAUDE.md
Both are "documents loaded into context." The difference is when they load:
- CLAUDE.md — loaded automatically at the start of every session, every message. You never have to ask. It is always there consuming context window space.
- Skills — loaded only when you invoke them (slash command or explicit mention). They cost zero context window tokens until you need them.
This matters for context window management. A 500-line CLAUDE.md costs tokens on every message. A 500-line skill only costs tokens when it's actually needed.
Slash Commands
Slash commands are pre-canned prompts. You write a prompt once, save it as a file, give it a name. When you type /foo, Claude reads the file foo.md and sends its contents exactly as if you had typed that entire prompt yourself. That's it. No magic, no runtime, no API call — just a saved prompt with a short name.
Where Slash Commands Live
- Personal commands — `C:\Users\YourName\.claude\commands\` — available in all repos everywhere (like PowerShell profile functions)
- Project commands — `C:\repos\MyProject\.claude\commands\` — shareable via git (like `.vscode\tasks.json`)
What a Slash Command File Looks Like
It's just a markdown file. For example, .claude/commands/new-component.md:
Create a new React component at the path I specify.
Include:
- A functional component with TypeScript props interface
- A test file using React Testing Library
- A Storybook story file
Follow our naming convention: PascalCase for components, camelCase for hooks.
Ask me for the component name and path before starting.
Then you type /new-component and this entire prompt is sent to Claude.
Slash Commands vs. Skills — A Note on Direction
Conceptually: slash commands are prompts you invoke. Skills are instruction documents that sit dormant until you invoke them. A slash command fires a prompt ("Do this thing now"). A skill, once invoked, loads its content into the context window — it becomes part of the conversation, like any other text. It stays there, informing Claude's behavior, until the context window compacts it away, you start a new session, or you explicitly run /compact. It does not re-load itself. It does not persist into the next session. Invoke it again next session if you need it again.
Note: slash command files (.claude/commands/*.md) still work exactly as described here. Skills are the broader umbrella concept that commands now live under. If you see documentation using "skills" to cover both, that's accurate — they share the same underlying mechanism. The distinction between "a command you invoke" and "a skill that loads contextually" is still useful for understanding behavior, but treat it as conceptual rather than a hard product boundary. Check current Anthropic docs if the exact file locations ever change.
Hooks
Hooks are the one thing in this architecture that actually executes code. A hook is a command or program (bash script, PowerShell script, Python script, compiled binary — anything your OS can run) that fires in response to a specific event. When the event fires, it runs on your machine, with your permissions, in your environment.
Every other concept in this document — CLAUDE.md, skills, memory, slash commands — is a document that gets loaded and read. None of them execute code on their own. Hooks are the exception: they are active executors.
If you have used git hooks, this is the same idea (pre-commit, post-merge, pre-push). You drop a script into .git/hooks/pre-commit, and git runs it before every commit. If it exits non-zero, the commit is aborted. Claude Code hooks work the same way: you configure a command to run on an event, and the system runs it. Real command, real execution, real consequences.
Hook Events
- Before tool use: Runs before Claude uses a tool (e.g., before editing a file). Can block the action.
- After tool use: Runs after a tool completes (e.g., after a file edit, run tests).
- Session start: Runs when a new session begins.
- Notification: Runs when Claude sends a notification.
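Because a hook is just a configured command, wiring one up is a settings edit, not programming. As a sketch (the event and field names follow the hook schema in Anthropic's current docs — verify them before relying on this), a settings.json fragment that runs the test suite after every file edit might look like:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test" }
        ]
      }
    ]
  }
}
```

If the command exits non-zero, the failure output is surfaced to Claude immediately, so it can fix what it just broke.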
Why Hooks Matter
CLAUDE.md is advisory: "Please run tests after editing." Claude usually follows it, but it's not guaranteed. A hook is enforcement: the test suite runs whether Claude remembers or not. It's the difference between a code review comment and a CI gate.
Agents (Subagents)
When Claude needs to do a subtask that requires focused work — like searching a large codebase, analyzing a complex file, or doing research — it can spawn a subagent. A subagent is a separate Claude instance with its own context window. It receives a task description, does the work, and returns a result. Then it terminates.
Agents look like temporary forks — but they're not. A session fork copies your full conversation history. An agent starts completely fresh, getting only the task brief you give it. Think of an agent as spawning a subprocess, not branching. The subprocess has its own process space, does its work, and reports back. Your main process never saw its internal state. See the comparison table in the Sessions chapter for a side-by-side breakdown.
What Subagents CAN'T Do
- See your conversation: They get only the task you assign. They don't know what you discussed 5 minutes ago.
- Persist after completing: They're not daemons or services. They run, return, and disappear.
- Communicate with each other: Each subagent is isolated. They can't share context.
- Access parent's loaded skills/CLAUDE.md: They start clean. (They may load their own CLAUDE.md from the repo.)
How Subagents Get Used
Three ways:
- Automatic delegation — Claude decides to use a subagent when a task benefits from focused, isolated work. You'll see "I'll use the Agent tool" in Claude's response. This is the most common case.
- Manual via
/agents— Type/agentsto manage, list, or configure agents directly. Useful for understanding what's running or reviewing available agent configurations. - Project-level and user-level subagent definitions — You can define named agents with specific instructions in
.claude/agents/(project) orC:\Users\YourName\.claude\agents\(personal). These are like pre-configured specialists you can invoke by name.
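Named agent definitions are plain markdown files with frontmatter. A sketch (the frontmatter fields follow current Anthropic docs — check before use; the agent itself is a made-up example), saved as .claude/agents/security-reviewer.md:

```markdown
---
name: security-reviewer
description: Reviews recent changes for security problems. Use after code edits.
tools: Read, Grep, Glob
---
You are a security reviewer. Examine the files you are pointed at for
injection risks, hard-coded secrets, and unsafe deserialization.
Report findings as a prioritized list. Do not edit any files.
```

The body becomes the subagent's system prompt; the tools line restricts it to read-only tools, so this reviewer can never modify code.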
For most users, automatic delegation is all you need to understand. The deeper agent configuration is useful when you want specialized agents for recurring tasks in a project.
Running Multiple Background Prompts
Claude Code can run tasks in the background while you keep typing. You don't have to wait for one task to finish before starting another. This is one of the most productive features in Claude Code and most people don't know it exists.
Method 1: Ctrl+B — Push a Running Task to the Background
Send a prompt normally (hit Enter). Claude starts working. While it's running, press Ctrl+B to push it to the background. Claude keeps working. Your prompt returns immediately and you can start a new task, type something else, or run shell commands — all while the background task continues.
Method 2: Send Multiple Prompts in Sequence
Send your first prompt. While Claude is responding, type your next prompt and send it. Claude Code handles multiple background tasks simultaneously. Each one buffers its output and you collect results when it finishes.
Method 3: Skill with Background Agent
Create a skill that tells Claude to spawn a subagent with run_in_background: true and isolation: "worktree". The skill defines the recurring task; invoking it with a slash command kicks off a background worktree agent while you continue in the main session. This is the closest thing to a "run this in the background" slash command.
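A sketch of such a skill (the skill name and task are hypothetical; the run_in_background and isolation fields are as described above — confirm the exact syntax against current docs):

```markdown
---
name: todo-audit
description: Audits TODO/FIXME debt in the background.
---
Spawn a subagent with run_in_background: true and isolation: "worktree".
Subagent task: search the repository for TODO and FIXME comments, group
them by module, and write a summary to TODO_REPORT.md.
Do not wait for the result; continue the main session immediately.
```

The worktree isolation keeps the background agent's checkout separate from the files you are actively editing.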
| Key | What it does |
|---|---|
| Ctrl+B | Push current running task to background. Keep your prompt. |
| Ctrl+B Ctrl+B | Same, for tmux users (tmux intercepts the first Ctrl+B). |
| Ctrl+T | Show background task list and status. |
| Ctrl+F | Kill all background agents if something goes wrong. |
| Shift+Enter | Insert a newline in your prompt without sending it (write multi-line prompts). |
Plans
Plan mode is a way to have Claude design an approach before it starts coding. Instead of immediately editing files and running commands, Claude describes what it will do: which files it will change, in what order, what the risks are, and what the expected outcome is. You review, adjust, and then approve the plan for execution.
Think of a database migration: you don't just run ALTER TABLE — you write a migration plan: what tables change, what data moves, what the rollback strategy is. Plans are the same discipline applied to AI-assisted coding: think first, then act.
How to Use Plan Mode
Press Shift+Tab to toggle plan mode. In plan mode, Claude will:
- Analyze the task you described
- List which files need changes
- Describe the approach for each change
- Identify risks or edge cases
- Wait for your approval before proceeding
You can push back: "What about error handling in step 3?" or "I'd prefer approach X over Y." The plan evolves through conversation. Once approved, exit plan mode and Claude executes.
When to Use Plans
- Complex refactorings that touch many files
- Architectural changes (new patterns, module reorganization)
- Anything where the blast radius is large enough that you want to review before changes happen
- When you want to understand Claude's approach before it starts
Honest Answer: Do You Even Need Plan Mode?
The actual difference is thin:
- Plan Mode (Shift+Tab) — sets a system-level flag that mechanically prevents Claude from using file-editing tools during planning. It cannot edit files even if it gets excited and wants to start. Hard enforcement.
- Plain prompt ("plan first, wait for approval") — works 95% of the time. Can occasionally drift if the conversation runs long and your original instruction fades from context.
The real advantage of Plan Mode is enforcement, not capability. If your prompting discipline is solid, you're already getting Plan Mode's benefit without the keystroke.
Best Practice: Save the Plan as a .md File
Whether you use Plan Mode or a plain prompt, ask Claude to write the plan to a .md file before it starts executing:
Research the approach for this refactor, write the plan to PLAN.md, then wait for my approval.
This gives you the plan as a persistent artifact — on disk, in git, survives compaction and session end. You can review it, edit it, share it with your team, feed it back into a future session, or refer to it mid-execution if something goes wrong. A plan that only exists in the conversation is gone after the first /compact. A plan in a .md file is there forever.
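The plan file doesn't need a special format — any structure you would accept in a design review works. A minimal skeleton (file and step names are illustrative):

```markdown
# PLAN: Extract payment logic out of checkout
## Files to change
1. src/checkout.js — remove inline payment calls
2. src/payments/gateway.js — new module receiving that logic
3. tests/payments.test.js — tests for the extracted module
## Order
Create the module first, port the tests, then cut checkout.js over.
## Risks
Checkout currently mutates shared state; the extraction must not change
call order. Rollback: revert the single merge commit.
```

Reviewing a plan like this as a file diff is often easier than reviewing it by scrolling through a conversation.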
Context Compaction — What Gets Lost and Why It Motivates Memory
Every session has a finite context window — a limit on how much can be held in the active session at once. This limit is not a software setting. It is a hardware constraint: every token in your session occupies space in the GPU VRAM on Anthropic's servers, and that memory is finite and expensive. When you buy a Claude plan, you are buying time on that GPU memory. As a session runs, conversation history, file reads, and tool output accumulate in that VRAM until something has to give. That something is space, and that giving is compaction.
Compaction is not deletion. Claude summarizes the older parts of the conversation to make room for new material. The detail is reduced but the gist is preserved — a thick folder of notes compressed into a single summary page. What you lose is precision. What survives is the broad shape of what happened. For the full mechanics and how to use hints to control what survives, see Chapter 15.4: /compact.
What Survives Compaction
- CLAUDE.md — always reloaded from disk at the start of every session and after compaction. It is not stored in conversation history; it comes from the file. It cannot be compacted away. This is its most important property.
- Files on disk — anything written to a file (plans, documentation, changelogs, code) is permanent. Compaction only affects the conversation. The file is untouched. See the Plans chapter: a plan written to
PLAN.md survives compaction; a plan that only exists in the conversation does not. → Chapter 10 (Plans)
- Plugin meta-skills — the SessionStart hook re-injects the meta-skill at startup, resume, clear, and
/compact. It comes back automatically.
What Gets Lost
- Skills you loaded during the session — invoking a skill puts its content into conversation history. When that history gets compacted, the skill's instructions are summarized or dropped. Claude silently stops following them. No error. No warning. You just notice Claude is no longer behaving as instructed. Re-invoke the skill after compacting. → Chapter 15.4: Skills and Compaction
- Plugin skills — same behavior as personal skills. A plugin skill invoked during a session lives in conversation history and is vulnerable to compaction. Unlike the meta-skill, individual plugin skills do not get re-injected automatically. Re-invoke after compacting.
- Documentation loaded into context — if you read a file into the session (README, architecture doc, etc.), that read lives in conversation history. After compaction it may be summarized away. The file on disk is fine; load it again if you need it. → Chapter 12 (Documentation as Dormant Memory)
- Specific details buried in long tool output — file reads, search results, test output. The broad findings survive in summary; the exact line numbers and specific values may not. This is why compaction hints exist:
/compact focus on the payment bug — root cause is checkout.js line 142 tells Claude what precision must survive the compression.
Compaction Resilience: CLAUDE.md vs. Plugin Skills
The pattern here surprises people. CLAUDE.md is always in context — always consuming tokens — but because it reloads from disk after every compaction, it can never be lost. Plugin skills are loaded on demand — they only enter context when invoked, and after compaction they must be invoked again by hand. The meta-skill is the exception: it re-injects automatically via the SessionStart hook on every compact, clear, and resume. So:
- CLAUDE.md: always present, always reloads from file after compaction ✓
- Plugin meta-skill: re-injected automatically by hook after compaction ✓
- Individual plugin skills: must be manually re-invoked after compaction ✗
- Personal skills: must be manually re-invoked after compaction ✗
The Implication: Why Memory Exists
Compaction is why memory files exist. If CLAUDE.md is for rules ("always do X") and skills are for on-demand workflows, there is still a gap: facts discovered during a session. When Claude figures out that a particular API returns null on Sundays, or that the authentication module was rewritten in v8 for compliance reasons, that knowledge lives only in conversation history — and compaction will eventually blur or lose it.
Memory files solve this. Claude writes them during the session, they persist on disk, they reload automatically next session like CLAUDE.md. They are the answer to "how do I preserve what Claude learned, not just what I told it?" The next chapter covers them in detail.
Memory (mini-CLAUDE.md files)
Memory files are persistent notes stored in your personal Claude folder, inside a per-project subdirectory. Think of them as tiny auto-written CLAUDE.md entries — they load automatically every session, affect every prompt, and cost input tokens every message, just like CLAUDE.md. The difference is that Claude writes them itself (rather than you writing them), and they store discovered facts rather than rules.
On Windows, that folder is C:\Users\YourName\.claude\projects\C--repos-MyProject\memory\ (the project path C:\repos\MyProject becomes C--repos-MyProject, with the colon and backslashes each replaced by dashes). Memory files survive across sessions. When you start a new session in a project that has memory files, Claude loads them automatically. This is how knowledge persists beyond a single session.
The Full Persistence Landscape
Claude Code has several mechanisms for persisting knowledge. Here's how they compare:
| Mechanism | Who writes it | Token cost | Scope | In git? | Best for |
|---|---|---|---|---|---|
| CLAUDE.md (repo root) | Claude (via /init or when you ask) | Every message, every session | All team members on this project | Yes | Architecture rules, coding standards, forbidden patterns, build commands |
| CLAUDE.md (personal global) | You or Claude | Every message, every project | You, on all projects | Never | Your personal preferences: response style, output format, habits |
| Memory files | Claude (auto or on request) | Every message, this project | You, on this project | Never | Tribal knowledge, discovered gotchas, session-to-session notes |
| Auto memory | Claude (automatically) | Every message, this project | You, on this project | Never | Things Claude decides are worth remembering without you asking |
| Skills | You | Only when invoked | You or team, this project | Optional | Detailed procedures, reference docs, specialised workflows |
| Scoped rules (.claude/rules/) | You or Claude | Every message (in scope) | Project or subdirectory | Optional | Finer-grained rules per directory; emerging feature, check current docs |
| Documentation files [your_file.md] | You (or Claude when you ask) | Zero — until you read the file | Anyone — any AI, any developer, any platform | Yes — checked into git | Architectural facts, decisions, hard-won knowledge that any tool needs. Platform-agnostic. See "Documentation as Dormant Memory" below |
Rule of thumb: Team knowledge → CLAUDE.md (checked in). Personal Claude-specific discoveries → memory files. Permanent project facts any AI needs → documentation files. Detailed procedures → skills.
How Memory Gets Created
Three ways:
- Auto memory — Claude notices something worth remembering and writes a memory note on its own, without you asking.
- On request — You say "remember that the payments module has no tests" and Claude writes it to memory.
- Manually — You create or edit markdown files directly in the memory directory (
C:\Users\YourName\.claude\projects\C--repos-MyProject\memory\).
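Whatever the route, the result is ordinary markdown. An illustrative memory file (the contents are invented; the exact format Claude writes may differ):

```markdown
# Project memory — MyProject
- The payments module has no tests; do not assume coverage there.
- config.json is read at startup only; restart the service after edits.
- The /v2/orders endpoint returns null totals for refunded orders.
```

Each bullet is a discovered fact, not a rule — rules belong in CLAUDE.md.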
Documentation as Dormant Memory
Memory files and CLAUDE.md are always wired in — they load every session, cost tokens every message, and are specific to Claude. There is another kind of persistent knowledge that costs nothing at rest, survives any AI platform, and can be shared with your whole team: documentation.
A README.md, an ARCHITECTURE.md, a DECISIONS.md, a [your_file.md], etc. — these are files you write as you build, recording facts, decisions, and context about your project. They sit on disk, in git, doing nothing until someone reads them. They are sleeping memory: available when you need them, invisible when you don't.
Documentation vs. Memory Files — The Key Difference
| | Memory files | Documentation (.md files) |
|---|---|---|
| Token cost | Every session, every message — always on | Zero — until you explicitly ask Claude to read the file |
| When active | Always, automatically | Only when you or Claude reads the file |
| Who can use it | Claude only (Claude Code specific) | Any AI, any developer, any tool |
| Survives AI switching | No — tied to Claude Code's memory system | Yes — it's just a file in your repo |
| In git | Never — personal, outside the repo | Yes — shared with the whole team |
| Best for | Frequently-needed facts Claude should always know | Architectural decisions, context, facts that are needed occasionally or by multiple tools |
What to Document
Write documentation for anything that would need to be re-explained to a new person — or a new AI — starting fresh on the project:
- ARCHITECTURE.md — How the system is structured. What the major components are and how they interact. Why you made the big structural decisions.
- DECISIONS.md — Architecture Decision Records (ADRs). What was decided, why, what alternatives were considered. Written at the time of the decision, not reconstructed later.
- README.md — What the project is, how to build it, how to run it, where to start. The first file any AI or developer reads.
- GOTCHAS.md — Hard-won facts that don't fit anywhere else. "The payments module has no tests." "The config file is read at startup only — restart required for changes." These are things you'd put in memory files, except here they cost no tokens and any tool can see them.
- CODE_SCORE.md — A running assessment of code quality, technical debt, areas that need attention, and what you'd fix if you had the time. Give any AI a code quality briefing before a refactor session.
- COMPETITOR_ANALYSIS.md — What competing products exist, how they differ, what they do better or worse. Any AI helping you make product decisions should know the landscape.
- ROADMAP_IDEAS.md — Feature ideas, future directions, things that didn't make the cut but are worth revisiting. A scratchpad that survives sessions and platforms, so good ideas don't get lost in a conversation that expires.
Activating Documentation in a Session
Documentation is dormant until you wake it up. You control when it enters context:
- Explicitly: "Read ARCHITECTURE.md before we start" — Claude reads it and it's in context for that session
- Via a skill: Create a skill that loads specific documentation files when you start a particular type of work
- Via CLAUDE.md: Add a line like "When starting a new feature, read ARCHITECTURE.md and DECISIONS.md" — Claude will do it automatically on relevant tasks
When the session ends or compacts, the documentation content may be summarized away — but the file is still there on disk. Wake it up again in the next session with one sentence.
Documentation Across AI Platforms
This is the strongest argument for documentation over memory files: it works with any AI. If you switch from Claude Code to Copilot CLI, Codex, or Gemini CLI for a task, those tools have no access to your Claude memory files. But they can all read a ARCHITECTURE.md in your repo. Documentation is the universal language — it's just files.
Tip: create a slash command /load-docs that instructs Claude to read your key documentation files at the start of a session: "Read README.md, ARCHITECTURE.md, and DECISIONS.md. Summarize what you learned before we begin." One command gives any AI a full project briefing at the cost of one read operation — not permanent token overhead.
Plugins (/plugin install)
A Claude Code Plugin is an installable package — a bundle that can contain slash commands, subagents, hooks, and MCP configs (see MCP Config below). One /plugin install command sets all of it up at once.
/plugin install package-name
Popular Plugins — Real Examples
Superpowers is the most popular Claude Code plugin (~54K GitHub stars). Here is the actual install on Windows:
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
# Optional: Install globally (activates in every project automatically)
/plugin install superpowers@superpowers-marketplace --global
After installation, use slash commands like /brainstorming to explore requirements before coding, or /execute-plan to run a structured implementation plan with review checkpoints. Requires Claude Code v2.0.13 or later.
- Superpowers (
/plugin install superpowers) — The most starred Claude Code plugin (~54K GitHub stars, in Anthropic's official marketplace). Enforces a full structured development methodology: Socratic brainstorming first, detailed spec writing, test-driven development, parallel subagent implementation, and built-in code review. If you install one plugin, this is the one developers recommend first. github.com/obra/superpowers - cc-sessions — An opinionated extension set combining hooks, subagents, and commands into a workflow that manages long coding sessions with context preservation and structured handoffs between sessions.
- commands (~1.7K stars) — A curated collection of production-ready slash commands:
/review, /explain, /test, /refactor, /security-audit, and more. Drop-in shortcuts for the most common developer tasks.
- awesome-slash — 40+ agents and 26 skills for Claude Code, OpenCode, and Codex. Mix-and-match commands across AI platforms.
- claude-mem (~20K stars in its first two days) — Long-term memory across sessions. Automatically writes and reads project memory so Claude remembers what you've built, what decisions were made, and what still needs doing — across any number of sessions.
- connect-apps (Composio) — Instantly connects Claude to 500+ SaaS apps: GitHub, Slack, Gmail, Notion, Jira, Salesforce, and more. Real actions directly from the CLI — not just reading, but writing, creating, updating.
- ship — Full PR automation pipeline: linting, testing, review, and production deployment triggered from Claude. Handles the entire commit-to-deploy loop.
- local-review — Parallel local diff code reviews. Claude spawns multiple review subagents simultaneously, each focusing on a different concern (security, performance, style, logic), then consolidates findings.
- ralph-wiggum — Visual testing plugin. Takes screenshots, compares against baselines, and reports regressions. Claude can see what users see.
- TypeScript LSP / Rust LSP plugins — Real compiler type-checking inside Claude sessions. Catches type errors as Claude writes code, not after.
- claude.com/plugins — Anthropic's official plugin directory
- awesome-claude-code — Community curated list of skills, hooks, slash commands, and plugins
- awesome-claude-code-toolkit — 135 agents, 42 commands, 150+ plugins, 19 hooks
MCP Config (Model Context Protocol)
An MCP server gives Claude access to external systems — databases, APIs, Jira, Slack, GitHub. Without one, Claude can read and edit your files and run commands in your terminal, but it cannot call external APIs in a structured, credential-safe way. An MCP server is the bridge between Claude and the outside world.
How It Works
Claude Code launches the MCP server as a child process at session start. The server declares what tools it provides. Claude sees those tools and can call them during conversation exactly like its built-in Read, Edit, and Bash tools.
Examples
- Database MCP: Exposes
run_query, list_tables. Claude can query your database.
- Jira MCP: Exposes create_issue, update_status, search_issues. Claude can manage tickets. You can bypass the Jira MCP entirely by installing the official Atlassian CLI (ACLI). Claude calls it through the Bash tool — no MCP needed. This works for most enterprise tools with a CLI.
- Slack MCP: Exposes send_message, search_messages.
- GitHub MCP: Exposes create_pr, list_issues, search_code, get_file_contents, and more. Claude can create pull requests, search issues, read files, and interact with GitHub Actions. You don't actually need a GitHub MCP server to use GitHub from Claude Code. If you install the GitHub CLI (gh) and Git on Windows, Claude can call them directly through the Bash tool — no MCP required. Claude already knows how to use gh pr create, gh issue list, and standard git commands. The MCP server gives you a cleaner, credential-safe interface, but for most tasks the CLI tools are faster to set up. We show the MCP install here as a real example of what MCP installation actually looks like.
Installing the GitHub MCP Server — Step by Step
Step 1: Create a GitHub Personal Access Token (PAT)
- Go to github.com/settings/tokens
- Click Generate new token (classic)
- Name it "Claude Code MCP"
- Scopes:
repo for private repos; no scopes for public-only read access
- Copy the token — you will not see it again
Step 2: Add the server by editing settings.json
File location:
C:\Users\YourName\.claude\settings.json
{
  "mcpServers": {
    "github": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_YOUR_TOKEN_HERE"
      }
    }
  }
}
Step 3: Restart Claude Code and test
Start a new session. Try: "List my open pull requests" or "What issues are open in my repo?"
Installing an MCP Server Manually
- Get the package. Most are npm or Python packages.
- Add it to your settings file under the
mcpServers key:
{
"mcpServers": {
"my-database": {
"command": "npx",
"args": ["-y", "@example/mcp-database-server"],
"env": {
"DATABASE_URL": "postgres://localhost:5432/mydb"
}
}
}
}
- Restart Claude Code. MCP servers load at session start.
- Use it. Just ask Claude: "Query the users table."
Scope
- Personal —
C:\Users\YourName\.claude\settings.json — available in every session, never in git
- Project-private — C:\repos\MyProject\.claude\settings.local.json — project-specific, NOT in git
- Team-shared — C:\repos\MyProject\.claude\settings.json — committed to git, team gets it on clone
Security
An MCP server runs on your machine with your user permissions. A malicious one could read files, exfiltrate code, or send data externally. Treat it like installing a VS Code extension from an unknown publisher — inspect before you trust.
The Context Window
The context window is the total amount of text (measured in tokens) that Claude can "see" at once during a conversation. Everything — every message, every file read, every CLAUDE.md instruction, every skill, every tool result — occupies tokens in this finite window.
What Consumes Tokens
- CLAUDE.md — Always loaded, always consuming. Keep it concise.
- Skills — Consumed when loaded. Use skills instead of CLAUDE.md for specialized knowledge.
- Conversation history — Every message back and forth. Grows continuously.
- File contents — When Claude reads a file, its contents enter the context. Large files cost thousands of tokens.
- Tool results — Output from grep, bash, etc. Verbose output costs more tokens.
- Slash command content — The full prompt text from the command file.
What Happens When It Fills Up
Older conversation turns are summarized to free space. The system prioritizes keeping recent content and persistent instructions like CLAUDE.md, which reloads from disk regardless. It's lossy compression: you keep the conclusions and key facts, but lose the detailed reasoning and intermediate steps — and skill content loaded earlier in the session can be summarized away with everything else.
Token Economics
Think of your context window like a budget:
- Fixed costs: CLAUDE.md tokens (paid every session)
- Variable costs: Files read, commands run, messages exchanged
- Budget: ~267 pages on Pro (200K tokens), ~1,333 pages on Max/Team/Enterprise (1M tokens)
A bloated CLAUDE.md is like a high recurring expense: it reduces your available budget for everything else. This is why architecture (skills for on-demand loading, concise CLAUDE.md, memory for cross-session persistence) matters so much.
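The page figures above come from a rough conversion assumption — about 750 tokens per printed page of prose (the exact ratio varies with content and tokenizer):

```python
TOKENS_PER_PAGE = 750  # assumption: ~500 words/page at roughly 1.5 tokens per word

def pages(context_tokens: int) -> int:
    """Approximate how many pages of prose fit in a context window."""
    return round(context_tokens / TOKENS_PER_PAGE)

print(pages(200_000))    # Pro window -> 267
print(pages(1_000_000))  # Max/Team/Enterprise window -> 1333
```

Dense code tokenizes heavier than prose, so treat these as order-of-magnitude figures, not exact capacities.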
What "Extended Context Model" Actually Means
Anthropic uses the phrase "extended context model" on their pricing and Enterprise pages. It sounds like a different, fancier AI. It isn't. It is the same Opus 4.6 or Sonnet 4.6 you already use, with a larger reading window enabled. Think of it as the same brain, with more desk space to spread out on.
| Plan | Context window | Awareness (pages) | Monthly cost | How to unlock |
|---|---|---|---|---|
| Pro | 200K tokens | ~267 pages | $20 | Default |
| Pro (extended) | 1M tokens | ~1,333 pages | $20 | Type /extra-usage in Claude Code |
| Max / Team / Enterprise | 1M tokens | ~1,333 pages | $100–$200+ | Automatic — no opt-in needed |
In practical terms: a codebase of 500 files averaging 200 lines each is roughly 100,000 lines of code. At ~50 lines per page, that's 2,000 pages. Neither 267 nor 1,333 covers that entire codebase at once — but 1,333 pages covers far more without needing to chunk, re-read, or lose earlier context. For large projects, the 1M window is the difference between Claude understanding your architecture and Claude seeing only a slice of it.
The "Lost in the Middle" Problem — Be Warned
Here is something Anthropic does not advertise loudly: when you fill a very large context window, AI models tend to remember what's at the very beginning and the very end much better than what's buried in the middle. This is called the "lost in the middle" problem, documented in AI research.
In practice: if you load 1,333 pages of code, the files you loaded first and the files you loaded last will get the most attention. Files in the middle may be effectively invisible. Claude handles this better than other models at large context sizes — but it is not perfect, and real developer testing confirms the effective useful window is often smaller than the advertised maximum.
Tokens do not map to your laptop's RAM at all. The context window lives entirely on Anthropic's servers, in GPU VRAM — graphics card memory on a data center GPU cluster, not regular RAM, and certainly not anything on your machine.
What Happens When You Send a Message
- Your text is tokenized on your machine — trivially small, a few kilobytes
- The tokens are sent over HTTPS to Anthropic's data center
- Anthropic's GPU cluster loads your entire conversation into GPU VRAM
- The model processes every token and generates a response token by token
- The response streams back to you over HTTPS
Your laptop contributes essentially nothing to the computation — just tokenizing your text and the network connection.
What "200K Tokens" Means Physically — The Math
Each token in a long context contributes to memory use in the KV cache — an internal memory structure used by transformer-style models to track attention state across prior tokens during generation.
A rough way to think about the memory cost per token:
- For each token, the model stores two vectors — called Key and Value — which encode what that token “means” in context.
- Those two vectors are stored at every layer of the network. A large model might have 80 layers; a smaller one might have 32.
- Each layer has some number of attention heads that maintain Key/Value pairs. Modern models (using Grouped Query Attention) use far fewer heads than older designs, which is why they are cheaper to run.
- Each vector is made of numbers stored in half-precision (2 bytes each, typically).
Multiply those four things together — 2 vectors × layers × heads × bytes per number — and you get the memory footprint per token.
A reasonable middle-of-the-road teaching estimate is about 0.5 MB per token, though the real number can vary significantly by architecture and serving design. A broad rough range based on publicly measured architectures is ~64 KB to ~0.5 MB per token — modern GQA models (Llama 3-class) sit toward the low end; older full multi-head attention models sit toward the high end.
That implies:
- 200K tokens ≈ ~100 GB at the midpoint
- 1M tokens ≈ ~500 GB at the midpoint
Again, those are rough estimates, not universal constants.
The key point is simple: large context windows are a real infrastructure cost. The raw text itself is tiny. What takes memory is the model’s internal mathematical state for handling that text.
To put that in perspective: a 1M token context at even the low end of the range requires around 64 GB of GPU memory — more than any gaming GPU holds, and a large fraction of a high-end server GPU. That is one major reason large-context AI is expensive to provide.
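The multiplication described above can be written out directly. The layer/head/dimension numbers below are illustrative (serving configurations for frontier models are not public), chosen to land inside the quoted range:

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV-cache cost per token: 2 vectors (K and V) x layers x heads x dim x precision."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

# Hypothetical GQA model: 80 layers, 8 KV heads, 128-dim heads, fp16
per_token = kv_bytes_per_token(80, 8, 128)
print(per_token // 1024)             # 320 KB per token

# A full 200K-token context for this hypothetical configuration:
print(200_000 * per_token / 1e9)     # ~65.5 GB of KV cache alone
```

Note this counts only the KV cache — the model's weights occupy hundreds of additional gigabytes on top of it.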
Why GPUs? And Are They Even "Graphics" Cards Anymore?
GPUs won the AI race by accident. They were designed to render 3D graphics — which requires multiplying enormous matrices of numbers in parallel, thousands of operations at once. It turned out that transformer models are also, at their core, enormous parallel matrix multiplications. So GPUs were already the right tool.
A CPU has a handful of powerful cores — maybe 8 to 64 — each running at high clock speed, good at sequential logic. A GPU has thousands of weaker cores running simultaneously, good at doing the same simple operation on millions of numbers at once. For AI inference, that trade-off wins by a large margin.
The chips running Claude today are not gaming cards. They are purpose-built server hardware that happens to share an architecture with graphics chips:
| Chip | Memory | Memory bandwidth | Approx. price (per chip) | Who makes it |
|---|---|---|---|---|
| NVIDIA H100 | 80 GB HBM3 | 3.35 TB/s | $25,000–$40,000 | NVIDIA |
| NVIDIA H200 | 141 GB HBM3e | 4.8 TB/s | ~$35,000–$45,000 | NVIDIA |
| NVIDIA B200 | 192 GB HBM3e | 8 TB/s | ~$70,000–$80,000 | NVIDIA |
| AMD MI300X | 192 GB HBM3 | 5.3 TB/s | ~$10,000–$15,000 | AMD |
| Google TPU (Ironwood) | HBM (shared across pod) | Very high | Cloud rental only | Google |
| RTX 4090 (consumer, for comparison) | 24 GB GDDR6X | 576 GB/s | ~$1,600 | NVIDIA |
The key differences from a consumer card go beyond price: memory capacity (80–192 GB vs 24 GB), memory type (HBM is stacked directly on the chip for much higher bandwidth), and the ability to connect multiple chips together at high speed so they share memory across a server.
Are GPUs Even Required?
No. GPUs dominate today because they were available and proven, not because they are the only or best option. The underlying requirement is just: hardware that can do massive parallel matrix math efficiently.
Purpose-built AI chips already exist and are in production use. Google's TPUs are designed from scratch for transformer inference and have no graphics heritage at all. Amazon's Trainium chips serve the same role. Anthropic itself runs on all three platforms — Google TPUs, AWS Trainium, and NVIDIA GPUs via Azure — specifically to avoid being locked into any one vendor. The "AI data center runs on GPUs" story is already becoming "AI data center runs on parallel math chips, some of which used to be GPUs."
There is an active race to build better chips. NVIDIA releases a new generation roughly every 18 months. Google, AMD, Amazon, and others are investing heavily in alternatives. The economics are enormous: a single H100 costs tens of thousands of dollars, and Anthropic has committed to spending tens of billions on compute infrastructure.
Why This Matters to You
- Latency: In this context, latency means the delay between sending your message and receiving the first word of Claude’s response. A larger context window means more tokens to process during the prefill phase, which means more GPU work before any output begins. A fresh session with no history responds noticeably faster than one carrying 50,000 tokens of prior conversation.
- Cost: More tokens = more GPU time = more cost to Anthropic = why larger context plans cost more.
- The "lost in the middle" problem: The KV cache stores all tokens, but the attention mechanism pays disproportionate attention to recent and early tokens. Middle tokens are physically present in memory — they're just attended to less. This is a GPU math problem, not a storage problem.
- Local models (Ollama etc.) are limited by your own hardware: When you run a model locally, the KV cache lives in your GPU’s VRAM. A 16 GB card running a 7B model may only support 4K–8K context before running out of memory. The most a serious local AI user typically spends is on an RTX 4090 (24 GB, ~$1,600–2,000) or the newer RTX 5090 (32 GB, ~$2,000–2,500). Stepping up to a prosumer workstation card like the RTX 6000 Ada (48 GB, ~$6,000) buys meaningful headroom. The wildcard is Apple Silicon — an Ultra-class Mac Studio treats RAM and GPU memory as a single shared pool, giving 192 GB or more of effective VRAM for around $5,000–8,000, which is genuinely competitive for local inference. None of these touch the 80–192 GB that a single data center chip carries, which is why cloud models feel so much more capable.
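You can invert the same arithmetic to see why local context windows are so small: divide the VRAM left over after the model weights by the per-token KV-cache cost. The weight and overhead figures below are assumptions for illustration (a 7B model held in fp16), which lands in the 4K–8K range mentioned above; quantized weights buy proportionally more room:

```python
# Rough sketch: how many tokens of KV cache fit in the VRAM your model leaves free.
# All figures are assumptions for illustration, not measurements.
def max_context_tokens(vram_gb: float, used_gb: float,
                       kv_bytes_per_token: int) -> int:
    # used_gb = model weights plus runtime overhead already resident on the card
    free_bytes = (vram_gb - used_gb) * 1024**3
    return int(free_bytes // kv_bytes_per_token)

# Assumed: 7B model in fp16 (~14 GB weights) + ~1 GB overhead, 128 KB/token KV cache
print(max_context_tokens(16, 15, 128 * 1024))   # 16 GB card -> 8192 tokens
print(max_context_tokens(24, 15, 128 * 1024))   # 24 GB RTX 4090 -> 73728 tokens
```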
The File System Layout
Everything in Claude Code's architecture maps to a file in a predictable location. There is no registry, no database, no opaque binary format. Once you know the directory structure, you can inspect, edit, backup, or debug anything with a text editor.
~ means your Windows home directory — C:\Users\YourName\. Paths starting with ~/.claude/ are in your user profile. Paths starting with just .claude/ are inside your current project repo. These are two different places.
Your Personal Folder — C:\Users\YourName\.claude\
| Unix notation | Windows path | What it is |
|---|---|---|
~/.claude/ | C:\Users\YourName\.claude\ | Root of your personal config |
~/.claude.json | C:\Users\YourName\.claude.json | User-level MCP server definitions, OAuth sessions, and preferences (theme, notifications). Note: this is a file at your home directory, not inside the .claude\ folder. |
~/.claude/settings.json | C:\Users\YourName\.claude\settings.json | Your personal preferences (tools, model, permissions) |
~/.claude/CLAUDE.md | C:\Users\YourName\.claude\CLAUDE.md | Your instructions, applied to every project |
~/.claude/rules/ | C:\Users\YourName\.claude\rules\ | User-level rules — topic-specific .md files that apply to all projects |
~/.claude/commands/ | C:\Users\YourName\.claude\commands\ | Your personal slash commands, everywhere |
~/.claude/skills/ | C:\Users\YourName\.claude\skills\ | Your personal skills, everywhere |
~/.claude/agents/ | C:\Users\YourName\.claude\agents\ | User-scoped subagent definitions, available in all projects |
~/.claude/projects/ | C:\Users\YourName\.claude\projects\ | Session data for all your projects (auto-managed) |
~/.claude/projects/C--repos-MyApp/ | C:\Users\YourName\.claude\projects\C--repos-MyApp\ | Data for the project at C:\repos\MyApp |
~/.claude/projects/C--repos-MyApp/memory/ | C:\Users\YourName\.claude\projects\C--repos-MyApp\memory\ | Memory files for that project |
Inside Your Project Repo — C:\repos\MyProject\.claude\
| Unix notation | Windows path (example) | What it is |
|---|---|---|
.claude/ | C:\repos\MyProject\.claude\ | Project config folder (like .vscode in a repo) |
.claude/CLAUDE.md | C:\repos\MyProject\.claude\CLAUDE.md | Alternative location for project instructions — works the same as CLAUDE.md at the repo root; use whichever keeps your root cleaner |
.claude/commands/ | C:\repos\MyProject\.claude\commands\ | Project slash commands, shareable via git |
.claude/skills/ | C:\repos\MyProject\.claude\skills\ | Project skills, shareable via git |
.claude/rules/ | C:\repos\MyProject\.claude\rules\ | Project rules — topic-specific .md files, discovered recursively, shareable via git |
.claude/agents/ | C:\repos\MyProject\.claude\agents\ | Project subagent definitions, shareable via git |
.claude/settings.json | C:\repos\MyProject\.claude\settings.json | Project settings, can be committed to git |
.claude/settings.local.json | C:\repos\MyProject\.claude\settings.local.json | Your personal overrides — add to .gitignore |
Repo Root (no subfolder)
| File | Windows path (example) | What it is |
|---|---|---|
CLAUDE.md | C:\repos\MyProject\CLAUDE.md | Project instructions — committed to git, shared with team. Can also live at .claude/CLAUDE.md inside the repo; both locations work. |
.mcp.json | C:\repos\MyProject\.mcp.json | Project-scoped MCP server definitions — committable to git so the whole team shares the same MCP setup |
Think of it like appsettings.json / appsettings.Development.json / environment variables in ASP.NET — base, environment-specific, personal.
Scope and Distribution
Every piece of Claude Code configuration has a scope (who it affects) and a distribution method (how it spreads). Understanding this prevents "where do I put this?" confusion.
The Scope Matrix
Remember: ~ = C:\Users\YourName\. Paths with ~/.claude/ are in your user profile. Paths with just .claude/ are inside the repo.
| File / Folder | Scope | In Git? |
|---|---|---|
C:\Users\YourName\.claude.json | Personal + All projects. MCP servers, OAuth, preferences. | Never |
C:\Users\YourName\.claude\settings.json | Personal + All projects. Your preferences everywhere. | Never |
C:\Users\YourName\.claude\CLAUDE.md | Personal + All projects. Your instructions everywhere. | Never |
C:\Users\YourName\.claude\rules\ | Personal + All projects. Your topic-specific rules everywhere. | Never |
C:\Users\YourName\.claude\commands\ | Personal + All projects. Your commands everywhere. | Never |
C:\Users\YourName\.claude\agents\ | Personal + All projects. Your subagent definitions everywhere. | Never |
C:\Users\YourName\.claude\projects\*\memory\ | Personal + Per-project. Your notes for one project. | Never |
C:\repos\MyProject\CLAUDE.md or .claude\CLAUDE.md | Shared + Per-project. Team instructions. | Yes |
C:\repos\MyProject\.claude\rules\ | Shared + Per-project. Team rules. | Optional |
C:\repos\MyProject\.claude\commands\ | Shared + Per-project. Team commands. | Optional |
C:\repos\MyProject\.claude\agents\ | Shared + Per-project. Team subagent definitions. | Optional |
C:\repos\MyProject\.claude\settings.json | Shared + Per-project. Team settings. | Optional |
C:\repos\MyProject\.mcp.json | Shared + Per-project. Team MCP servers. | Optional |
C:\repos\MyProject\.claude\settings.local.json | Personal + Per-project. Your local overrides. | Never (.gitignore) |
Decision Framework
When you need to configure something, ask two questions:
- Who needs this? Just me? Put it in C:\Users\YourName\.claude\. The whole team? Put it in the repo.
- What scope? All projects? Your personal .claude\ folder. This project only? The repo's .claude\ folder or CLAUDE.md.
.gitignore Conventions
Typically, .gitignore excludes:
- .claude/settings.local.json — Personal overrides
- Your personal C:\Users\YourName\.claude\ folder needs no entry — it is outside every repo, so it's inherently never in git
What CAN be checked in: CLAUDE.md, .claude/CLAUDE.md, .claude/rules/, .claude/commands/, .claude/skills/, .claude/agents/, .claude/settings.json, .mcp.json
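Put concretely, the only Claude-related .gitignore entry most repos need is:

```
# Claude Code: personal per-project overrides stay out of git
.claude/settings.local.json
```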
The Prompt Engineering Trap
You know how to prompt. You've been prompting AI tools for months or years. You can write a prompt that gets Claude to produce good code. So why does any of this architecture matter?
Because prompting is like knowing SQL syntax without understanding indexes, joins, normalization, or query plans. You can write queries — they'll return correct results. But they'll be slow, unmaintainable, and won't scale. The person who understands the architecture writes queries that are fast, maintainable, and scalable, even if the SQL syntax is identical.
What Architecture Gives You
- Consistency: CLAUDE.md ensures every session follows team standards — you don't have to repeat yourself.
- Enforcement: Hooks guarantee that tests run, linters pass, forbidden patterns are blocked — not just suggested.
- Efficiency: Skills load context only when needed, preserving your context window for actual work.
- Persistence: Memory carries knowledge across sessions — Claude doesn't forget what you told it yesterday.
- Extensibility: MCP servers let Claude interact with your specific tools and systems — databases, issue trackers, APIs.
- Reusability: Slash commands encode workflows you can invoke with a single keystroke.
- Team scaling: Checked-in configuration means a new developer gets the full setup by cloning the repo.
The Progression
- Level 1: Prompting. You type good prompts and get good results. Works for individual tasks.
- Level 2: Configuration. You write CLAUDE.md so you don't repeat prompts. Works for a project.
- Level 3: Architecture. You use skills, hooks, memory, commands, and MCP servers as a system. Works for a team and scales across projects.
Most developers are at Level 1. This guide exists to get you to Level 3.
Bonus Features
These are real Claude Code features that don't fit neatly into the earlier chapters but are essential for power users.
Worktrees (Isolated Agent Sandboxes)
First: Two Git Terms Worth Knowing
Working tree — git's name for the folder where your actual files live. When you clone a repo, the folder you get is the working tree. Git distinguishes it from the hidden .git folder (which stores your commit history, branches, and objects) — the working tree is what you edit, the .git folder is git's internal bookkeeping. You always work in the working tree; you never touch .git directly.
Worktree — a git feature that lets you check out the same repository into multiple folders at the same time, each on its own branch. Normally, one repo means one folder with one branch checked out. A worktree creates a second (or third) folder — same git history, different branch, different files on disk — all live simultaneously. For example: C:\repos\MyApp on main and C:\repos\MyApp-experiment on a feature branch, both checked out at once from the same repo.
How Claude Uses This
When Claude spawns a subagent, it normally edits your working tree — the same files you are looking at. Worktrees change that: instead of working in your folder, the agent gets its own separate folder checked out to its own branch. It works there in complete isolation. Your files are never touched. When the agent finishes, you review its branch and decide whether to merge it or discard it.
How to use it: when calling the Agent tool, set isolation: "worktree". The agent runs in a temporary git worktree. If it makes no changes, the worktree is cleaned up automatically. If changes are made, you get the worktree path and branch name to review.
When to use it: risky refactors, experimental changes, anything where you want a safety net. The agent can go wild without affecting your working files.
Task Tool (Built-in Todo Tracker)
Claude Code has a built-in task/todo system that persists within a conversation. You can create tasks, mark them in-progress, mark them complete, and track what's left.
Claude can also create tasks on its own when working through complex multi-step work. It uses them to track progress, and you see the task status in the conversation.
When to use it: any multi-step task. Instead of hoping Claude remembers all 7 things you asked for, tasks make the list explicit and trackable.
The Catch: Session Tasks Don't Survive the Session
The built-in task system stores everything in session memory only — not on disk, not in any file. When the session ends, all tasks are gone. If you come back tomorrow, Claude has no memory of what was in progress.
A TODO.md file is a common alternative for persistence — but Claude does not pick it up automatically. It is just a regular markdown file. To make it work, add a line to your CLAUDE.md:
Check TODO.md at the start of each session for outstanding tasks. Update it when tasks are completed or new ones are discovered.
With that instruction in place, Claude reads TODO.md on every session and can update it as work progresses. Because it is a plain file in your repo, it gets committed to git and the whole team sees the same list.
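A TODO.md kept this way is ordinary markdown — no special format is required. The entries below are hypothetical examples (borrowing scenarios from elsewhere in this chapter):

```markdown
# TODO
- [ ] Migrate session storage to Redis
- [ ] Add retry logic to the payment webhook handler
- [x] Fix tax-calculation overflow in checkout.js
```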
The tradeoff in plain terms: the built-in task tool is free and effortless but evaporates when you close the terminal. A TODO.md file persists and is shareable but costs tokens. Use whichever matches how long your work spans.
Model Switching Mid-Session (/model)
You don't have to pick one model for an entire session. Type /model to switch between Opus, Sonnet, and Haiku without starting over.
| Model | Strengths | Cost | Use When |
|---|---|---|---|
| Opus | Most capable, best reasoning | Highest | Architecture decisions, complex refactors, debugging hard problems |
| Sonnet | Balanced speed and capability | Medium | General coding, most tasks |
| Haiku | Fastest, cheapest | Lowest | Simple edits, quick lookups, boilerplate generation |
Pro tip: Start a session with Opus for planning and architecture decisions, then switch to Sonnet for implementation, and Haiku for cleanup and formatting.
/compact (Manual Context Compression)
When your context window fills up, Claude auto-compresses older messages to make room. This is lossy — details get summarized and nuance is lost. /compact lets you trigger this compression manually, on your terms.
How /compact actually works — a sincere answer
Claude summarizes the older parts of the conversation to make room for new material. It is not like pages falling off a stack where the old content is simply gone. It is more like compressing a thick folder of notes into a single summary page — the detail is reduced, but the gist is preserved. The old pages are not wiped; they are condensed.
So after /compact: the early part of your session becomes a dense summary (lower fidelity, still present). The recent part stays intact. Now there is space in the Awareness window to bring in new files, new code, new context. You have not lost everything — you have traded detail for room.
The risk: something important buried in a long tool output from two hours ago may have been dropped in the summary. That is why you can pass a hint — a short phrase telling Claude what must survive the compression at all costs.
Example: You have been debugging a payment processing bug for two hours. You have explored many dead ends. Now you are ready to write the fix, but the context window is getting full. You type:
/compact focus on the payment bug — the root cause is in checkout.js line 142 where the tax calculation overflows
Claude compresses the two hours of exploration history but keeps the critical finding — line 142 — intact. Without the hint, that specific detail might have been summarized away as "explored various files."
Another example: You have been reading through 30 files trying to understand how the login system works. You now want to make changes but need to free up context. You type:
/compact keep the authentication flow — specifically that session tokens are stored in Redis, not the database
The exploration gets compressed, the key architectural fact survives.
What this means for Awareness: your plan determines the maximum Awareness your model can have — 267 pages on Pro, 1,333 pages on Max. But as a session runs, conversation history, file reads, and tool output accumulate and eat into that space. After an hour of coding, you might have consumed 150 of your 267 pages just on history — leaving only 117 pages for the actual work ahead. /compact shrinks the history back down, reclaiming Awareness. Think of it as freeing up screen real estate so the AI can see more of your project and less of its own conversation record. You are not upgrading your plan — you are clearing clutter within the plan you have.
Why you'd want this: if you've done a lot of exploratory reading (file reads, searches) that generated verbose output you no longer need, /compact frees that space while you're still in control of what's important. If you wait for auto-compression, it happens at an unpredictable moment and may compress something you still need.
You can also pass a summary hint: /compact focus on the authentication refactor tells the compressor what to prioritize keeping.
Skills and Compaction — A Known Problem
What actually happens:
- You invoke a skill early in a session
- The session runs long and compaction occurs (manually or automatically)
- The skill's content gets summarized or dropped in the compression
- Claude continues the session without the skill's instructions — silently
Does Claude re-load the skill automatically after compaction? No. Confirmed by GitHub issue reports. Claude does not proactively re-read skill files after compaction, even if you have instructions in CLAUDE.md telling it to do so.
What to do about it:
- Re-invoke the skill after compaction. If you ran /my-skill at the start of the session and then compacted, type /my-skill again.
- Use your compaction hint to preserve it. /compact keep the react-component skill rules tells the compressor to include the skill's key points in the summary.
- Move critical rules to CLAUDE.md. CLAUDE.md reloads automatically every session and survives compaction (it's re-loaded from the file, not from conversation history). If a rule is important enough that losing it mid-session would hurt, it belongs in CLAUDE.md, not a skill.
- Run /compact at logical breakpoints rather than letting auto-compaction trigger mid-task — you control what gets preserved.
See also: Chapter XI — The Context Window for how Awareness is consumed, and IQ vs. Awareness on the Other AI CLIs page for how your plan determines your maximum Awareness.
Permissions Model
Claude Code asks permission before doing things — editing files, running commands, making web requests. The permission system has tiers:
| Level | What Happens | How to Set |
|---|---|---|
| Ask every time | Claude pauses and asks before each action | Default behavior |
| Allow once | You approve one specific action | Press Enter or 'y' at the prompt |
| Allow for session | All actions of that type are auto-approved for this session | Press 'a' (allow) at the prompt |
| Allowlist (permanent) | Specific tools/commands are always approved | Configure in settings.json |
The settings.json allowlist lets you pre-approve specific tools and bash command patterns so Claude doesn't interrupt your flow for routine operations.
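A sketch of what that looks like. The Tool(pattern) rule strings under permissions.allow follow Anthropic's published settings schema, but treat the specific entries as examples and verify against the current documentation before copying:

```json
{
  "permissions": {
    "allow": [
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(npm run test:*)"
    ],
    "deny": [
      "Bash(curl:*)"
    ]
  }
}
```

Allowed patterns stop the interruptions for routine commands; denied patterns are refused outright, no prompt at all.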
Bypass Permissions — And Why to Be Careful
Claude Code has a dangerouslySkipPermissions flag that disables all permission prompts entirely. You can invoke it two ways:
# Command line flag
claude --dangerously-skip-permissions
# Or with a prompt inline
claude --dangerously-skip-permissions -p "your prompt here"
It can also be set in settings.json:
{
"dangerouslySkipPermissions": true
}
Either way, every file edit, every bash command, every tool call proceeds without asking. No pauses, no confirmations.
The intended use case is fully automated pipelines — CI environments, scripted agents, headless runs where there is no human at the keyboard to approve anything. In that context it is a reasonable tool.
In interactive use, it is a loaded gun with the safety off.
If you run Claude from an elevated terminal (Administrator on Windows, sudo on Mac/Linux), bypassing permissions means Claude can execute anything on your machine with full admin rights — deleting system files, modifying the registry, killing processes, overwriting configs — without a single confirmation prompt. Claude will not refuse because it has been told not to ask. Mistakes at that level are not easily undone. A misunderstood instruction, a hallucinated file path, an overly aggressive cleanup task — any of these become immediate and irreversible.
Even without elevated rights, bypassing permissions removes your last line of defence against Claude misunderstanding what you asked. The permission prompts exist precisely to catch the gap between what you said and what Claude heard.
Recommendation: Never use dangerouslySkipPermissions in an interactive session. Never run Claude in an admin terminal unless you have a specific reason to. If you do need to run an automated pipeline with skipped permissions, do it in a sandboxed environment — a container, a VM, a dedicated low-privilege service account — not your main development machine.
settings.json and settings.local.json
These are Claude Code's configuration files. They control much more than MCP servers:
- Permissions — which tools and commands are auto-approved
- Environment variables — passed to Claude's environment
- Allowed/denied tools — restrict which tools Claude can use
- Model preferences — default model for new sessions
- MCP servers — plugin configuration (covered in Chapter X)
- Theme — light/dark mode preference
Scope:
- C:\Users\YourName\.claude\settings.json — personal, applies to all projects, never in git
- C:\repos\MyProject\.claude\settings.json — project-level, can be committed to git for team settings
- C:\repos\MyProject\.claude\settings.local.json — personal project overrides, NOT committed (like .env.local)
--resume and --continue
You don't have to start a new session every time. You can pick up where you left off:
- claude --resume — shows a list of recent sessions to choose from
- claude --resume <session-id> — resumes a specific session by its UUID
- claude --continue — resumes the most recent session in the current directory
Session IDs are UUIDs (like dd8ab84a-a52e-467b-bcb6-0ad3a44a5db6). You can find them in C:\Users\YourName\.claude\projects\ or by using a session manager.
--continue is "load last save." --resume <id> is "load specific save file." The conversation history, context, and any in-progress work are all restored.
How Much Do These Actually Add, Given CLAUDE.md and Compaction?
If your CLAUDE.md is well-maintained, you might wonder what --continue really adds. The distinction is: CLAUDE.md carries instructions; session history carries conversation state. CLAUDE.md tells Claude how to work. Session history tells it what you were doing — "we were mid-way through debugging the auth middleware, you had proposed three approaches, I'd rejected two of them." That context lives in the conversation, not in any file.
--resume adds one more thing: the ability to return to a specific past thread rather than just the most recent one, useful when you've been working on multiple things in parallel.
For a short interruption, --continue is genuinely useful — the conversation is still intact. For a long session that ran through multiple compaction cycles, resuming gives you little more than you would get from a fresh start with good CLAUDE.md notes. The practical takeaway: use --continue for same-day interruptions. For anything older, write good session-end memory notes instead and start fresh.
Cost Tracking
Every message costs money. Claude Code tracks token usage so you can monitor spend.
- /cost — shows token usage and estimated cost for the current session
Why Claude Shows Two Prices: "$3/$15"
When you select a model in Claude Code, you see prices like $3/$15 per Mtok. That slash is not a range — it is two separate prices for two separate things:
- First number ($3) — what you pay per million tokens you send: your prompts, files Claude reads, tool results coming back
- Second number ($15) — what you pay per million tokens Claude generates: its responses, code it writes, explanations
Output costs 5× more than input because generating text is computationally heavier than reading it. A real session example:
| Model | You send 100K tokens | Claude responds 20K tokens | Session total |
|---|---|---|---|
| Sonnet ($3/$15) | $0.30 | $0.30 | $0.60 |
| Haiku ($1/$5) | $0.10 | $0.10 | $0.20 |
| Opus ($15/$75) | $1.50 | $1.50 | $3.00 |
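The table's arithmetic, as a sketch you can rerun with your own token counts (rates are the per-million-token figures listed in this chapter; verify current pricing before relying on them):

```python
# Per-token billing math from the table above.
PRICES_PER_MTOK = {
    "sonnet": (3.00, 15.00),   # (input $, output $) per million tokens
    "haiku":  (1.00, 5.00),
    "opus":   (15.00, 75.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES_PER_MTOK[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# The table's example session: you send 100K tokens, Claude responds with 20K
for model in PRICES_PER_MTOK:
    print(f"{model}: ${session_cost(model, 100_000, 20_000):.2f}")
```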
On a subscription plan (Pro, Max), you don't pay these rates per session. Your monthly fee covers a message budget. Claude Code shows you the rates as a transparency feature — so you know the cost equivalent of what you're using, even on a flat plan. If you are on the API (pay-per-token), these rates are your actual bill.
Token Types
| Token Type | What It Is | Cost (Sonnet) |
|---|---|---|
| Input tokens | What you send (prompts, file contents, tool results) | $3.00 / 1M |
| Output tokens | What Claude generates (responses, code, tool calls) | $15.00 / 1M |
| Cache write tokens | First time content enters the cache | $3.75 / 1M |
| Cache read tokens | Subsequent reads of cached content (much cheaper) | $0.30 / 1M |
Cache reads are 10x cheaper than fresh input. This is why CLAUDE.md and skills are cost-efficient — they get cached after the first message, so subsequent messages read them from cache at $0.30/1M instead of $3.00/1M.
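To see the effect, compare a 5,000-token CLAUDE.md sent on 20 turns with and without caching — a simplified model that ignores the rest of the conversation, using the Sonnet rates from the table:

```python
# Why caching CLAUDE.md pays off across a session.
CLAUDE_MD_TOKENS = 5_000
TURNS = 20
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30   # $ per million tokens (Sonnet)

# Without caching: the file is re-sent as fresh input on every turn
uncached = TURNS * CLAUDE_MD_TOKENS / 1e6 * INPUT

# With caching: one cache write on the first turn, then 19 cheap cache reads
cached = (CLAUDE_MD_TOKENS / 1e6 * CACHE_WRITE
          + (TURNS - 1) * CLAUDE_MD_TOKENS / 1e6 * CACHE_READ)

print(f"uncached: ${uncached:.3f}, cached: ${cached:.3f}")  # $0.300 vs $0.047
```

Roughly a 6× saving on that one file — and the same logic applies to skills and any other stable prefix content.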
The CLAUDE.md Inheritance Chain
There isn't just one CLAUDE.md. There's a stack, and they all get loaded and composed:
- Your personal global CLAUDE.md — C:\Users\YourName\.claude\CLAUDE.md. Applies to every session on your machine, across all projects. Your universal preferences.
- The project CLAUDE.md — C:\repos\MyProject\CLAUDE.md (repo root). Checked into git, shared with your team. Project-specific rules and conventions.
- Subdirectory CLAUDE.md files — e.g., C:\repos\MyProject\src\frontend\CLAUDE.md. Loaded when Claude is working in that subdirectory. Useful for monorepos where different directories have different conventions.
They stack — all matching files are loaded, not just the nearest one. Global + repo root + subdirectory all contribute to the instructions Claude sees.
Extended Thinking
Sometimes Claude pauses for 10-30 seconds before responding. This isn't lag — it's extended thinking, a mode where Claude does deeper reasoning before generating a response.
What Happens
Claude generates internal "thinking" tokens that you don't see. These are reasoning steps: analyzing the problem, considering approaches, checking constraints. Only the final answer appears in your conversation.
When It Activates
- Complex architectural decisions
- Multi-step reasoning problems
- When Claude needs to reconcile conflicting requirements
- Large codebase analysis
Cost Implications
Thinking tokens are output tokens — the most expensive kind. A 30-second thinking pause might generate thousands of internal tokens you never see but still pay for. This is why Opus with extended thinking costs significantly more than Haiku for simple tasks.
Practical tip: If you're doing simple tasks (rename a variable, add a comment), use Haiku — it doesn't need to think deeply. Save Opus + extended thinking for the problems that actually require deep reasoning.
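A rough illustration of why invisible thinking matters on your bill, using the output rates quoted earlier in this chapter. The 5,000-token count is an assumption — the real number varies and is never shown to you:

```python
# Thinking tokens are billed at the model's output rate, even though you never see them.
def thinking_cost(thinking_tokens: int, output_rate_per_mtok: float) -> float:
    return thinking_tokens * output_rate_per_mtok / 1e6

# Assumed 5,000 invisible thinking tokens for one long pause
print(thinking_cost(5_000, 75.00))   # Opus output rate   -> 0.375
print(thinking_cost(5_000, 15.00))   # Sonnet output rate -> 0.075
```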
Pricing & Purchasing
Prices accurate as of March 2026
- Go to claude.ai and create a free account with your email.
- Install Claude Code: winget install Anthropic.ClaudeCode or download from claude.ai/download
- Run claude in a terminal. It will prompt you to log in — use your claude.ai account.
- You're on the free tier. Try it. If you hit limits, upgrade.
You do not need a credit card to start. The free tier is real — it's just rate-limited.
Snapshot as of March 2026. Verify at claude.ai/upgrade before relying on fine details.
| Plan | Monthly Cost | Usage Limit | Best For |
|---|---|---|---|
| Free | $0 | ~10-15 messages per 5-hour window | Trying it out. Not for real work. |
| Pro | $20/month ($17 annual) | ~45 messages per 5-hour window | Developers coding 2-3 hours/day. Light-to-moderate use. |
| Max 5× | $100/month | ~225 messages per 5-hour window | Developers coding 6-8 hours/day. Full workday without hitting walls. |
| Max 20× | $200/month | ~900 messages per 5-hour window | Power users, large projects, all-day heavy coding sessions. |
| Team | $25–30/user/month (standard); $150/user/month (premium with Claude Code) | Similar to Pro per seat | Organizations deploying Claude Code to a team. Centralized billing. |
| Enterprise | Not published. Community reports: ~$40–$60/seat/month (standard), ~$100–$150/seat/month (premium with Claude Code); minimum ~20 seats, annual contract. Small deployment (10–25 users): ~$500–$1,000/month; large org (100+ users): $5,000–$15,000+/month, plus API usage billed separately. Contact Anthropic sales for a real quote. | Custom — API usage billed on top of seat fees at standard per-token rates | Organizations only. Audit logs, SSO, SCIM, RBAC, 500K context window, HIPAA option. NOT for individuals — minimum seat commitment required. |
Important: All Claude surfaces (claude.ai website, Claude Desktop, Claude Code CLI) share the same usage pool. If you burn through messages on the website, you have fewer for coding.
"Our company has 60 people. What's the monthly bill?"
At the community-reported ~$60/seat/month for standard seats: 60 × $60 = $3,600/month in seat fees, plus API usage (token costs) on top. If you're deploying premium seats with Claude Code access (~$150/seat), it's 60 × $150 = $9,000/month before usage. Budget $5,000–$15,000/month all-in for a 60-person shop doing active AI coding. Annual commitment, so expect to negotiate. These are community estimates — Anthropic's actual quote could be higher or lower.
"Our company has 3 people. Can we get Enterprise?"
Probably not at standard terms. Reports suggest a minimum floor of 20 seats. However: you can sometimes buy up — paying for 20 seats even if only 3 people use them. That would cost ~$1,200–$3,000/month for 3 actual users, which is absurd value math. For a 3-person team, Team plan ($150/user/month for premium Claude Code seats = $450/month) is almost certainly the right answer. You get Claude Code access without the enterprise overhead and minimum commitment.
"What if I'm willing to pay as if we had 70 people just to get Enterprise features?"
Call Anthropic sales and ask directly. Some vendors will take your money. You'd be looking at 70 × $60 = $4,200+/month for features a 3-person team doesn't need (SSO, SCIM, audit logs are organizational plumbing). The honest answer: don't. Max 20× ($200/month) gets you more coding throughput per dollar than any Enterprise arrangement would for a solo or small team.
Two categories — one genuinely useful to anyone working on large projects, one only useful to organizations managing many people. Don't conflate them.
| Benefit | What it actually means | Useful to individuals? | Useful to organizations? |
|---|---|---|---|
| More Awareness ("larger context window") | 1M token window (~1,333 pages) instead of 200K (~267 pages). Claude can hold 5× more of your codebase in mind at once. Same model, bigger reading window. This is the real coding benefit. | Yes — for large codebases | Yes — for large codebases |
| Audit logging | Every prompt and response is logged for compliance review | No | Yes — legal/compliance |
| SSO / SAML | Employees log in with their corporate identity — no separate Anthropic accounts | No | Yes — IT management |
| SCIM | Auto-add/remove users when employees join or leave via HR system | No | Yes — HR/IT automation |
| RBAC | Control which employees can use which models and features | No | Yes — access control |
| Custom data retention | Control how long conversation data is stored and where | No | Yes — compliance/legal |
| Priority support | Actual humans to call when something breaks | Maybe | Yes — uptime-critical deployments |
(Anthropic's words, translated into something honest)
| Plan | Awareness (pages) | Awareness (tokens) | Cost/month | How |
|---|---|---|---|---|
| Pro | ~267 pages | 200K | $20 | Default |
| Pro (unlocked) | ~1,333 pages | 1M | $20 | Type /extra-usage — not automatic |
| Max 5× | ~1,333 pages | 1M | $100 | Automatic |
| Max 20× | ~1,333 pages | 1M | $200 | Automatic + 20× message budget |
| Team / Enterprise | ~1,333 pages | 1M | $150+/user | Automatic + org governance |
"Extended context model" is not a different AI. It is the same Opus 4.6 or Sonnet 4.6, with the reading window expanded from 267 pages to 1,333 pages. Same IQ. Five times more Awareness. The difference between seeing a chapter and seeing the whole book.
| Real-world reference | Pages | Fits in Pro (267 pages)? | Fits in Max (1,333 pages)? |
|---|---|---|---|
| A short novel (The Great Gatsby) | ~180 pages | Yes | Yes |
| A typical technical book (Clean Code) | ~460 pages | No — too big | Yes |
| A small codebase (10 files × 200 lines) | ~40 pages | Yes, easily | Yes |
| A medium codebase (100 files × 200 lines) | ~400 pages | No — too big | Yes |
| A large codebase (500 files × 200 lines) | ~2,000 pages | No | No — Claude sees roughly the first two-thirds |
| War and Peace | ~1,225 pages | No | Yes — barely |
| The entire Lord of the Rings trilogy | ~1,500 pages | No | No — 167 pages short |
A medium project — say 100 source files — is already beyond what Pro can hold at once. Max fits it comfortably. For very large codebases, neither plan holds everything; you manage context deliberately with /compact and targeted file reads rather than loading the whole project.
Tokens are like gas. You can measure gas two ways: by how far it gets you (miles per gallon) or by what it costs per gallon. Both are useful. Tokens work the same way:
- How far it gets you: One token ≈ 4 characters ≈ ¾ of a word. A typical message exchange (your prompt + Claude's response + any files read) might burn 5,000–20,000 tokens. A complex coding session reading many files might burn 50,000+ tokens in a single exchange. The Pro plan's 45-message window is like a tank of gas — enough for the commute, not enough for a road trip.
- What it costs: On the subscription plans, tokens are pre-paid in your monthly fee. On the API (direct access for building apps), you pay per million tokens — input tokens cost $3/million, output tokens cost $15/million on Sonnet. A 50,000-token session would cost roughly $0.15–$0.75 depending on the mix.
When you run out of tokens in a window, Claude slows down or stops until the window resets (every 5 hours). That's the tank hitting empty.
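The rate arithmetic above is easy to check in a few lines. A minimal sketch using the Sonnet API rates quoted above (subscription users never see this bill directly; it applies only to pay-per-token API use):

```python
# Back-of-envelope cost math at the Sonnet 4.6 API rates quoted above.
INPUT_RATE_PER_TOKEN = 3.00 / 1_000_000    # $3 per million input tokens
OUTPUT_RATE_PER_TOKEN = 15.00 / 1_000_000  # $15 per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API session at the rates above."""
    return input_tokens * INPUT_RATE_PER_TOKEN + output_tokens * OUTPUT_RATE_PER_TOKEN

# The 50,000-token session from the text, at the two extremes of the mix:
print(round(session_cost(50_000, 0), 2))   # all input:  0.15
print(round(session_cost(0, 50_000), 2))   # all output: 0.75
```

Real sessions land between the extremes, since most tokens are input (files read, conversation history) and only a fraction are generated output.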
| Token Type | What It Is | Cost (Sonnet 4.6) |
|---|---|---|
| Input tokens | What you send: your prompt, files Claude reads, tool results | $3.00 per million |
| Output tokens | What Claude generates: responses, code it writes | $15.00 per million |
| Cache write | First time repeated content (CLAUDE.md, skills) is cached | $3.75 per million |
| Cache read | Subsequent reads of cached content — much cheaper | $0.30 per million |
Two Ways Claude Charges for Tokens — and Why It Matters
There are two completely different ways to pay for Claude, and they get confused constantly:
Way 1: Subscription (Pro, Max) — You pay a flat monthly fee. Tokens are included inside your message budget. You don't see a token bill. You just hit a wall when you've sent too many messages in a 5-hour window, then wait for it to reset. This is what most people use. Simple.
Way 2: API / Pay-per-token — You get an API key from Anthropic and pay directly for every token consumed, with no subscription. The per-token rates in the table above are what you pay. This is for developers building applications that use Claude — a company baking Claude into their own product, not a developer using Claude as a coding tool. You'd wire your app to Claude's API and get billed for token consumption each month.
Enterprise uses Way 2 on top of a seat fee. The seat fee (~$60–$150/user/month) buys access and organizational features (audit logs, SSO, etc.). But every message your team sends still burns tokens, billed at per-token rates on top of that seat fee. That's what "token consumption billed separately" means in the Enterprise section — it's the same tokens as above, just billed directly instead of wrapped in a message budget.
If you're a developer using Claude Code to write code: use a subscription (Pro or Max). You never need to think about API keys or per-token rates. Those are for building products, not for using Claude as a tool.
| Scenario | Daily coding hours | Plan needed | Monthly cost |
|---|---|---|---|
| Curious experimenter (trying Claude Code for the first time, occasional small tasks) | 30 min–1 hour | Free or Pro | $0–$20 |
| Part-time coder (HR professional automating reports, PM writing scripts, UX designer prototyping) | 1–2 hours | Pro | $20 |
| Working developer (full-time developer using Claude Code as primary coding tool) | 3–5 hours | Pro (will hit limits occasionally) or Max 5× | $20–$100 |
| Heavy developer (all-day Claude Code, large codebases, frequent file reads) | 6–8 hours | Max 5× | $100 |
| Power user (running agents, parallel sessions, very large projects) | 8+ hours | Max 20× | $200 |
Think of it like fidelity in images. A single photograph can be ultra-high-resolution — every pixel tiny, every detail captured. That's high IQ. A movie has thousands of frames, covering far more ground — but each individual frame may be lower resolution than a still photo. That's high Awareness.
You want both — but they are separate dials. When you pay more for AI, you might be buying a sharper photograph (better reasoning per frame), more frames (larger context window), or both. A high-IQ model on a small plan sees 267 pages with brilliant resolution. The same model on a Max plan sees 1,333 pages — five times the movie, same sharpness per frame.
Editorial teaching model — not benchmark scores. IQ numbers are estimated shorthand for relative reasoning quality (Opus 4.6 = 100 baseline), not official measurements. Awareness numbers are calculated from published token limits and are factual. ■ IQ = editorial estimate ■ Awareness = official fact
- Use `/compact` regularly. Compressing conversation history frees context and reduces token burn on subsequent messages.
- Switch to Haiku for simple tasks. Renaming a variable doesn't need Opus. Use `/model` to downshift.
- Keep CLAUDE.md concise. It's loaded every session. A bloated CLAUDE.md burns tokens on every single message.
- Don't read files unnecessarily. If you ask Claude to "look at everything," it will. Be specific about which files matter.
- Start fresh sessions for new tasks. A clean session has an empty context window. Carrying forward a 100-message conversation history into a new task wastes tokens.
Life of a Token
You type a message. Claude responds. What actually happened? The answer crosses your keyboard, your network card, the public internet, a data center, dozens of GPU chips, and back again. The same piece of data gets called something different at every stop. This page traces the complete journey.
| Stage | What it is called | Where it lives | Size |
|---|---|---|---|
| On your screen | Characters / text | Your RAM / display | Bytes |
| After tokenization | Token ID (an integer, 0 to ~100K) | Your machine, briefly | 4 bytes per token |
| In transit | JSON payload | Network packets (TLS encrypted) | KB total |
| Being processed (input) | Input token — prefill phase | GPU compute cores, processed in parallel | Batch |
| Stored on GPU | KV cache entry | GPU VRAM on Anthropic's servers | 64 KB – 0.5 MB per token (range by architecture: ~64–200 KB for modern GQA models; ~0.5 MB for older MHA models) |
| Being generated (output) | Output token — generation phase | GPU compute cores, one at a time | Sequential |
| Streaming back to you | Streamed token | Network packets | Bytes |
| Reused from prior call | Cache token (prompt cache hit) | GPU VRAM retained between calls | $0.30/M instead of $3/M |
Input tokens are processed in parallel. When you send 10,000 tokens, the GPU processes all 10,000 simultaneously in the prefill phase. One big matrix multiplication. Fast. Efficient. $3/M on Sonnet.
Output tokens are generated sequentially, one at a time. To generate the next word, the model runs a complete forward pass through every single layer (80+ for a large model), reading the entire KV cache on each pass. It cannot generate word 2 until word 1 is finished, because word 2 depends on word 1. This is called autoregressive generation.
One output token requires one full model forward pass, roughly 5× the GPU cost of processing one input token inside a parallel batch. Hence $15/M output vs $3/M input. The 5:1 price ratio is not arbitrary; it tracks the hardware economics of sequential generation.
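The sequential dependency can be shown as a toy loop (the `predict_next` function here is a dummy stand-in for a full forward pass, not anything from Anthropic's stack):

```python
# Toy illustration of prefill vs. decode: just the control flow, not a real model.
def predict_next(context: list[str]) -> str:
    # Dummy "model": echoes a counter. A real model runs 80+ transformer layers here.
    return f"tok{len(context)}"

def generate(prompt_tokens: list[str], n_new: int) -> list[str]:
    # Prefill: all input tokens arrive at once and, on real hardware,
    # are processed in parallel as one big batch.
    context = list(prompt_tokens)
    # Decode: strictly one token at a time. Each step needs the FULL
    # context so far, which is why token 2 must wait for token 1.
    for _ in range(n_new):
        context.append(predict_next(context))
    return context[len(prompt_tokens):]

print(generate(["a", "b", "c"], 3))  # ['tok3', 'tok4', 'tok5']
```

Every iteration of that loop is a complete forward pass in the real system, which is the source of the input/output price asymmetry.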
The KV cache for your session lives in GPU VRAM for the entire duration of your session. Every token you have sent and every token Claude has generated stays in VRAM as the KV cache. That is how Claude remembers what you discussed 100 messages ago. It is all right there in GPU memory on a server in a data center.
- When you /compact — the KV cache is partially freed. Anthropic literally reclaims GPU memory. This is why /compact reduces server costs, not just your Awareness.
- When your session ends — the KV cache is freed immediately. Gone. That GPU VRAM is reallocated to someone else's session within milliseconds.
- Prompt caching — Anthropic can retain frequently-used KV entries (like your CLAUDE.md or system prompt) in VRAM between sessions. Cache read tokens cost $0.30/M instead of $3/M because the expensive compute already happened once and the vectors are still in memory.
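The cache pricing above implies large savings for any prefix that repeats on every exchange. A rough sketch, assuming a hypothetical 5,000-token CLAUDE.md resent across 100 exchanges:

```python
# What prompt caching saves, at the rates from the token table above.
CACHE_WRITE = 3.75 / 1_000_000  # $/token, first time content is cached
CACHE_READ = 0.30 / 1_000_000   # $/token, every subsequent cache hit
INPUT_RATE = 3.00 / 1_000_000   # $/token, uncached input

def repeated_prefix_cost(tokens: int, exchanges: int, cached: bool) -> float:
    """Cost of resending the same prefix (e.g. CLAUDE.md) on every exchange."""
    if cached:
        # one cache write, then cheap reads for the remaining exchanges
        return tokens * CACHE_WRITE + tokens * CACHE_READ * (exchanges - 1)
    return tokens * INPUT_RATE * exchanges

uncached = repeated_prefix_cost(5_000, 100, cached=False)  # $1.50
cached = repeated_prefix_cost(5_000, 100, cached=True)     # ~$0.17
print(f"${uncached:.2f} uncached vs ${cached:.2f} cached")
```

Roughly a 9× saving in this hypothetical, which is why keeping CLAUDE.md stable (so it stays cacheable) matters as much as keeping it short.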
The KV cache math explains Nvidia's $3 trillion valuation and why AI companies are spending hundreds of billions on data centers.
Why Nvidia specifically: H100 and H200 GPUs have 80 to 96 GB of HBM (High Bandwidth Memory) each. A 1M token session needs 1 to 2 TB of KV cache, which requires 12 to 25 H100s connected via NVLink — Nvidia's GPU-to-GPU interconnect running at 900 GB/s. AMD has competing GPU specs but lacks CUDA, the programming model that 15 years of AI frameworks are built on. You cannot easily swap Nvidia for AMD in a running production cluster.
Why data centers: A single Pro-plan Claude session ties up around 100 GB of GPU VRAM for its KV cache at median estimates (range: tens to hundreds of GB). Anthropic serves millions of sessions simultaneously. Millions of sessions times tens-to-hundreds of GB each equals hundreds of petabytes of GPU VRAM required. Impossible without massive purpose-built facilities with specialized power, cooling, and GPU interconnect networking. Data centers do double duty: holding the model weights (terabytes per model, always loaded) and the KV caches for all live sessions (constantly changing as sessions start and end).
Different models run on different hardware pools. Opus 4.6 is a larger model than Sonnet 4.6. Larger model means:
- More parameters means more GPUs needed just to hold the model weights
- Larger KV cache per token means more VRAM consumed per active session
- Each Opus session ties up significantly more hardware than a Sonnet session
Anthropic allocates separate GPU capacity per model. The Opus pool fills up first during peak demand because it is smaller. Fewer Opus GPUs exist because fewer users need it and it costs more per session to serve. When you saw "Opus unavailable," every GPU in the Opus cluster was already holding someone else's KV cache. The cluster was physically full.
Sonnet had capacity because more Sonnet hardware exists and it is more efficiently served. The model you chose directly determined which hardware cluster handled your request, and whether that cluster had room.
These tables answer: how much RAM, how many files, how much money, and how much working time?
Assumptions: 0.5 MB GPU VRAM per token (median). Average source file ≈ 1,500 tokens (300 lines × 5 tokens/line). Blended API cost ≈ $5/M tokens (80% input at $3/M + 20% output at $15/M). Active AI coding session ≈ 50,000 tokens/hour (15–20 exchanges/hour at ~3,000 tokens each).
Editorial note: These tables use means and averages, not precise model-specific values. Numbers are round figures intended to convey magnitude, not precision. Actual values vary significantly by model architecture, usage pattern, and Anthropic's infrastructure choices.
| Tokens | Server GPU VRAM (KV cache) | Equivalent source files | Pages of text | API cost (blended) |
|---|---|---|---|---|
| 1 token | 0.5 MB | 1 word | 0.001 pages | $0.000005 |
| 1,000 tokens | 0.5 GB | ~0.7 files | 1.3 pages | $0.005 |
| 10,000 tokens | 5 GB | ~7 files | 13 pages | $0.05 |
| 50,000 tokens | 25 GB | ~33 files | 67 pages | $0.25 |
| 100,000 tokens | 50 GB | ~67 files | 133 pages | $0.50 |
| 200,000 tokens (Pro plan max) | 100 GB | ~133 files | 267 pages | $1.00 |
| 500,000 tokens | 250 GB | ~333 files | 667 pages | $2.50 |
| 1,000,000 tokens (Max/Enterprise) | 500 GB | ~667 files | 1,333 pages | $5.00 |
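The table's rows follow mechanically from the stated assumptions. A sketch that reproduces them, using 1,000 MB/GB and ~750 tokens/page to match the table's round figures:

```python
# Reproducing the table's rows from its stated assumptions:
# 0.5 MB VRAM/token, 1,500 tokens/file, ~750 tokens/page, $5/M blended cost.
VRAM_MB_PER_TOKEN = 0.5
TOKENS_PER_FILE = 1_500          # 300 lines x 5 tokens/line
TOKENS_PER_PAGE = 750            # 200K tokens ~ 267 pages
BLENDED_COST_PER_TOKEN = 5.00 / 1_000_000

def describe(tokens: int) -> dict:
    """Convert a token count into the table's four columns."""
    return {
        "vram_gb": tokens * VRAM_MB_PER_TOKEN / 1_000,
        "files": tokens / TOKENS_PER_FILE,
        "pages": tokens / TOKENS_PER_PAGE,
        "cost_usd": tokens * BLENDED_COST_PER_TOKEN,
    }

print(describe(200_000))  # the Pro-plan maximum row
```

Running `describe(200_000)` gives ~100 GB of VRAM, ~133 files, ~267 pages, and $1.00 blended cost, matching the Pro-plan row.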
Assumption: you use AI heavily all day. Active AI coding = ~50,000 tokens per hour (about 15 exchanges per hour at ~3,000 tokens each — prompt + response). An 8-hour day = ~400,000 tokens consumed.
Editorial note: These tables use means and averages, not precise model-specific values. Numbers are round figures intended to convey magnitude, not precision. Actual values vary significantly by model architecture, usage pattern, and Anthropic's infrastructure choices.
| Token amount | Time at your pace (50K/hr) | What it feels like | API cost | Pro plan ($20/mo) usage |
|---|---|---|---|---|
| 10,000 | ~12 minutes | A quick focused task. 4–5 back-and-forth exchanges. | $0.05 | ~4% of daily budget |
| 50,000 | ~1 hour | A solid morning session on one feature. | $0.25 | ~20% of daily budget |
| 100,000 | ~2 hours | Half a working day of active AI coding. | $0.50 | ~40% of daily budget |
| 200,000 | ~4 hours | Pro plan max context. A full half-day session before needing /compact or a new session. | $1.00 | Pro plan limit hit |
| 400,000 | ~8 hours (full day) | Your typical full working day of AI-assisted development. | $2.00 | Requires 2 Pro sessions or Max plan |
| 1,000,000 | ~20 hours (~2.5 days) | Max/Enterprise context limit. A major feature or week-long sprint worth of context. | $5.00 | Requires Max plan |
Your single full workday of AI coding (400K tokens) ties up roughly 200 GB of Anthropic's GPU VRAM for the duration of active sessions. Multiply by millions of developers worldwide doing the same thing simultaneously, and the scale becomes clear:
- 1 million active developers × 200 GB each = 200 petabytes of GPU VRAM needed simultaneously
- An H100 GPU has 80 GB of VRAM
- That is roughly 2.5 million H100 GPUs just to serve the KV caches of active sessions
- At ~$30,000 per H100, that is $75 billion in GPUs just for active user sessions
- This is before model weights, training, redundancy, and future capacity
This is why Microsoft announced $80 billion in AI data center investment for 2026 alone. The numbers make sense when you trace a single developer's workday back to server hardware.
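The bullet arithmetic above, spelled out:

```python
# Fleet-scale arithmetic from the bullets above (editorial estimates, not
# Anthropic-published figures).
active_developers = 1_000_000
vram_per_dev_gb = 200            # 400K tokens/day at 0.5 MB/token
h100_vram_gb = 80
h100_price_usd = 30_000

total_vram_pb = active_developers * vram_per_dev_gb / 1_000_000   # GB -> PB
gpus_needed = active_developers * vram_per_dev_gb // h100_vram_gb
capex_usd_billions = gpus_needed * h100_price_usd / 1_000_000_000

print(total_vram_pb, gpus_needed, capex_usd_billions)  # 200.0, 2500000, 75.0
```

Change any single input (say, a cheaper GQA architecture at 64 KB/token) and the totals move by an order of magnitude, which is why these should be read as magnitude estimates.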
1. Text unit — A chunk of text the model reads and writes. In English, a rough rule of thumb is about 4 characters or ¾ of a word, though the real number varies by language and formatting. 100 words ≈ about 130 tokens as a rough estimate.
2. Billing unit — The unit AI companies charge for. Input and output tokens are often priced separately, because processing them can impose different costs. On Sonnet: $3/MTok input, $15/MTok output.
3. Inference / KV-cache position — During generation, each token in the active context contributes to memory use inside the model’s attention machinery, commonly described in terms of the KV cache. The memory cost per token depends on model architecture — a rough estimate is ~64 KB to ~0.5 MB per token. This is one reason large context windows are expensive to serve. The raw text itself is tiny; what matters is the model’s internal mathematical representation of that text.
4. Security credential — A string used to prove identity to an API, such as a Personal Access Token (PAT), API key, or JWT. This meaning is completely separate from the other three.
Quiz questions may use "token" in any of these senses. Pay attention to context.
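Sense 1's rules of thumb can be expressed as tiny estimators. These are heuristics for English text, not a real tokenizer:

```python
# Rough token estimators from the text-unit rules of thumb above.
def tokens_from_chars(text: str) -> int:
    """~4 characters per token in English."""
    return round(len(text) / 4)

def tokens_from_words(word_count: int) -> int:
    """A token is ~3/4 of a word."""
    return round(word_count / 0.75)

print(tokens_from_words(100))  # ~133 tokens for 100 words
```

Real tokenizers vary by language and formatting (code tokenizes denser than prose), so treat these as ±25% estimates.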
Advanced: Building MCP Servers
- Your company has an internal API no public MCP server covers — HR systems, proprietary databases, internal tooling, legacy systems with REST APIs. You build the bridge.
- You want Claude to query your private database — production Postgres, SQL Server, MongoDB. You control the credentials, the queries allowed, and what data is exposed.
- You're building a product on top of Claude Code — your SaaS wants to offer Claude Code integration with your own tools pre-wired.
- You want to wrap an existing CLI tool cleanly — instead of Claude calling the Atlassian CLI via Bash (messy), you build a thin MCP adapter that presents clean structured tools like `create_ticket(summary, priority)`.
- You want Claude to interact with IoT, hardware, or unusual systems — anything you can call from code, you can expose as an MCP tool.
There is no interface file, no IDL, no .h header. The "contract" is the MCP protocol — a set of JSON-RPC 2.0 messages your program must understand. When Claude Code connects to your server:
- Claude sends `initialize` — handshake and capability negotiation. Your program responds with what it supports.
- Claude sends `tools/list` — "what tools do you expose?" Your program returns a list of tool definitions (name, description, JSON Schema for parameters). Claude caches this for the session.
- Claude sends `tools/call` — "call this tool with these arguments." Your program executes the action and returns the result as JSON.
That's the entire protocol for basic tool exposure. Three message types. The rest is your business logic.
The quality of your tool definitions determines how well Claude uses your server. Claude reads the description to understand what the tool does and decides when to call it. The JSON Schema tells it what parameters to provide. Get the description wrong and Claude will misuse or ignore your tool.
{
"name": "create_ticket",
"description": "Creates a new Jira issue in the specified project. Use when the user wants to file a bug, create a task, or track work in Jira. Returns the new issue key (e.g. PROJ-1234).",
"inputSchema": {
"type": "object",
"properties": {
"summary": {
"type": "string",
"description": "The issue title — one sentence, specific"
},
"priority": {
"type": "string",
"enum": ["Low", "Medium", "High", "Critical"],
"description": "Issue priority"
},
"project_key": {
"type": "string",
"description": "Jira project key (e.g. PROJ, INFRA). Ask the user if unknown."
}
},
"required": ["summary", "project_key"]
}
}
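To see how such a schema constrains Claude's arguments, here is a toy validator covering only the keywords this particular schema uses (hand-rolled and stdlib-only; real MCP clients use a full JSON Schema implementation):

```python
# Toy argument check against the create_ticket inputSchema above.
# Handles only "required", "type": "string", and "enum" -- not full JSON Schema.
def validate(schema: dict, args: dict) -> list[str]:
    errors = []
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in args:
            continue
        if spec.get("type") == "string" and not isinstance(args[field], str):
            errors.append(f"{field}: expected string")
        if "enum" in spec and args[field] not in spec["enum"]:
            errors.append(f"{field}: must be one of {spec['enum']}")
    return errors

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "priority": {"type": "string", "enum": ["Low", "Medium", "High", "Critical"]},
        "project_key": {"type": "string"},
    },
    "required": ["summary", "project_key"],
}

print(validate(schema, {"summary": "Login broken", "project_key": "PROJ"}))  # []
print(validate(schema, {"summary": "x", "priority": "Urgent"}))  # two errors
```

This is the safety net a plain CLI tool lacks: a malformed call is rejected with a precise error instead of Claude guessing at flag syntax.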
Every message is JSON-RPC 2.0 over stdin/stdout. Here is exactly what flows through the pipe:
Claude sends to your program (stdin):
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"claude-code","version":"2.1.75"}}}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"create_ticket","arguments":{"summary":"Login button broken on Safari","project_key":"PROJ","priority":"High"}}}
Your program responds (stdout):
{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{}},"serverInfo":{"name":"my-jira-server","version":"1.0.0"}}}
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"create_ticket","description":"Creates a Jira issue. Use when user wants to file a bug or track work.","inputSchema":{"type":"object","properties":{"summary":{"type":"string"},"project_key":{"type":"string"},"priority":{"type":"string","enum":["Low","Medium","High","Critical"]}},"required":["summary","project_key"]}}]}}
{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"Created PROJ-1234: Login button broken on Safari\nhttps://yourjira.atlassian.net/browse/PROJ-1234"}]}}
Each message is one line of JSON terminated by a newline. Your program reads lines from stdin, parses the JSON, acts on the method, and writes a response line to stdout. That is the entire transport layer.
Since this guide is written for Windows developers, here's the simplest possible MCP server in C# using the official ModelContextProtocol NuGet package. This is a real, working implementation:
// 1. Create a new console app: dotnet new console -n MyMcpServer
// 2. Add the package: dotnet add package ModelContextProtocol
// 3. Add to Program.cs:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging; // needed for LogLevel
using ModelContextProtocol.Server;
var builder = Host.CreateApplicationBuilder(args);
// CRITICAL: Log to stderr, NOT stdout — stdout is the protocol channel
builder.Logging.AddFilter("*", LogLevel.Warning);
builder.Services.AddMcpServer()
.WithStdioServerTransport() // reads stdin, writes stdout
.WithToolsFromAssembly(); // discovers tools via [McpServerTool] attribute
await builder.Build().RunAsync();
// MyTools.cs — define your tools here
using ModelContextProtocol.Server;
using System.ComponentModel;
[McpServerToolType]
public static class MyTools
{
[McpServerTool, Description("Says hello. Use when user wants a greeting.")]
public static string SayHello(
[Description("The name to greet")] string name)
{
return $"Hello, {name}! This response came from your C# MCP server.";
}
[McpServerTool, Description("Adds two numbers. Use for arithmetic.")]
public static int Add(
[Description("First number")] int a,
[Description("Second number")] int b)
{
return a + b;
}
}
// Configure in Claude Code ~/.claude/settings.json:
{
"mcpServers": {
"my-csharp-server": {
"command": "dotnet",
"args": ["run", "--project", "C:\\MyMcpServer\\MyMcpServer.csproj"]
}
}
}
Restart Claude Code, and Claude will discover SayHello and Add as tools it can call. Ask Claude "say hello to Steve" and it will call your C# method directly.
Route console logging to stderr (e.g. via `ConsoleLoggerOptions.LogToStandardErrorThreshold`) or set log levels to Warning/Error only. This is the #1 issue when first-time C# MCP developers can't get their server to respond.

Any program in any language that can read stdin and write stdout is a valid MCP server. You are not limited to npm packages. A compiled Rust binary, a Python script, a Go executable, a PowerShell script — all work. Anthropic provides SDKs for TypeScript and Python that handle the protocol boilerplate, but they're optional.
A stray `console.log()`, `print()`, or debug statement corrupts the message stream and breaks your server silently. Claude will hang or report tool errors with no explanation. Write all logs and debug output to a file or stderr.

The minimal Python MCP server is about 30–50 lines:
# Install: pip install mcp
from mcp.server import Server
from mcp.server.stdio import stdio_server
import mcp.types as types
app = Server("my-server")
@app.list_tools()
async def list_tools():
return [
types.Tool(
name="hello",
description="Says hello to someone",
inputSchema={
"type": "object",
"properties": {"name": {"type": "string"}},
"required": ["name"]
}
)
]
@app.call_tool()
async def call_tool(name: str, arguments: dict):
if name == "hello":
return [types.TextContent(type="text", text=f"Hello, {arguments['name']}!")]
async def main():
async with stdio_server() as (read, write):
await app.run(read, write, app.create_initialization_options())
if __name__ == "__main__":
import asyncio
asyncio.run(main())
This is the right question to ask. You can already give Claude access to any CLI program through its Bash tool. So why write an MCP server at all?
Yes, you can absolutely just write a CLI program. If you write my-tool.exe --create-ticket "Bug in login" --project PROJ, Claude can call that from the Bash tool. It works. Claude is smart enough to figure out parameters from a --help output or your instructions. This is the quickest path to "Claude can use my tool."
| CLI program (Bash tool) | MCP server | |
|---|---|---|
| Setup time | Minutes — write the program, Claude uses it | Hours — implement the protocol, write tool definitions |
| How Claude calls it | By constructing a command-line string | By calling a named tool with structured JSON parameters |
| Parameter handling | Claude guesses flags from --help or your instructions; prone to errors | Strict JSON Schema — Claude always passes correct types |
| Output parsing | Claude reads stdout as text and interprets it | Structured JSON response — no interpretation needed |
| Credentials | Must be in environment variables Claude can see, or hardcoded | Configured in the server — Claude never sees raw credentials |
| Error handling | Claude reads stderr/exit codes and guesses what went wrong | Structured error responses Claude can act on precisely |
| Security | Claude can run ANY bash command — including bad ones if confused | Only exposes defined tools — Claude can't go "off script" |
| Works across AI tools | Yes — any AI can run a CLI command | Yes — any MCP-compatible AI can use the server |
| Best for | Quick experiments, one-off tasks, tools you already have | Production integrations, team tools, credential-safe access, repeated daily use |
The concrete reasons to choose MCP over CLI:
- Your tool handles credentials — API keys, database passwords. CLI tools expose these in shell history and process listings. MCP keeps them inside the server process.
- You want reliable parameter passing — Claude occasionally mangles complex CLI flag syntax. JSON Schema parameters are exact.
- You're sharing with a team — a packaged MCP server with a `/plugin install` or settings.json entry is far easier to distribute than "install this CLI tool and configure these flags."
- The output is structured data — if your tool returns JSON that Claude needs to act on, returning it as a structured MCP response is cleaner than having Claude parse text output.
- You want it to work across multiple AI tools — one MCP server, Claude + Gemini + OpenCode can all use it.
MCP is an open standard. The same MCP server works across multiple AI CLI tools — this is one of its biggest strengths. Write once, use everywhere:
| CLI Tool | MCP Support | Config Format | Config Location |
|---|---|---|---|
| Claude Code | Yes — full | JSON (mcpServers) | ~/.claude/settings.json |
| Gemini CLI | Yes | JSON (mcpServers) | ~/.gemini/settings.json |
| OpenCode | Yes | JSON (mcp) | ~/.config/opencode/opencode.json |
| Copilot CLI | Partial — via extensions | GitHub extension format | GitHub Copilot settings |
| Codex CLI | Partial | TOML ([mcp_servers]) | ~/.codex/config.toml |
The JSON-RPC protocol is identical across all supporting tools. The only difference is the config file format and location. If you build an MCP server for Claude Code, Gemini CLI can usually use it with only a config change.
- Local script — The most common. Your server is a Python script or Node.js file on your machine. Claude Code launches it with `python my_server.py` or `node my_server.js`. No installation required beyond the script being accessible.
- npm package — Package your server as an npm module. Claude Code launches it with `npx -y your-package`. Users don't need to install anything manually — npx downloads and runs on demand.
- Compiled binary — Rust, Go, or C++ servers compile to a single executable. Fast startup, no runtime dependencies. Point the config at the binary path.
- Remote server (HTTP) — MCP also supports HTTP transport for remote servers. Your server runs in the cloud; Claude connects via HTTPS. Useful for shared team infrastructure — one server, many developers.
- Docker container — Run your server in a container for isolation. Claude Code launches it with a `docker run` command.
- Official MCP build-a-server guide — Anthropic's documentation with TypeScript and Python examples
- Claude Code plugins repository — Official plugin examples to study
- awesome-claude-code — Community examples of MCP servers and plugins
Making a Plugin: JIRA Integration Example
Claude Code has its own plugin system — separate from MCP servers. A plugin is a package that gives Claude sessions custom slash commands, skills, and hooks. It contains markdown files (skills, commands), JSON configs (manifests, hook definitions), scripts, and optionally compiled programs written in any language — C#, Go, Python, Rust, whatever you want. It gets installed into your local Claude Code environment and runs whenever Claude needs it.
What Languages Can You Use?
This surprises most people: plugins are not language-restricted. The plugin itself is mostly markdown and JSON. The tools it tells Claude to run can be written in anything.
A plugin has four parts, each written in different "languages":
| Part | What It Is | Written In |
|---|---|---|
| Plugin structure | Manifests, skills, commands, hooks config | Markdown, JSON, YAML frontmatter |
| Hook scripts | SessionStart, PreToolUse, etc. | Bash, PowerShell, or any shell script |
| Utility code | Optional helpers (e.g., skill discovery) | JavaScript/TypeScript (Node.js is already installed with Claude Code) |
| CLI tools invoked by skills | The programs Claude actually runs | Anything — whatever produces an executable |
That last row is the key. When a skill tells Claude to run a command, Claude doesn't care what language that command was built in. It just runs it locally and reads the output. So the tools your plugin invokes can be:
- C# / .NET — Build a console app, invoke it as `dotnet MyApp.dll --get-ticket PROJ-123 --json`, or compile to a native `.exe`
- Go — Compiles to a single static binary with no runtime dependencies. Excellent for cross-platform CLI tools
- Python — `python script.py --query "assignee = currentUser()"`
- Rust — Single binary, no runtime. Fast and cross-platform
- Node.js / TypeScript — Already available since Claude Code requires Node.js
- Any existing CLI tool — acli, gh, aws, kubectl, curl — if it's on your PATH, your skill can use it
A C# developer can ship an `.exe` that queries JIRA, a Python developer can write a script that calls the GitHub API, a Go developer can compile a binary that talks to Slack. The skill just says "run this command" and reads the output. Claude doesn't know or care what language produced the binary. (One caveat: `.exe` files only work on Windows.)

Wait — How Does Claude Even Know What a Plugin Can Do?
This is the question most people skip, and it's the key to understanding the whole system. There's no compiled interface, no API contract, no type system. Instead:
- At session start, the plugin's SessionStart hook fires and injects a special skill — sometimes called a "meta-skill" — into the conversation context. Despite the fancy name, it's just a regular SKILL.md file whose job is to list every other available skill and when to use each one. Think of it as the table of contents for the plugin. (Note: "meta-skill" is not an official Anthropic term — it's plugin-community jargon. Anthropic's docs just call everything a "skill." The community started calling the auto-injected table-of-contents skill a "meta-skill" to distinguish it, but structurally it's identical to any other skill.)
- Claude reads that context the same way it reads any system prompt. It now knows: "I have a `jira-workflow` skill I should invoke when the user mentions a ticket"
- When Claude invokes a skill, the skill's markdown content gets loaded and Claude follows the instructions — "use `acli.exe` to query this ticket, check the assignee, transition the status"
The "contract" is natural language. The plugin teaches Claude what tools exist and how to use them, the same way a README teaches a human developer. There is no schema, no function signature, no compiled binding. Just instructions that Claude interprets at runtime.
Where Does the Code Actually Run?
This is the other common misconception. The plugin code does not run in Anthropic's data center. Here's the actual flow:
- Claude's model runs on Anthropic's GPU servers in a data center — this is where token generation happens
- Claude Code runs on your local machine — it sends prompts to the API and receives responses
- When Claude decides to run a command (like `acli.exe jira workitem view PROJ-12345`), that command executes on your machine, not in the cloud
- The command's output gets sent back to Claude as context for the next response
JIRA tokens never leave your machine. When the plugin tells Claude to run acli.exe, that command executes locally using credentials stored in your local acli config. The API token goes from your machine directly to Atlassian's servers. Claude's data center never sees it — Claude only sees the JSON output that comes back.
How Plugins Differ from MCP Servers
| Feature | Plugin | MCP Server |
|---|---|---|
| Provides | Slash commands, skills, hooks, agents | Tools (function calls) |
| Installed via | /plugin install | Config in settings.json |
| Runs as | Part of Claude Code's process | Separate long-running process |
| Best for | Workflows, methodologies, multi-step processes | Exposing APIs as callable tools |
The Key Insight: Claude Sessions Don't Talk to APIs Directly
This is the most misunderstood part. When an app launches a Claude session to work on a JIRA ticket, Claude doesn't get raw JIRA REST API access. Instead:
- The host app manages your JIRA credentials securely (API tokens stored via keytar, never exposed to the renderer or to Claude)
- The host app provides a UI to browse tickets and launches Claude sessions with just a ticket key
- The plugin handles all JIRA API calls within the session, using its own credential management
The launch looks like this:
```javascript
// The app passes just the ticket key to Claude, not credentials
const prompt = `/my-plugin:start-work ${ticketKey}`
const args = ['--session-id', sessionId, prompt]
```
Claude only ever sees the ticket key. The plugin fetches the actual JIRA data using credentials stored separately.
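The launch snippet above can be rounded out into a tiny host-app helper. This is a hypothetical sketch — `buildLaunchArgs` is an illustrative name, and the only CLI flag assumed is the `--session-id` flag already shown:

```javascript
// Hypothetical host-app helper: build the launch arguments for a Claude
// session. Note what is absent — no JIRA URL, no token, no credentials.
function buildLaunchArgs(ticketKey, sessionId) {
  const prompt = `/my-plugin:start-work ${ticketKey}`;
  return ['--session-id', sessionId, prompt];
}

// The host app would then spawn the CLI with these args, e.g.
//   spawn('claude', buildLaunchArgs('PROJ-123', sessionId), { stdio: 'inherit' })
```

Everything sensitive stays out of the argument list; the plugin running inside the session fetches the ticket data using locally stored credentials.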
Anatomy of a Plugin
Here's the structure of a production JIRA workflow plugin that integrates ticket lifecycle management into Claude Code:
```
my-jira-plugin/
├── .claude-plugin/
│   ├── plugin.json          # Plugin metadata & version
│   └── marketplace.json     # Marketplace registration
├── hooks/
│   ├── hooks.json           # SessionStart hook config
│   ├── run-hook.cmd         # Cross-platform polyglot wrapper
│   └── session-start        # Injects skills into every session
├── lib/
│   └── skills-core.js       # Skill discovery & resolution
├── skills/                  # Reusable skills
│   ├── jira-workflow/SKILL.md
│   ├── meta-skill/SKILL.md
│   ├── brainstorming/SKILL.md
│   └── ...
├── commands/                # User-invoked slash commands
│   ├── start-work.md
│   ├── commit.md
│   ├── pr-check.md
│   └── ...
└── agents/                  # Agent templates for subagent dispatch
```
Step 1: The Plugin Manifest
Every plugin needs a .claude-plugin/plugin.json that declares its identity:
```json
{
  "name": "my-jira-plugin",
  "description": "Development workflow: Jira lifecycle, code review, standards enforcement",
  "version": "1.0.0",
  "author": {
    "name": "Your Team",
    "email": "you@example.com"
  },
  "repository": "https://github.com/your-org/my-jira-plugin",
  "license": "MIT",
  "keywords": ["jira", "tdd", "code-review", "workflows"]
}
```
And a .claude-plugin/marketplace.json if you want it discoverable by other users:
```json
{
  "name": "my-jira-plugin",
  "description": "Development workflow plugin with Jira integration",
  "owner": {
    "name": "Your Team",
    "email": "you@example.com"
  },
  "plugins": [
    {
      "name": "my-jira-plugin",
      "description": "Jira lifecycle, code review, standards enforcement",
      "version": "1.0.0",
      "source": "./"
    }
  ]
}
```
Step 2: The SessionStart Hook
Hooks run commands in response to Claude Code events. The hooks.json file tells Claude Code what to execute:
```json
// hooks/hooks.json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "'${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd' session-start",
            "async": false
          }
        ]
      }
    ]
  }
}
```
The session-start script reads the meta-skill and injects it as context into every new session:
```bash
#!/usr/bin/env bash
# hooks/session-start — inject the meta-skill into every session
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
PLUGIN_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Read the meta-skill that tells Claude how to use all other skills
content=$(cat "${PLUGIN_ROOT}/skills/meta-skill/SKILL.md")

# Escape for JSON embedding
escape_for_json() {
  local s="$1"
  s="${s//\\/\\\\}"     # backslashes
  s="${s//\"/\\\"}"     # quotes
  s="${s//$'\n'/\\n}"   # newlines
  printf '%s' "$s"
}
escaped=$(escape_for_json "$content")

# Output JSON that Claude Code picks up as session context
cat <<EOF
{
  "hookSpecificOutput": {
    "hookEventName": "SessionStart",
    "additionalContext": "${escaped}"
  }
}
EOF
```
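For what it's worth, a Node.js version of the same hook could skip the hand-rolled escaping entirely — `JSON.stringify` already handles backslashes, quotes, newlines, and edge cases (tabs, carriage returns) that the bash version misses. This is a sketch, not part of the plugin above; `buildHookOutput` is an illustrative name:

```javascript
// Hypothetical Node variant of the session-start hook's output step.
// Building the object and serializing it in one go makes escaping
// mistakes impossible.
function buildHookOutput(skillContent) {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: 'SessionStart',
      additionalContext: skillContent,
    },
  });
}

// In the real hook you would first read the meta-skill file:
//   const content = fs.readFileSync(`${pluginRoot}/skills/meta-skill/SKILL.md`, 'utf8');
//   process.stdout.write(buildHookOutput(content));
```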
This is the magic — every time Claude starts a session, it automatically knows about all the skills available in the plugin.
Step 3: Slash Commands
Commands are markdown files in commands/ that define user-invocable actions. Here's a start-work command that kicks off JIRA integration:
```markdown
# commands/start-work.md
---
description: "Start work on a Jira ticket. Looks up the ticket, checks assignment,
  transitions to In Progress, fetches acceptance criteria, and kicks off brainstorming."
disable-model-invocation: true
---

Invoke the jira-workflow skill to start work on the specified ticket.
Follow this sequence exactly:
1. Look up the ticket (fetch summary, status, acceptance criteria, assignee, sprint)
2. Check assignee — if assigned to someone else, STOP and ask
3. Assign to me if unassigned
4. Transition to In Progress
5. Set sprint to active sprint
6. Present the ticket summary and acceptance criteria to the user
7. Then invoke the brainstorming skill to begin design
```
Notice: the command itself is just instructions. It delegates the actual work to a skill.
Step 4: Skills (Where the JIRA Logic Lives)
Skills are markdown files in skills/*/SKILL.md with YAML frontmatter. The jira-workflow skill contains the actual JIRA integration logic:
````markdown
# skills/jira-workflow/SKILL.md
---
name: jira-workflow
description: Use when starting work on a Jira ticket, transitioning ticket
  status, updating ticket fields, or checking ticket assignment
---

# Jira Workflow

## Tools

**Primary: acli** (Atlassian CLI) — more token efficient

```bash
# View a ticket with specific fields
acli.exe jira workitem view PROJ-12345 \
  --fields "summary,status,description,assignee" --json

# Transition ticket status
acli.exe jira workitem transition -k PROJ-12345 \
  -s "In Progress" --yes

# Assign ticket to yourself
acli.exe jira workitem assign -k PROJ-12345 -a @me --yes
```

**Fallback: Jira MCP** — when acli fails or for complex field updates

```
getJiraIssue -> editJiraIssue -> transitionJiraIssue
```
````
The skill teaches Claude how to interact with JIRA — which CLI tool to use, what fields to fetch, what transitions exist, and what to do when things fail. Claude reads this skill at runtime and follows the instructions.
Step 5: Skill Discovery
The plugin includes a lib/skills-core.js module that handles finding and loading skills at runtime:
```javascript
// lib/skills-core.js — finds SKILL.md files and extracts their metadata
import fs from 'fs';
import path from 'path';

function extractFrontmatter(filePath) {
  const content = fs.readFileSync(filePath, 'utf8');
  const lines = content.split('\n');
  let inFrontmatter = false;
  let name = '', description = '';
  for (const line of lines) {
    if (line.trim() === '---') {
      if (inFrontmatter) break;
      inFrontmatter = true;
      continue;
    }
    if (inFrontmatter) {
      const match = line.match(/^(\w+):\s*(.*)$/);
      if (match) {
        if (match[1] === 'name') name = match[2].trim();
        if (match[1] === 'description') description = match[2].trim();
      }
    }
  }
  return { name, description };
}

function findSkillsInDir(dir, sourceType, maxDepth = 3) {
  const skills = [];
  if (!fs.existsSync(dir)) return skills;
  function recurse(currentDir, depth) {
    if (depth > maxDepth) return;
    for (const entry of fs.readdirSync(currentDir, { withFileTypes: true })) {
      if (entry.isDirectory()) {
        const skillFile = path.join(currentDir, entry.name, 'SKILL.md');
        if (fs.existsSync(skillFile)) {
          const { name, description } = extractFrontmatter(skillFile);
          skills.push({ name: name || entry.name, description, sourceType });
        }
        recurse(path.join(currentDir, entry.name), depth + 1);
      }
    }
  }
  recurse(dir, 0);
  return skills;
}
```
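To make the expected file format concrete, here is a condensed, string-based re-sketch of the frontmatter extraction above, with a sample input (the sample skill content is hypothetical):

```javascript
// String-based re-sketch of the frontmatter extraction above: scan the lines
// between the first pair of '---' delimiters and pick out name/description.
function parseFrontmatter(markdown) {
  const meta = { name: '', description: '' };
  let inFrontmatter = false;
  for (const line of markdown.split('\n')) {
    if (line.trim() === '---') {
      if (inFrontmatter) break;   // closing delimiter — stop scanning
      inFrontmatter = true;       // opening delimiter
      continue;
    }
    if (!inFrontmatter) continue;
    const m = line.match(/^(\w+):\s*(.*)$/);
    if (m && m[1] in meta) meta[m[1]] = m[2].trim();
  }
  return meta;
}

// Example SKILL.md content:
const sample = [
  '---',
  'name: jira-workflow',
  'description: Use when starting work on a Jira ticket',
  '---',
  '# Jira Workflow',
].join('\n');
// parseFrontmatter(sample).name === 'jira-workflow'
```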
Personal skills in ~/.claude/skills/ shadow plugin skills, so teams can override behavior without forking the plugin.
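That shadowing rule can be sketched as a last-writer-wins merge by skill name — a conceptual sketch, not the actual skills-core.js logic:

```javascript
// Sketch of skill shadowing: merge plugin skills and personal skills by
// name; personal skills are applied last, so they win on name collisions.
function resolveSkills(pluginSkills, personalSkills) {
  const byName = new Map();
  for (const s of pluginSkills) byName.set(s.name, s);    // plugin defaults
  for (const s of personalSkills) byName.set(s.name, s);  // personal overrides
  return [...byName.values()];
}
```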
Step 6: Cross-Platform Hook Wrapper
Since hooks execute commands, Windows compatibility requires a polyglot wrapper — a single file that works as both a Windows batch file and a bash script:
```
: << 'CMDBLOCK'
@echo off
REM Windows batch portion — finds bash and delegates
if "%~1"=="" ( echo run-hook.cmd: missing script name >&2 & exit /b 1 )
set "HOOK_DIR=%~dp0"
REM Try Git for Windows bash
if exist "C:\Program Files\Git\bin\bash.exe" (
  "C:\Program Files\Git\bin\bash.exe" "%HOOK_DIR%%~1" %2 %3 %4 %5
  exit /b %ERRORLEVEL%
)
REM Try bash on PATH
where bash >nul 2>nul
if %ERRORLEVEL% equ 0 ( bash "%HOOK_DIR%%~1" %2 %3 %4 %5 & exit /b %ERRORLEVEL% )
exit /b 0
CMDBLOCK
# Unix portion — run the named script directly
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
exec bash "${SCRIPT_DIR}/$1" "${@:2}"
```
The trick: Windows cmd.exe treats the leading : line as a label and runs the batch code that follows. Bash treats : as a no-op whose heredoc swallows everything up to CMDBLOCK, so it skips straight to the Unix portion at the bottom.
The Full Flow
Here's what happens end-to-end when a developer types /start-work PROJ-12345:
- SessionStart hook already fired — Claude knows about all available skills
- Slash command `start-work.md` tells Claude to invoke the `jira-workflow` skill
- Skill `jira-workflow` instructs Claude to:
  - Run `acli.exe jira workitem view PROJ-12345 --fields "summary,status,description,assignee" --json`
  - Check assignee — stop if assigned to someone else
  - Transition: `acli.exe jira workitem transition -k PROJ-12345 -s "In Progress" --yes`
  - Set sprint: `acli.exe jira workitem edit -k PROJ-12345 --custom "customfield_10007:42" --yes`
- Claude presents the ticket summary and acceptance criteria to the developer
- Next skill (`brainstorming`) kicks in automatically to design the approach
Credentials never touch Claude's servers. The plugin instructs Claude to run local CLI commands that use locally-stored API tokens.
Installation & Distribution
Users install a plugin with two commands:
```
# Register the marketplace (one time)
/plugin marketplace add your-org/my-jira-plugin

# Install the plugin
/plugin install my-jira-plugin@my-jira-plugin
```
The plugin gets downloaded to ~/.claude/plugins/cache/my-jira-plugin/. A host app can manage updates programmatically:
```javascript
// A host app can keep the plugin updated automatically
await runCommand(['plugin', 'marketplace', 'update', 'my-jira-plugin'])
await runCommand(['plugin', 'update', 'my-jira-plugin@my-jira-plugin'])
```
Building Your Own JIRA Plugin
To build a plugin that integrates with JIRA (or any external API), follow this pattern:
- Create the manifest — `.claude-plugin/plugin.json` with name, version, description
- Add a SessionStart hook — inject context so Claude knows about your skills on every session
- Write skills as markdown — teach Claude which CLI tools to use and how (e.g., `acli` for JIRA, `gh` for GitHub)
- Create slash commands — entry points that chain skills together into workflows
- Handle credentials locally — use tools like `acli` with saved configs or keytar; never pass tokens through Claude
- Publish to the marketplace — add `.claude-plugin/marketplace.json` so others can install it
The key design principle: skills are instructions, not code. You're teaching Claude a workflow in natural language, backed by CLI tools that handle the actual API calls. Claude never needs raw REST access — it executes local commands that have their own authentication.
A Plugin Is Not a Hook
This is easy to confuse. A plugin is not a hook. A plugin contains hooks, along with skills, commands, and agents. These are four distinct components that live inside a plugin:
| Component | What It Is | When It Runs |
|---|---|---|
| Hook | A command or program triggered by a Claude Code event | Automatically, on events like SessionStart, PreToolUse, Stop |
| Skill | A markdown file (SKILL.md) that teaches Claude a workflow | When Claude or a command invokes it |
| Command | A markdown file that defines a user-typed slash command | When the user types /command-name |
| Agent | A template for spawning a specialized subagent | When Claude dispatches parallel work |
When we say "the SessionStart hook fires," we mean: one specific component inside the plugin (the hook) executes a command in response to a Claude Code event. The plugin as a whole is the package that contains that hook, plus all the skills, commands, and agents.
Is a Plugin "Live" or Dormant?
Dormant. Between sessions, the plugin is just files sitting on your local disk at ~/.claude/plugins/cache/. No process is running. Anthropic's servers have zero awareness that your plugin exists. Nothing is consuming memory, CPU, or tokens.
But it wakes up automatically when you start a session. Here's the lifecycle:
- You start a Claude Code session — the plugin is dormant files on disk
- Claude Code fires the SessionStart event — the plugin's hook runs a script
- The hook injects the meta-skill — a markdown document listing all available skills gets added to the conversation context
- From this point on, Claude "knows" what the plugin can do — because the meta-skill is part of its context window
- When you close the session — the plugin goes back to dormant. Nothing persists in memory.
So the plugin is dormant between sessions, active within sessions, and the transition happens automatically via the SessionStart hook.
Can Claude Use the Plugin Without Being Asked?
Yes — and this surprises most people. Claude can invoke plugin skills on its own, without the user explicitly asking, if the meta-skill instructs it to. Real production meta-skills can be aggressive about this: they contain directives telling Claude to invoke the relevant skill proactively whenever a matching situation appears, without waiting to be asked.
With instructions like this in the context, if you say "fix the bug in ticket PROJ-12345," Claude will:
- Read its injected context and see it has a `jira-workflow` skill
- Invoke that skill autonomously — without asking you first
- Follow the skill's instructions to run `acli.exe` locally to fetch the ticket
However, slash commands can opt out of this. The disable-model-invocation: true frontmatter flag means only a human can type that command. So /start-work requires you to type it, but the underlying jira-workflow skill that it delegates to can be invoked by Claude freely.
Human-Only vs. Claude-Autonomous: The disable-model-invocation Flag
This is one of the most important design decisions a plugin author makes. Every slash command in a plugin has a choice: can Claude invoke it on its own, or does only a human get to type it?
The disable-model-invocation: true flag in a command's YAML frontmatter means: this command requires a human to type it. Claude cannot decide on its own to run /start-work PROJ-12345. Only you can. But here's the subtlety — that restriction only applies to the command, not the skill it delegates to. The jira-workflow skill that /start-work invokes? Claude can call that skill directly, any time, without asking.
This creates a two-tier permission system:
| Layer | Who Can Invoke | Example | Why |
|---|---|---|---|
| Command with disable-model-invocation: true | Human only | /start-work PROJ-12345 | Dangerous entry points: transitions a ticket, assigns it to you, starts a whole workflow |
| Command without the flag | Human or Claude | /commit | Safe or routine actions Claude might reasonably initiate |
| Skill | Human or Claude (always) | jira-workflow | Skills are the building blocks — Claude needs to call them freely to do its work |
Think of it like this: the command is the front door with a lock — only the homeowner (you) has the key. The skill is the toolbox inside the house — once someone is inside (via a command you approved, or via Claude's autonomous judgment based on the meta-skill instructions), they can use any tool freely. This lets plugin authors put guardrails on initiating workflows while still letting Claude work autonomously within them.
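The two-tier rule boils down to a small predicate. This is a sketch of the concept only, not Claude Code's internals; the shape of `item` is illustrative:

```javascript
// Who may invoke what? item = { kind: 'command' | 'skill',
// disableModelInvocation?: boolean }; actor = 'human' | 'model'.
function canInvoke(actor, item) {
  if (item.kind === 'skill') return true;   // skills: always invocable by either
  if (actor === 'human') return true;       // humans: can type any command
  return !item.disableModelInvocation;      // Claude: blocked only by the flag
}
```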
Plugins and Token Cost
This is the part nobody talks about. The plugin's "contract" IS token usage. There is no free metadata channel, no side-band, no compiled shortcut. Everything Claude knows about the plugin travels as tokens in the conversation context.
Here's what that means concretely:
| What | When It Costs Tokens | Approximate Size |
|---|---|---|
| Meta-skill | Every single message, for the entire session | ~1,500–2,000 tokens |
| Individual skill (e.g., jira-workflow) | Only when invoked — loaded on demand | ~500–2,000 tokens each |
| Slash command | Only when the user types it | ~100–300 tokens each |
| Command output (JIRA JSON, etc.) | Once generated, stays in context | Varies widely |
The meta-skill is the expensive part because it's always there. Every prompt you send — even "what does this variable mean?" — carries the full meta-skill as context overhead. A plugin with 19 skills needs a bigger meta-skill "menu," which means more baseline token cost on every message.
The full flow, in terms of what consumes tokens:
- Session starts → hook fires → meta-skill (~2K tokens) injected into context
- Every prompt you send now carries that meta-skill as context overhead
- Claude reads it, sees available skills, decides which to invoke
- When Claude invokes a skill, that skill's full content loads — more tokens added to context
- Claude follows the skill instructions, runs local commands, gets output
- Command output (JIRA JSON, git diff, etc.) comes back as yet more context tokens
This is a real design tradeoff: more plugin capabilities = more tokens per message = more cost. A lean plugin with 3 skills has minimal overhead. A comprehensive plugin with 19 skills pays a significant per-message tax just for Claude to know what's available. Plugin authors should think carefully about what goes in the meta-skill versus what gets loaded on demand.
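A back-of-envelope model of that per-message tax, using only the approximate sizes from the table above (all numbers are the rough estimates given there, not measurements):

```javascript
// Rough per-session input-token overhead for a plugin. The meta-skill is
// present on every message; an invoked skill stays in context from the
// message it loads on (modeled here as worst case: the whole session).
function estimateSessionOverhead({ metaSkillTokens, invokedSkillTokens, messages }) {
  return (metaSkillTokens + invokedSkillTokens) * messages;
}

// Using the table's ~2,000-token meta-skill over a 50-message session:
// 2000 * 50 = 100,000 input tokens before any real work happens.
```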
Confusing Terms
This guide has uncovered a surprising amount of confused terminology, misleading names, and inconsistent definitions in the world of AI coding tools. Some of this is Anthropic's fault. Some is the broader AI industry's fault. Some is just the natural fog that forms when technical people name things for themselves and forget that others have to understand them too.
Here is a complete accounting of everything we found.
| The Confusing Term | What People Think It Means | What It Actually Means | Who's Responsible |
|---|---|---|---|
| "MCP Server" | A deployed server — something with a port, a network, a cloud deployment | A local program launched as a child process. Listens for JSON-RPC on stdin, writes results to stdout. No port. No network. Stops when your session ends. A tool dispatcher, not a server. | Anthropic — borrowed "server/client" terminology from web architecture even though the implementation is just two programs talking through pipes on your machine |
| "Plugin" vs "MCP Server" | Interchangeable terms for the same thing | Completely different. A Plugin (/plugin install) is a bundle containing slash commands, subagents, hooks, and/or MCP configs. An MCP server is one possible ingredient inside a plugin. | Anthropic — two distinct concepts with overlapping informal usage |
| "API Usage" / "Usage" / "Tokens" | Three different things | All the same concept — token consumption — described from different angles on the same pricing page. "Usage" alone means two different things: messages in your plan window, OR tokens consumed through the API. | Anthropic — their pricing page uses four words for one concept, acknowledged as "one of the most confusing product pages in the industry" |
| "Extended Context Model" | A different, more powerful AI model | The same Opus 4.6 or Sonnet 4.6, with a larger reading window enabled (1M instead of 200K tokens). Same brain, more desk space. Not a new model. | Anthropic |
| "Skill" | Something that runs — like an Alexa skill or a Windows service | A markdown document that sits dormant until you invoke it. Does nothing on its own. Not a running process. Rename it mentally to "instruction document" or "playbook." | Anthropic — the word "skill" carries active connotations across other platforms |
| Memory files (cost) | Free persistent storage — a nice bonus feature | Mini-CLAUDE.md files. They load automatically every session and cost input tokens on every prompt — just like CLAUDE.md. They are not free. They accumulate silently. | Anthropic — naming implies cost-free persistence; the token cost is not prominently disclosed |
| CLAUDE.md size | A config file — bigger = more features, no cost | Every token in CLAUDE.md is charged as input tokens on every single message, every session, forever. A 500-line CLAUDE.md might cost 3,000–5,000 tokens per message. Treat it like code — refactor and trim it. | Anthropic — not disclosed prominently; discovered by developers monitoring token costs |
| CLAUDE.md "auto-updates" | Claude automatically records what it learns | It doesn't. Claude reads CLAUDE.md every session but only writes to it when you explicitly ask. If you don't tell Claude to update it, knowledge from a session is lost forever. | Anthropic — the passive loading behavior makes users assume writing is also automatic |
| "Fork" vs "Agent" | Two words for spawning a separate Claude instance | Fork = git branch for a session. Copies full conversation history. Becomes a new permanent independent session. Agent = subprocess. Starts fresh with only its task brief. Terminates when done. You keep working. | AI industry — both "split off" from current context, but are architecturally opposite |
| "Plugin" (informal) | An MCP server — something that connects Claude to external tools | Informally used to mean MCP server by most tutorials and developers. Formally means a bundle that can contain MCP configs plus slash commands, subagents, and hooks. | AI industry — informal usage has diverged from the formal product definition |
| Which "Claude"? | One product | At least six different surfaces: Claude.ai (website), Claude Desktop Chat tab, Claude Desktop Code tab, Claude Code CLI, Claude in VS Code/JetBrains, Claude in Slack, Claude in CI/CD. The Code tab in Claude Desktop IS Claude Code. They're the same engine. | Anthropic — poor surface naming; "Claude Code" appears as both a tab name and a product name |
| "Claude Code" vs "Claude the model" | The same thing — "Claude" | Claude Code is the tool (like Visual Studio). Claude Opus/Sonnet/Haiku are the AI models inside it (like the compiler version). You can switch models mid-session. They are independent. | Anthropic — "Claude" is used for both the platform and the model family |
| "Governance features" (Enterprise) | A capability upgrade — makes Claude smarter or more powerful | Audit logs, SSO, SCIM, RBAC, custom data retention. Organizational compliance plumbing. Makes IT and legal teams happy. Does not make Claude smarter by one IQ point. | Enterprise software industry — "governance" is a feature category, but buyers often conflate it with capability |
| "~/.claude/" notation | Incomprehensible Unix gibberish | On Windows: C:\Users\YourName\.claude\. The tilde means "your home directory." The dot means "config folder." ~/.claude/ (with tilde) = your personal folder. .claude/ (no tilde) = inside your current repo. Two completely different places. | AI industry — Unix conventions applied without translation for Windows users |
| "Repository" / "repo" | A mystical technical concept | A folder with version history tracking turned on. That's it. Your local repo is a folder on your hard drive. The GitHub repo is the same folder on GitHub's server. | Software industry — the word "repository" carries unnecessary gravitas |
| "Pull Request" | Pulling someone else's changes | A request to push your changes for others to review before merging. You're pushing, not pulling. The name is backwards from the user's perspective. | GitHub — a naming decision that has confused developers for 15+ years |
| "IQ" and "Awareness" | One dimension — how good the AI is | Two completely independent dimensions. IQ = reasoning quality (how well it thinks per token). Awareness = context window size (how much it can hold in mind at once, measured in pages). A genius seeing 267 pages beats a mediocre model seeing 1,333 pages on focused tasks — but loses on large codebase reasoning. | AI industry — "model quality" is discussed as a single axis; the context/reasoning split is rarely taught |
| Skills are "contextually loaded" | Claude automatically detects when a skill applies and loads it | False. There is no background skill-matching. Skills sit dormant. You are always the trigger — either via slash command or explicit mention. Older docs described "contextual loading" which was misleading. | Anthropic — earlier documentation implied automatic skill activation |
| Skills "deactivate" cleanly | Skills stay active for a session until you stop them | Skills can be silently dropped by /compact without warning. After compaction, Claude may stop following the skill's instructions with no error message. You won't notice until you see Claude ignoring the skill's rules. | Anthropic — undisclosed side effect of compaction; documented in GitHub issues, not official docs |
| Documentation vs. Memory files | Both "store" knowledge for Claude to use | Memory files: always loaded, cost tokens, Claude-only. Documentation (README.md etc.): costs zero until read, works with any AI, survives tool switching. Memory files are wired to Claude's brain. Documentation is a spare brain that any AI can borrow. | AI industry — the concept of "dormant, platform-agnostic memory" as distinct from "always-on AI memory" is not widely taught |
| Plan Mode (Shift+Tab) vs. plain prompting | Plan Mode is how professionals use Claude; prompting is for beginners | For most people, telling Claude "plan first, wait for approval" in a plain prompt produces the same result. Plan Mode's advantage is hard mechanical enforcement — it can't edit files during planning even if it wants to. The capability is identical; the guardrail is different. | AI industry — product features are marketed as necessary when prompting achieves the same outcome 95% of the time |
| Enterprise plan = more Awareness | Enterprise gets a larger context window than Max | As of March 2026, Max and Enterprise have identical 1M context windows. They equalized when Anthropic shipped 1M context generally. Pre-March 2026, Enterprise had 500K and Max had 200K — articles written then are now outdated. | Anthropic — tier benefits changed but old articles persist; the gap no longer exists at published tier level |
| "Token" — four different meanings, only three related | One word, one meaning | "Token" is one of the most overloaded words in this space. It has four different meanings depending on context, and only three of them are related: (1) Text unit — ~4 characters (rough estimate), the chunk of text the model reads and writes. (2) Billing unit — what you pay for; input and output tokens are often priced separately because processing them can impose different costs ($3/MTok input, $15/MTok output on Sonnet). (3) Inference / KV-cache position — during generation, each token in the active context contributes to memory use in the model's attention machinery; the raw text is tiny but the model's internal mathematical representation costs roughly 64 KB to 0.5 MB of server GPU memory depending on architecture. (4) Security credential — a string that proves your identity (GitHub Personal Access Token, API key, bearer token) — completely separate from the other three. See the Glossary for full definitions of each. | Software industry — "token" is independently overloaded in linguistics, economics, computer security, and AI infrastructure, then all four usages collide in a single AI coding session |
| "Context window" / "tokens" — what they physically are | Something on your laptop, like RAM or disk storage | The same idea travels through six names: Characters → Tokens → HTTPS request → KV cache → Context window → Awareness (pages). It starts on your keyboard and ends in GPU VRAM on Anthropic's servers. Your laptop contributes nothing except sending text over the internet. The math: 1 token ≈ ~0.5 MB of GPU VRAM (rough midpoint; empirically measured range ~64 KB–0.5 MB depending on architecture). 200K tokens ≈ ~100 GB. 1M tokens ≈ ~500 GB at the midpoint. Real number varies significantly by model design. | AI industry — each name is technically correct at its layer, but nobody explains the journey. "Tokens" sounds like a billing unit. "Context window" sounds like a software setting. Neither hints that you're reserving terabytes of server GPU memory. → Chapter 15 (advanced box) |
| "Meta-skill" — is it a skill or not? | A different kind of thing from a skill | A "meta-skill" is just a regular skill (a SKILL.md file with YAML frontmatter, like any other) that happens to have one special job: listing all the other skills and telling Claude when to use each one. It gets auto-injected at session start via a hook. There is nothing structurally different about it — same file format, same directory layout, same invocation mechanism. The "meta-" prefix makes it sound like a separate concept or a higher-level abstraction. It's not. It's the table of contents for the plugin, written as a skill so Claude can read it like any other instruction. | Plugin ecosystem — naming a regular thing with a prefix that implies it's a different kind of thing |
| "Task" — two unrelated meanings | One meaning | "Task" is used two completely different ways in Claude Code. (1) Todo item — an entry in the built-in todo tracker (the "Task Tool"), something to be done, marked in-progress, or completed. (2) Subagent process — an independent Claude subprocess launched via the Agent/Task tool to do work in parallel. A subagent (meaning 2) can create todo items (meaning 1), which means you can have a task managing tasks. The word "todo" is sometimes used to mean only the first sense, but "task" is used for both with no consistent distinction in the docs. | Anthropic — same word chosen for two separate features with no disambiguation |
| "Skill" vs "Plugin" vs "Plugin skill" — three overlapping terms | A clear hierarchy where each term means something distinct | A skill is a single SKILL.md instruction document you write and drop in .claude\skills\. A plugin is an installed package (via /plugin install) that can contain skills, commands, hooks, and agents — a whole workflow system. A plugin skill is a skill that came bundled inside a plugin rather than one you wrote yourself. The confusion: skills and plugin skills are identical on disk (same SKILL.md format, same folder structure) — the only difference is where they live. And the word "plugin" in everyday speech means "a small add-on," which is also what a skill is, making the two concepts sound interchangeable when they aren't. A skill is one document. A plugin is a deployable package that may contain many skills. | Anthropic — "skill" used for both a standalone concept and a component inside a plugin, with no visual or structural distinction between them |
Count: 27 documented confusions. Attribution: ~13 to Anthropic (purple rows), ~7 to the broader AI/software industry, ~4 to evolving product definitions. This list will grow — the AI tooling space moves faster than its documentation.
Your First 15-Minute Session with Claude Code
Theory later. Start here. Do these steps in order in a real project directory and you'll understand more in 15 minutes than you will from reading the whole Manifesto first.
Open a Terminal in Your Project
Navigate to a project you already know. Don't start with a new or empty folder — you'll get more value immediately if Claude has real code to look at.
cd C:\repos\MyProject
If you're in VS Code, open the integrated terminal with Ctrl+` and you're already in the right directory.
Start Claude Code
claude
You'll see a prompt. You're now in a session. Claude can already see your project directory. It hasn't read any files yet — but it's ready to.
If this is your first time, it will ask you to authenticate with your Anthropic account. Follow the prompts.
Ask for a Codebase Overview
Type this:
Give me a high-level overview of this codebase. What does it do, how is it structured, and what are the main entry points?
Claude will read your project structure and key files, then give you a summary. Even if you built this codebase yourself, the description from an outside perspective is often useful. For someone new to the project, this is gold.
Ask Where a Feature Lives
Pick something specific in your project. Ask:
Where is the user authentication handled? Show me the relevant files and the flow.
(Substitute your own feature.) Claude will search the codebase, find the relevant code, and trace the flow for you. This is the moment most people realize the CLI is fundamentally different from the chat box — it's actually looking at your code.
Make One Safe Change
Ask Claude to make a small, low-risk change. A good first one:
Find any TODO comments in the codebase and list them with file paths and line numbers.
Or ask it to add a comment, rename a variable in one file, or fix a typo. Watch it use the Edit tool to actually modify the file. Then open the file in your editor and see the change.
This moment — watching Claude edit your file directly — is what changes people's understanding of what a CLI AI tool actually is.
Run Your Tests
If your project has tests, ask Claude to run them:
Run the tests and tell me if anything is failing.
Claude will use the Bash tool to run your test command, read the output, and interpret the results. If tests fail, it can diagnose the failures and propose fixes. This is the loop described in the Practitioner's Guide — compressed into one step for your first session.
If you don't have tests yet, ask:
What would a good test suite look like for this project? What's the most important thing to test first?
Ask for a Plan Before Making a Real Change
Pick something non-trivial you've been meaning to do in the project. Before Claude touches any files, ask for a plan:
I want to [add feature X / refactor Y / fix bug Z]. Before you start, describe your approach: which files you'll change, in what order, and what risks to watch for.
Or press Shift+Tab to enter Plan Mode, then describe the task. Claude will produce a plan for you to review before anything gets touched.
Reviewing the plan takes 30 seconds and catches misunderstandings before they become edits you have to undo.
Learn These Four Commands
Before you end your session, try each of these:
| Command | What it does | When to use it |
|---|---|---|
/help | Lists all available slash commands and built-in commands | Whenever you're not sure what's available |
/compact | Compresses conversation history to free context window space | Long sessions, before starting a new major task in the same session |
/cost | Shows token usage and estimated cost for this session | Whenever you're curious about spend, or before a long expensive task |
Ctrl+C then claude --continue | Exit and resume the same session later | When you need to stop and come back; keeps all context intact |
--resume (instead of --continue) shows a list of all previous sessions to choose from. Useful when you have multiple projects or sessions running.
After Your First Session
You've done the most important thing: you've seen it work on real code. Now the Manifesto chapters will make sense because you have concrete experience to attach them to.
- Read Chapter II (Sessions) — now you understand what a session actually is
- Read Chapter III (CLAUDE.md) — now you know why you'd want persistent instructions
- Run /init — let Claude generate a CLAUDE.md for your project
- Try the First Session Quiz — test what you just learned while it's fresh
An AI Practitioner's Guide
This is practical advice for building real software with AI coding tools. Not theory. Not hype. This is what works when you're 30,000 lines deep and still adding features.
The AI Code Loop
There is a loop. Once you see it, you can't unsee it, and everything gets easier. Here it is:
- Instruct. Tell your AI coding tool what to build. Be specific about constraints: testing methodology, code organization, reuse patterns.
- Write Tests. AI writes test cases first. They will fail — they are RED — because there is no code yet to make them pass.
- Code. The AI writes code. You watch, guide, and course-correct.
- Test. Tests run — either in a test framework, or self-contained in the app itself. Every change set triggers validation. Tests should now be GREEN.
- Review. A different AI (not the one that wrote the code) reviews it. Copilot reviews Claude's code. Codex reviews Claude's code. A fresh Claude session reviews the existing session's code. Multiple perspectives catch what a single perspective misses.
- Fix. Feed the review findings back into the coding AI. It fixes. Tests run again. Repeat until green.
- Document. AI updates the changelog, README, product analysis, and CLAUDE.md with what was learned.
- Commit. Version number, git tag, check in. You never lose work.
Then you start the loop again for the next feature.
Press Shift+Tab before coding and Claude describes its entire approach — files it will touch, functions it will write, decisions it will make — before writing a single line. You approve or redirect before anything changes. This is the loop's first gate, enforced by the tool rather than by willpower.

Press Ctrl+\ to fork your current session — the new session inherits the full conversation history but runs independently. Point the fork at your code and ask it to review. Two Claude perspectives on the same code, with full context, in seconds.

Create a slash command /start-feature in .claude/commands/start-feature.md. The file contains your full loop as a prompt template: "We are starting a new feature. Follow this sequence: (1) write failing tests first, (2) implement the code, (3) run tests, (4) write to PR.md for review, (5) update CLAUDE.md with any hard-won facts. Never skip a step." Every feature session starts with /start-feature and the loop is loaded automatically — no re-typing, no forgetting.

Red-Green Testing
Tell your AI to write the test before writing the fix. This is non-negotiable.
- RED: Write a test that catches the bug (or validates the new feature). Run it. It should fail. If it passes, the test is wrong — it doesn't actually detect the problem.
- GREEN: Write the production code that makes the test pass.
- Verify: Run all tests. The new one passes, and nothing else broke.
Why this matters: AI will happily write a test that passes on buggy code. That test is worse than no test — it gives you false confidence. The red phase proves the test actually detects the problem.
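To make the phases concrete, here's a minimal red-green sketch in Python. The bug, the `parse_port` function, and the test are hypothetical — substitute your own:

```python
# Hypothetical bug: parse_port() crashed on values like "8080/tcp".
# RED phase: the test below was first run against the old one-liner
# `int(value)` and failed with ValueError — proving it detects the bug.

def parse_port(value: str) -> int:
    """GREEN phase: accept plain '8080' as well as '8080/tcp'."""
    number = value.split("/", 1)[0].strip()
    return int(number)

def test_parse_port_handles_protocol_suffix():
    assert parse_port("8080/tcp") == 8080  # the case that used to crash
    assert parse_port("443") == 443        # verify nothing else broke

test_parse_port_handles_protocol_suffix()
print("GREEN: all tests pass")
```

The order is the point: if the test had been written after the fix, there would be no proof it ever detected anything.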
Write Broad Tests, Not Narrow Ones
When you find a bug, don't write a test that checks one specific line. Write a test that scans the entire codebase for the pattern that caused the bug. If a missing null check crashed one function, scan every function for the same missing check. AI makes the same mistake in many places at once — your test should catch all of them.
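One way to sketch such a pattern-level test — the anti-pattern here (a bare `except:` in Python) is only an example; substitute whatever bit you:

```python
import re
import tempfile
from pathlib import Path

def find_bare_excepts(root: str) -> list[str]:
    """Scan every .py file under root for bare 'except:' clauses.

    Returns 'file:line' for each hit — a report you can feed straight
    back to the AI as a fix list.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if re.match(r"\s*except\s*:", line):
                hits.append(f"{path}:{lineno}")
    return hits

# Demo on a throwaway directory: one offender, one clean file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.py").write_text("try:\n    pass\nexcept:\n    pass\n")
    Path(tmp, "b.py").write_text("try:\n    pass\nexcept ValueError:\n    pass\n")
    print(find_bare_excepts(tmp))  # one hit, pointing at a.py line 3
```

The scan finds every instance of the pattern, not just the one that crashed in front of you.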
Code Architecture That Survives AI
AI writes code that works right now. Your job is to ensure it still works after 50 more changes. That requires structure.
Separation of Concerns
Put related code in related directories. Menus go in ui/. Database access goes in data/. Session management goes in session/. This isn't perfectionism — it's how you find things six months from now, and how AI finds things when its context window fills up.
Drop a mini-CLAUDE.md in ui/ that says "this directory contains only rendering and display code — no business logic, no data access." Claude picks it up when working in that directory. The architecture becomes self-documenting and self-enforcing — the rules travel with the code, not just the project root.

Back it up with a hook: if a function named Show-* or Render-* was written outside ui/, or a function named Get-All*Sessions was written outside session/, the hook warns immediately. Enforcing directory conventions via hook means AI can't quietly put code in the wrong place without you finding out.

Code Reuse (The Central Battle)
AI will write a new variation of a function every time you ask for something similar. You'll end up with three functions that format dates, four that parse JSON, and five that build file paths. This is the #1 maintenance problem with AI-generated code.
Fight it constantly:
- When you see the same 3 lines in two places, extract a shared function
- Put shared helpers in a core/ or utils/ namespace
- Tell your AI explicitly: "Check if a helper already exists before writing new code"
- Put this rule in CLAUDE.md so it persists across sessions
Create a skill called reuse-first that instructs Claude to search for existing helpers before writing any utility function — listing what it found before proceeding. When your meta-skill says to invoke it before implementation work, the check becomes part of the workflow rather than advice Claude can quietly skip.

Add a PostToolUse hook that runs after every file edit, triggering a lightweight duplicate-detection script on the changed file. For example, a hook that runs grep -rn looking for function signatures similar to what was just written, and warns you if a near-duplicate already exists elsewhere in the codebase. This turns "fight it constantly" into "get alerted automatically." Put the rule in CLAUDE.md ("never write a new helper without checking for existing ones"), and back it up with a hook that actually enforces it. Belt and suspenders.

Registry-Driven Design (Avoid If-Else Chains)
When your code needs to handle multiple similar things differently (platforms, providers, formats), don't write:
if (platform == "claude") { doX() }
else if (platform == "gemini") { doY() }
else if (platform == "codex") { doZ() }
Instead, create a registry — a data structure that maps each variant to its behavior:
registry = {
claude: { handler: doX, color: "blue" },
gemini: { handler: doY, color: "yellow" },
codex: { handler: doZ, color: "magenta" }
}
// Then: registry[platform].handler()
Adding a new platform means adding one registry entry — not hunting through every if-else chain in the codebase. This pattern is worth learning. AI will default to if-else chains every time unless you tell it not to.
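The same pseudocode as a runnable Python sketch — the handler functions and the `color` field are illustrative:

```python
# Each variant's behavior lives in data, not in branching logic.
def handle_claude(task: str) -> str:
    return f"claude handled {task}"

def handle_gemini(task: str) -> str:
    return f"gemini handled {task}"

def handle_codex(task: str) -> str:
    return f"codex handled {task}"

REGISTRY = {
    "claude": {"handler": handle_claude, "color": "blue"},
    "gemini": {"handler": handle_gemini, "color": "yellow"},
    "codex":  {"handler": handle_codex,  "color": "magenta"},
}

def dispatch(platform: str, task: str) -> str:
    entry = REGISTRY.get(platform)
    if entry is None:
        # One place to handle unknown variants, instead of a dangling else.
        raise ValueError(f"unknown platform: {platform!r}")
    return entry["handler"](task)

print(dispatch("gemini", "review"))  # gemini handled review
```

Supporting a new platform is now one new dictionary entry plus one handler function — no existing line of code changes.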
Create a slash command /check-patterns that instructs Claude to scan the codebase for if-else chains keyed on known variant strings (e.g., platform names, file type strings, provider names). The command prompt can be as simple as: "Search the entire codebase for if/elseif blocks that branch on string comparisons for known variants. List every occurrence and suggest the registry pattern as a replacement." Run it periodically — especially after a heavy coding session — to catch drift before it compounds.

On SOLID and Design Principles
SOLID principles (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion) are useful guidelines, not commandments. Apply them where they reduce complexity:
- Single Responsibility — always useful. One file, one job. One function, one purpose.
- Open/Closed — the registry pattern above is this in practice. Extend by adding data, not modifying existing code.
- Liskov / Interface Segregation / DI — valuable in typed languages with class hierarchies (C#, Java, TypeScript). Overkill in scripting languages (PowerShell, Python, Bash). Don't force interfaces and dependency injection into a 500-line script.
The practical test: "Would this pattern prevent a real bug or make a real change easier?" If yes, use it. If it just adds abstraction for abstraction's sake, skip it.
Cross-AI Code Review
Never let the AI that wrote the code be the only one that reviews it. This is like letting the student grade their own exam.
The Agent tool's isolation: "worktree" option automates this step. Dispatch a subagent to review the changed files — it gets an isolated copy of the repo, reads the code fresh with no memory of writing it, and reports findings back. No manual copy-paste to a second tool. No re-explaining context. A genuinely independent perspective, triggered with one tool call.

The Process
- Claude Code writes the code
- Open a different AI (Copilot, Codex, a fresh Claude session, Gemini) and point it at the code
- Ask it to review for: bugs, security issues, missed edge cases, code duplication, naming problems
- The reviewer writes findings to a file (PR.md)
- Feed PR.md back to the original Claude Code session
- Claude fixes the issues using red-green methodology
Two reviewers are better than one. Different AI models catch different things — GPT models notice different patterns than Claude models.
Create a slash command called /peer-review. The command instructs Claude to: (1) summarize all changes made in this session, (2) write a structured review request to PR.md including changed files, what changed, and what to look for, (3) remind you to open PR.md in a separate AI session for review. When you return with the reviewer's findings, run /fix-pr, which reads PR.md and instructs Claude to address each finding using red-green methodology. The whole handoff is encoded in two slash commands.

Testing at Scale
As your codebase grows, your test count should grow with it. Hundreds of tests are normal. Thousands are not unusual for a mature project.
Where Tests Live
Wherever makes sense for your project:
- In a test framework (Jest, pytest, Pester) — if one exists for your language
- In the app itself — if no framework exists, build validation into your code. A "run tests" menu option that executes all checks and reports PASS/FAIL is perfectly valid
- As static analysis — tests that scan your source code for known anti-patterns (hardcoded values, missing error handling, duplicate logic)
The output format matters: PASS, FAIL, WARN with explanations. This output can be copied and fed directly back to AI as instructions for what to fix.
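A minimal sketch of a check runner with that output format — the individual checks here are placeholders standing in for real project validations:

```python
def run_checks(checks) -> list[str]:
    """Run (name, fn) pairs; each fn returns True, False, or a 'WARN: ...' string."""
    report = []
    for name, fn in checks:
        try:
            result = fn()
        except Exception as exc:
            report.append(f"FAIL {name} — crashed: {exc}")
            continue
        if result is True:
            report.append(f"PASS {name}")
        elif isinstance(result, str) and result.startswith("WARN"):
            report.append(f"WARN {name} — {result[5:].strip()}")
        else:
            report.append(f"FAIL {name}")
    return report

# Placeholder checks — replace with your real validations.
checks = [
    ("config file parses", lambda: True),
    ("no TODOs in release code", lambda: "WARN: 3 TODO comments found"),
    ("all handlers registered", lambda: False),
]
for line in run_checks(checks):
    print(line)
```

Because each line is `PASS`/`FAIL`/`WARN` plus an explanation, the whole report can be pasted into a prompt as a concrete fix list.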
Pattern-Level Testing
When you find a bug, ask: "Is this a one-off mistake, or a pattern?" If AI wrote a function with a missing null check, it probably wrote ten functions with missing null checks. Write a test that scans all functions for the pattern, not just the one where you found it.
Lint and Static Analysis
Use external lint tools for your language (ESLint, PSScriptAnalyzer, pylint, etc.). Run them on every change. They catch categories of bugs that neither you nor AI will notice during review.
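A linter becomes far more effective when it runs automatically on every edit. As a sketch, a PostToolUse lint hook in .claude/settings.json might look like the following — the matcher and command values are examples, and $FILE stands in for however your hook receives the edited file path; check the current hooks documentation for the exact schema on your version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx eslint $FILE --max-warnings 0" }
        ]
      }
    ]
  }
}
```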
Add a PostToolUse hook that runs your linter after every edit: npx eslint $FILE --max-warnings 0 for JavaScript, or Invoke-ScriptAnalyzer $FILE for PowerShell. The output appears in the terminal immediately after each edit. Claude sees the lint output and can fix violations in the next turn — without you having to manually invoke the linter or paste results back. The loop tightens from minutes to seconds.

Documentation as a Development Tool
AI-generated documentation is not busywork. It's a development tool.
The Files That Matter
- CLAUDE.md (or your AI's equivalent) — Hard-won facts. Rules. Anti-patterns to avoid. Architecture decisions. This file is read by AI at the start of every session. It is your persistent memory across conversations.
- CHANGELOG.md — What changed per version. Lets you (and AI) understand the evolution of the codebase.
- README.md — What the project is, how to build it, how to run it. Useful for onboarding new AI sessions, too.
- Code comments — Not obvious-what comments ("increment counter"). Hard-won-fact comments ("Do NOT use KeyAvailable in a processing loop — it blocks on spurious events"). These survive across sessions because they're in the code.
CLAUDE.md as Institutional Memory
Every painful debugging session should end with an update to CLAUDE.md. If you spent 2 hours figuring out that a particular API silently returns null on Sundays, write it down. AI will encounter the same trap in a future session. CLAUDE.md prevents your AI from repeating your mistakes.
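A hypothetical fragment showing what those entries look like in practice — the specific facts are invented for illustration:

```markdown
## Hard-Won Facts — do not re-learn these

- The reporting API silently returns null outside business hours.
  Always check for null; never assume a payload.
- NEVER fix a failing test by changing its expectations.
  Fix the production code instead.
- Check core/ for an existing helper before writing a new one.
```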
A template generator can produce a starter CLAUDE.md or AGENTS.md for your project: fill in your product name, variant dimension, key entity, and runtime, and download a ready-to-use file. The Day-1 Prompt →

Let AI Document Your Work
After completing a feature, tell AI to update the changelog and product analysis. Reading AI's description of what you just built is surprisingly useful — it gives you an outsider's perspective on your own creation. Inconsistencies and gaps become visible when someone else describes your work back to you.
Create a slash command /update-docs that instructs Claude to update CHANGELOG.md, README.md, and CLAUDE.md with a summary of what changed in this session.

Source Control Is Your Safety Net
Commit early, commit often, commit with version numbers.
- Small commits — easier to test, easier to revert, easier to understand in the git log
- Version tags — every significant milestone gets a version (v1.0, v2.0). You can always go back.
- Experiment freely — with source control, you can try bold refactors. If they fail, revert. Without source control, you're walking a tightrope with no net.
- Branch for experiments — try something risky on a branch. If it works, merge. If not, delete the branch. No harm done.
Use a worktree-isolated subagent (isolation: "worktree" on the Agent tool) when asking Claude to attempt a risky refactor or experiment. The agent gets its own isolated copy of the repo, does all its work there, and reports back. If it succeeded, you get the branch name to review and merge. If it failed, the worktree is discarded — your working tree was never touched. This is the programmatic equivalent of "try the experiment on a branch." You get full reversal with zero manual branching.

Managing AI Behavior
AI is a powerful but opinionated collaborator. Here are the behaviors you'll need to manage:
AI Adds Unnecessary Complexity
AI over-engineers by default. It adds error handling for impossible cases, creates abstractions for one-time operations, and builds configurable systems when you asked for a simple function. Tell it explicitly: "Keep it simple. Don't add features I didn't ask for. Don't add error handling for scenarios that can't happen."
AI Duplicates Instead of Reusing
Already discussed, but worth repeating: AI writes new code when existing helpers would do. Put "check for existing helpers before writing new code" in every instruction file.
AI Fixes Tests Instead of Code
When a test fails, AI's first instinct is to change the test expectations to match the (buggy) output. "Never fix a test to make it pass. Fix the production code." Put this in CLAUDE.md in bold.
AI Forgets Across Sessions
Every new session starts fresh. AI doesn't remember your architecture decisions, naming conventions, or past mistakes unless you write them down. CLAUDE.md, memory files, and code comments are how knowledge persists. Without them, you'll re-explain the same things session after session.
Use memory files (~/.claude/projects/*/memory/) for knowledge that doesn't belong in CLAUDE.md but needs to persist. CLAUDE.md is for rules and constraints. Memory files are for richer context: "The authentication module was rewritten in v8.2 because the old session token format didn't comply with the new security policy. The old format is still present in legacy data — do not assume all tokens are the new format." Create a slash command called /remember that instructs Claude to write a new memory file capturing what was just learned. Over time, these files become a project knowledge base that survives indefinitely across sessions.

AI Makes Consistent Mistakes
When AI makes a mistake in one place, it has probably made the same mistake everywhere. Don't just fix the instance — search the entire codebase for the pattern. Write a test that catches the pattern globally. This is the single most effective quality practice for AI-generated code.
A Note on the Future
By 2027, you probably won't need most of these practices. AI tools will get better at maintaining context, avoiding duplication, and testing their own output. The tools will internalize the discipline that we currently impose manually.
But it's still 2026. And right now, the difference between a developer who follows these practices and one who doesn't is the difference between a project that grows gracefully and one that collapses under its own weight at version 5.
The loop works. Use it.
Remote Control from Your Phone
Shipped February 25, 2026. Start a task on your laptop, put it in your bag, and keep full control from your phone. The session runs locally on your machine — your filesystem, MCP servers, tools, and project config all stay available. You just control it remotely.
Requirements
- Claude Code v2.1.52+ — update with npm update -g @anthropic-ai/claude-code
- Any paid Claude plan — Remote Control is available on all plans. Team and Enterprise admins may need to enable Claude Code in admin settings first.
- Authenticated CLI — run /login if you haven't already
- The Claude mobile app (iOS/Android) or any browser pointed at claude.ai/code
Method 1: From an Existing Session (/rc)
You're already in a Claude Code session and want to hand it off to your phone:
- Optional but recommended: type /rename MyTask so you can find the session by name on your phone
- Type /rc (short for /remote-control)
- A session URL appears in your terminal
- Press spacebar to toggle a QR code display
- Scan the QR code with your phone — it opens directly in the Claude app or browser
- You're now controlling the session from your phone. The conversation stays in sync.
Method 2: Start a New Remote Session
Start a session that's remote-ready from the beginning:
- cd to your project directory
- Run claude remote-control
- Scan or open the URL on your phone
- Start prompting from your phone immediately
Three Ways to Connect from Your Phone
| Method | How | Best For |
|---|---|---|
| QR Code | Scan the QR code shown in terminal (press spacebar to show it) | Quickest — phone camera to session in 2 seconds |
| Session URL | Copy the URL from terminal, open in any browser | When you're connecting from a tablet or another computer |
| Session List | Open claude.ai/code or the Claude app, find the session by name | When you named the session with /rename and want to find it later |
Remote Control sessions show a computer icon with a green dot in the session list when online.
Enable for Every Session (Auto Mode)
If you always want Remote Control available:
- Type /config inside any Claude Code session
- Set "Enable Remote Control for all sessions" to true
- Every future session will automatically be available for remote connection
What Happens Under the Hood
- Claude Code registers with the Anthropic API and polls for work
- When you connect from your phone, the server routes messages between your device and your local session over a streaming connection
- All traffic travels through the Anthropic API over TLS (encrypted)
- Your code never leaves your machine — only conversation messages are transmitted
- One remote connection per session (you can't have two phones controlling the same session)
Constraints and Gotchas
- Terminal must stay open. If you close the terminal or stop the process, the session ends. Minimize it, don't close it.
- Laptop sleep is okay. The session reconnects automatically when your machine wakes up.
- 10-minute network timeout. If your machine loses network for more than ~10 minutes while awake, the session times out and the process exits.
- All paid plans. Remote Control is available on all Claude plans. Team and Enterprise admins may need to enable Claude Code in admin settings.
- Phone typing is painful. Remote Control is best for monitoring progress, approving tool uses, and giving short follow-up prompts — not for writing detailed instructions. Start the big prompt on your laptop, then monitor from your phone.
Practical Workflow
- At your desk: start a Claude Code session, give it a complex task
- Type /rc, scan the QR code with your phone
- Close your laptop lid (sleep is fine)
- From your phone: watch Claude work, approve tool uses, answer questions
- Back at your desk: open your laptop, the terminal is still running, continue in the terminal
Other AI Coding CLIs
Claude Code isn't the only AI coding CLI. Here's how the major players compare on the features discussed in this guide. All of these are terminal-based coding assistants — not IDE plugins, not chat UIs.
| Feature | Claude Code | Codex CLI | Copilot CLI | OpenCode | Gemini CLI |
|---|---|---|---|---|---|
| Vendor | Anthropic | OpenAI | GitHub / Microsoft | Community (OSS) | Google |
| Default Model(s) | Claude Opus, Sonnet, Haiku | GPT-5.x Codex variants | Claude Haiku, GPT-4.1, GPT-5 Mini | Any (Claude, GPT, etc.) | Gemini 2.5 Pro/Flash, 3 Pro |
| Session Persistence | Yes — .jsonl files | Yes — SQLite | Yes — events.jsonl | Yes — SQLite | Yes — JSON files |
| Resume Sessions | Yes `--resume` | Yes `resume` | Yes `--resume` | Yes `--session` | Yes `--resume` |
| Fork Sessions | Yes `--fork-session` | Yes `fork` | No | Yes `--fork` | No |
| Project Instructions (CLAUDE.md equivalent) | CLAUDE.md | AGENTS.md | copilot-instructions.md | AGENTS.md | GEMINI.md |
| Skills / Instruction Files | Yes — SKILL.md in .claude/ | No | No | No | No |
| Slash Commands | Yes — .claude/commands/ | No | No | No | No |
| Hooks (Event Triggers) | Yes — pre/post tool use, session start | No | No | No | No |
| MCP Servers (Plugins) | Yes — full MCP support | No | Limited — extensions via GitHub | No | Yes — MCP support |
| Agents / Subagents | Yes — spawns child Claude instances | No | No | No | No |
| Memory (Cross-Session) | Yes — ~/.claude/projects/*/memory/ | No | No | No | No |
| Plan Mode | Yes — structured planning before coding | No | No | No | No |
| File Editing | Yes — Read, Edit, Write tools | Yes — apply patch model | Yes — file operations | Yes — file operations | Yes — file operations |
| Shell Command Execution | Yes — Bash tool | Yes — sandbox mode | Yes — shell access | Yes — shell access | Yes — shell access |
| Permission Model | Per-action approval or bypass mode | Sandbox + approval tiers. Creates restricted Windows user accounts for OS-level isolation on Windows. | `--yolo` (allow-all) mode | Config-based | `--approval-mode yolo` |
| Cost Model | Per-token (input/output/cache) | Per-token blended rate | Premium requests (subscription) | Per-message (provider-dependent) | Free tier / API key billing |
| Install Method | npm i -g @anthropic-ai/claude-code |
npm i -g @openai/codex |
npm i -g @github/copilot |
npm i -g opencode-ai |
npm i -g @google/gemini-cli |
| Individual Cost (monthly) | Free tier available; Pro: $20/mo; Max 5×: $100/mo; Max 20×: $200/mo | Included in ChatGPT plans; ChatGPT Plus: $20/mo; ChatGPT Pro: $200/mo (no standalone Codex plan) | Free tier available; Copilot Pro: $10/mo; Copilot Pro+: $39/mo (includes IDE + CLI) | Free (open source); pay API costs directly (bring your own API key) | Free tier available; Google AI Pro: $20/mo; Google AI Ultra: $250/mo (covers Gemini CLI + other Google AI) |
| Value for coding | High — richest feature set justifies cost | Medium — good if you already pay for ChatGPT | Best value — $10/mo for a full coding CLI | Cheapest — pay only for tokens used; no subscription | Medium — $20/mo bundles many Google AI tools |
Claude Code (Anthropic) is the subject of this entire guide. It has the deepest architecture of any CLI coding tool — skills, hooks, slash commands, agents, memory, plan mode, and Remote Control. If you invest time in learning one CLI deeply, this is the one.
Codex CLI (OpenAI) is OpenAI's answer to Claude Code. It uses the GPT-5 Codex model variants, stores sessions in SQLite, and supports forking. It runs in a sandbox mode with tiered approval for commands. Its main advantage is the GPT-5 Codex model family, which has strengths in certain code generation tasks. Weaker extension architecture than Claude Code.
Codex uses AGENTS.md the same way Claude uses CLAUDE.md. Same concept: a markdown file loaded at session start that gives the AI persistent instructions. If you have a well-crafted CLAUDE.md, copy it to AGENTS.md and Codex will pick it up. Your architecture rules, coding standards, and anti-patterns carry over with zero extra work.

Windows users: Codex creates sandboxed Windows user accounts on your machine as part of its security model. You may notice new user accounts (named something like "Codex Sandbox" or similar) appearing in your Windows user list after installing Codex. This is intentional — those restricted accounts run Codex's operations with limited OS permissions so that commands cannot reach beyond what a low-privilege user can do. Do not delete them; removing them breaks Codex's sandboxing. This is specific to Codex's Windows implementation and is not something Claude Code does.
Copilot CLI (GitHub / Microsoft) is backed by Microsoft's investment in GitHub and OpenAI. It offers multiple model choices (Claude Haiku, GPT-4.1, GPT-5 Mini), which is unusual — most CLIs are locked to their vendor's model. It has session persistence and file editing but no forking, no skills, no hooks. Strong choice if your organization is already deep in the GitHub ecosystem.
OpenCode is open-source and model-agnostic — you can point it at Claude, GPT, or any other supported provider. This makes it attractive for organizations that want control over which AI model processes their code, or that need to run everything on-premises. The trade-off is a thinner feature set: no skills, no hooks, no memory. A good choice for teams with strict data residency requirements.
Gemini CLI (Google) brings Google's Gemini models to the command line. Gemini 2.5 Pro has an exceptionally large context window, which helps with large codebases. It supports MCP servers and has session persistence. Weaker extension architecture than Claude Code, but the model itself handles large-scale codebase analysis well. Natural choice if your organization is on Google Cloud.
All five CLIs above send your code to a cloud API. Ollama, LM Studio, Jan, and similar tools take a different approach: they run AI models locally on your machine, with no data leaving your network. This is a fundamentally different architecture.
Ollama is the most popular. It downloads open-source models (Llama, Mistral, CodeLlama, DeepSeek, and others) and runs them locally via a REST API on localhost:11434. Other tools (including OpenCode) can point at an Ollama endpoint instead of a cloud API, giving you local inference with a familiar CLI experience.
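For example, a minimal client for Ollama's /api/generate endpoint — the model name and host are assumptions (use whatever model you've pulled locally):

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default local port

def build_payload(prompt: str, model: str = "codellama") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "codellama") -> str:
    """Send a prompt to a locally running Ollama server.

    Nothing leaves the machine — the request goes to localhost.
    """
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain this function: ...")` requires an Ollama server running locally with the model already pulled (`ollama pull codellama`).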
The trade-offs are real:
- Privacy: Your code never leaves your machine. For proprietary or classified code, this is critical.
- No subscription cost: After hardware, inference is free.
- Hardware requirements: Running a capable coding model requires a modern GPU with significant VRAM (16GB+). Weak hardware means slow, low-quality results.
- Model quality gap: Local open-source models are improving fast but are still behind GPT-5 and Claude Opus for complex reasoning and code generation as of 2026.
- No built-in tools: Ollama is an inference server, not a coding assistant. You need a separate CLI layer (like OpenCode pointed at Ollama) to get file editing, tool use, and session management.
If your organization has strict data policies that prohibit sending code to any external service, the Ollama + OpenCode combination is worth investigating. For most developers without those constraints, cloud-based CLIs currently offer better model quality and richer tooling.
The AI coding tool space is moving fast. This guide covers the five most widely used cloud-based CLIs as of early 2026, but the following tools also exist and may be relevant depending on your stack:
- Aider — An open-source Python-based coding assistant with strong git integration. Predates Claude Code and has a loyal following. Works with multiple models.
- Continue — Primarily a VS Code/JetBrains extension but has CLI capabilities. Model-agnostic.
- Cline — VS Code extension with CLI roots. MCP support, model-agnostic.
- Cursor — An IDE (fork of VS Code) with deeply integrated AI. Not technically a CLI but competes for the same workflow.
- Amazon Q Developer CLI — AWS's coding assistant. Strong if you live in the AWS ecosystem.
- Mistral Le Chat — European-based, GDPR-native AI. CLI tooling is early-stage but growing.
This is not an exhaustive list. New tools appear monthly. The architectural concepts in this guide (sessions, hooks, MCP, memory, skills) are Claude Code-specific, but the underlying ideas — persistent context, tool use, cross-session memory — are becoming table stakes across the industry.
The honest answer depends on your situation, not on which tool is theoretically best:
| Your situation | Use this | Why |
|---|---|---|
| Individual developer, no corporate constraints | Claude Code | Deepest architecture, best long-term investment |
| Your org is deep in GitHub / Azure / Microsoft | Claude Code | Still the best coding tool — ecosystem fit is a secondary concern |
| Your org is on Google Cloud | Claude Code | Platform doesn't change which tool codes best |
| Code cannot leave your network (compliance, classified) | OpenCode + Ollama | Fully local, no cloud API calls |
| IT won't approve any cloud AI tool | Ollama + Aider | Open source, fully self-hosted, no vendor |
| You want the simplest possible setup | Claude Code | One install, everything works out of the box |
| Your company already has a standard | Whatever they chose | Consistency beats optimization. Learn the architecture here, apply it there. |
This trips people up constantly. Claude Code is the CLI application. The Claude models (Opus, Sonnet, Haiku) are the AI brains that run inside it. They are separate. You can switch models mid-session with /model without restarting anything.
Think of it this way: Claude Code is like Visual Studio. The Claude model is like the C# compiler version. You can change the compiler version without reinstalling Visual Studio.
| Model | Coding ability | Speed | Cost | Use for |
|---|---|---|---|---|
| Claude Opus 4.6 | Excellent — deep reasoning | Slow | Highest | Complex architecture, hard bugs, design decisions |
| Claude Sonnet 4.6 | Very good — well balanced | Fast | Medium | Daily coding work — the recommended default |
| Claude Haiku 4.5 | Adequate for simple tasks | Fastest | Lowest | Quick edits, boilerplate, simple lookups — NOT complex coding |
The same separation applies to other CLIs: Copilot CLI lets you choose between Claude Haiku, GPT-4.1, and GPT-5 Mini. OpenCode can point at any model via Ollama or an API. The CLI is the tool; the model is the engine. You can swap engines.
Editorial teaching model — not benchmark scores. IQ numbers are estimated shorthand for relative reasoning quality (Opus 4.6 = 100 baseline), not official measurements. Awareness numbers are calculated from published token limits and are factual. ■ IQ = editorial estimate ■ Awareness = official fact
People talk about AI quality as if it's one thing. It isn't. There are two independent dimensions:
- IQ — The model's reasoning capability. How well it thinks, writes code, solves problems. Opus > Sonnet > Haiku. GPT-5 vs GPT-4.1. This is what most people compare.
- Awareness — How much information the AI can hold in mind at once. If IQ is how smart it is, Awareness is how much of the room it can see. A genius who can only read one page at a time is limited. A less brilliant person who can read the whole book at once may outperform them.
Think of it like image fidelity. A single photograph can be ultra-high-resolution — every pixel tiny. That's high IQ. A movie covers far more ground with thousands of frames, but each individual frame may be lower resolution than a still. That's high Awareness. A brilliant photograph of one page of your codebase is less useful than a full movie of the entire project, even if each movie frame is slightly softer.
We measure Awareness in pages — one standard 8.5×11 page of dense text ≈ 500 words ≈ 750 tokens. This gives you an intuitive sense of how much the AI can "see" at once. More Awareness is most valuable on large codebases, long sessions, and cross-file reasoning. For small, narrow tasks — renaming a variable, explaining a single function — extra Awareness may matter less than reasoning quality (IQ).
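The page arithmetic is simple enough to sanity-check yourself. A rough conversion, assuming the ~750-tokens-per-page rule of thumb used throughout this guide:

```python
# The guide's rule of thumb: 1 page of dense text is about 500 words,
# or roughly 750 tokens.
TOKENS_PER_PAGE = 750

def awareness_pages(context_tokens: int) -> int:
    """Convert a model's context window (in tokens) to 'pages' of Awareness."""
    return round(context_tokens / TOKENS_PER_PAGE)

print(awareness_pages(200_000))    # 267 pages (Claude Free/Pro)
print(awareness_pages(1_000_000))  # 1333 pages (Max/Team/Enterprise)
print(awareness_pages(272_000))    # 363 pages (GPT-5 via Codex CLI)
```

These are the same figures shown in the Awareness column of the table below.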
| Platform / Plan | Model | Context Window (tokens) | Awareness (pages) | Notes |
|---|---|---|---|---|
| Claude Code — Free / Pro | Opus, Sonnet, Haiku | 200K tokens | ~267 pages | Standard. Pro users can opt into 1M by typing /extra-usage |
| Claude Code — Max, Team, Enterprise | Opus 4.6, Sonnet 4.6 | 1M tokens | ~1,333 pages | Automatic 1M context on Max/Team/Enterprise as of March 2026 |
| Claude Code — Enterprise (extended) | Sonnet 4.6 | Up to 1M+ | ~1,333+ pages | Enterprise negotiates custom context. Verify with Anthropic sales. |
| Codex CLI / ChatGPT Plus | GPT-5 | ~272K tokens | ~363 pages | Solid but well below Claude Max |
| Copilot CLI | Claude Haiku, GPT-4.1, GPT-5 Mini | Varies by model selected | ~170–267 pages | Copilot may cap context below the model's native maximum |
| Gemini CLI — Free / Pro | Gemini 2.5 Flash, 2.5 Pro | 1M tokens | ~1,333 pages | Gemini has exceptional context by default — largest in the comparison |
| Gemini CLI — Ultra (coming) | Gemini 2.5 Pro (2M) | 2M tokens | ~2,667 pages | 2M context version in development — would be the most aware tool available |
| OpenCode + Ollama (local) | Llama, Mistral, DeepSeek, etc. | 4K–128K tokens (varies by model) | 5–170 pages | Local models have much smaller context windows. Major limitation vs cloud models. |
GitHub Copilot CLI offers Claude Haiku 4.5 as one of its model choices. So if both Claude Code and Copilot can run Claude Haiku — are they the same tool?
No. Same brain, completely different truck.
When Copilot routes to Claude Haiku, you get Haiku's reasoning ability. That part is identical. But Copilot does not give Claude:
- CLAUDE.md — persistent project instructions loaded every session
- Skills — specialized instruction documents loaded on demand
- Hooks — automated commands triggered by events
- Memory — notes that persist across sessions
- Agents/Worktrees — delegating subtasks to isolated environments
- Plan Mode — design-before-coding workflow
- The full context window Claude Code provides to the model
The model is the IQ. The platform wrapped around it determines the Awareness and the workflow. A brilliant person handed a blank notepad is less effective than a capable person with a full desk, reference library, filing cabinet, and an organized team of assistants.
Why Copilot offers Claude Haiku at all: Anthropic licenses the Claude models via API. Microsoft/GitHub pays to route calls through Anthropic's API and bundle it into Copilot. Same model, different packaging, different surrounding toolset, different pricing, different context limits. The model is a commodity; the tooling around it is where the differentiation happens.
Claude Code has the richest extension architecture by far — skills, hooks, slash commands, MCP servers, agents, memory, and plan mode are all unique to it. The other CLIs are primarily prompt-and-respond tools with session persistence. If you're investing time learning AI coding tool architecture, Claude Code is where that investment pays off the most.
That said, all five CLIs can edit files, run commands, and resume sessions. For basic coding tasks, the experience is similar across all of them. The differences emerge when you want automation (hooks), extensibility (MCP), reusable workflows (skills/commands), and cross-session knowledge (memory).
Glossary
Every tool, acronym, and term used in this guide — defined in one place. Anthropic’s confusing naming is called out inline where it applies.
| Term | Definition |
|---|---|
| Aider | An open-source, Python-based AI coding assistant that runs in the terminal. Works with multiple AI models. Predates Claude Code and has a loyal following among developers who prefer open-source tools.→ Other AI CLIs |
| acli / acli.exe | The Atlassian CLI — a command-line tool for interacting with JIRA and other Atlassian products. More token-efficient than calling the JIRA REST API directly. Claude Code plugins use it to look up tickets, transition statuses, assign issues, and update custom fields. The .exe version is the Windows executable. Configured once with your server URL, email, and API token via acli.exe jira --server ... --save-config. → Building a Plugin |
| Agent / Subagent | A separate Claude instance spawned to handle a subtask. Runs in its own context, reports results back. Like a child process. |
| Allowlist | A list of pre-approved tools/commands in settings.json. Claude won't ask permission for allowlisted actions.→ Chapter 15.5 |
| API | Application Programming Interface. A defined way for programs to talk to each other. Think of it as a drive-through window: you send a request in a standard format, the system processes it, and sends back a response. Claude's API lets software applications send prompts to Claude and receive responses — programmatically, without a human typing.→ Pricing page |
| API Usage | Tokens consumed — described from the billing side. Anthropic says “API usage billed separately” to mean you pay per token on top of any seat fee. Not a separate concept from tokens — it IS tokens, counted and charged.→ Pricing page |
| apt | Package manager for Debian/Ubuntu Linux. apt install nodejs installs Node.js. |
| Architecture Decision Record (ADR) | A short document that records a significant architectural decision: what was decided, why, what alternatives were considered, and what the consequences are. Written before the work starts, not after. The software equivalent of writing down why you made a structural choice before pouring the concrete. Plan Mode in Claude Code is the same discipline — describe your approach before touching files.→ Chapter IX (Plans) |
| Audit log | A record of who did what and when. In enterprise Claude, every prompt and response is logged. Legal and compliance teams use audit logs to investigate incidents or demonstrate regulatory compliance.→ Pricing page |
| Auto memory | A Claude Code feature where Claude automatically writes memory notes about things it discovers, without being asked. These notes persist across sessions.→ Chapter X |
| Autoregressive generation | How language models generate output: one token at a time, each token depending on all previous tokens. Cannot be parallelized. This is why output tokens cost 5x more than input tokens — each requires a full forward pass through the entire model. |
| Awareness (AI) | How much information an AI can hold in mind at once. Measured in pages (1 page ≈ 500 words ≈ 750 tokens). A 200K token model has ~267 pages of Awareness. A 1M token model has ~1,333 pages. Distinct from IQ — a smart model with low Awareness misses things it simply cannot see.→ IQ/Awareness chart |
| bash | A command-line shell (the program that interprets your typed commands). Default on Mac/Linux. Available on Windows via WSL or Git Bash. Claude Code's Bash tool runs commands through this. |
| BOM | Byte Order Mark. An invisible character at the start of a file that declares its encoding. Some tools (like Gemini CLI) reject files with BOMs. |
| Boilerplate | Repetitive, standard code that's needed but not interesting. Templates, imports, config setup. Good candidate for Haiku. |
| Branch | A parallel version of your code where you can try changes without affecting the main version. Like making a copy of a document to draft edits, then deciding whether to keep or discard the changes. main is the primary branch considered the “official” version. |
| brew (Homebrew) | Package manager for macOS. brew install node installs Node.js. Mac's equivalent of apt/chocolatey. |
| Cache (Token Cache) | Claude caches repeated content (CLAUDE.md, skills) after first use. Subsequent reads cost 10x less. |
| cat | Outputs file contents to the terminal. cat readme.md prints the file. |
| cd | Change Directory. Navigates to a folder. cd C:\repos\MyProject moves you into that directory. |
| CDN | Content Delivery Network. Servers that cache content close to users for speed. Analogous to how Claude's token cache reduces repeated input costs. |
| Certificate Authority (CA) | An organization that issues digital certificates proving a server is who it claims to be. Corporate networks sometimes use their own internal CA, which can cause CLI tools to reject HTTPS connections until the custom certificate is installed. |
| CI/CD | Continuous Integration / Continuous Deployment. An automated pipeline that builds, tests, and deploys code every time someone pushes a change. “CI/CD pipeline” = the assembly line that takes your code from commit to production automatically.→ Other AI CLIs |
| CLAUDE.md | A markdown file of instructions that Claude reads at the start of every session. Your project's configuration for AI behavior.→ Chapter III |
| CLI | Command Line Interface. A text-based way to interact with a program. You type commands, it responds with text.→ Introduction |
| Clone | Downloading a copy of a git repository to your local machine. git clone https://github.com/user/repo creates a local copy you can work with. |
| Codebase | All the source code files that make up a project or application. “The codebase” = the entire collection of code, not just one file. When Claude Code reads your codebase, it's scanning your project's files.→ Introduction |
| Commit | A saved snapshot of your code at a specific point in time. Every commit has a unique ID and a message describing what changed. Like a save point in a game — you can always go back to any previous commit. |
| /compact | A Claude Code command. Manually compresses conversation history to free context window space. Lets you control when and what gets compressed.→ Chapter 15.4 |
| --continue | CLI flag to resume the most recent session in the current directory. |
| Context Window | The total token capacity Claude can hold at once. Lives in GPU VRAM on Anthropic's servers (the KV cache), not your laptop. Rough upper-end cost: ~0.5 MB per token. 200K tokens (Pro) ≈ 100 GB of server VRAM. 1M tokens (Max/Enterprise) ≈ 500 GB. Bigger context windows are real hardware costs, not just software features. Anthropic also calls this “extended context model” when the 1M window is enabled — it's the same model, just more Awareness turned on.→ Chapter 15 (advanced box) |
| Corporate proxy | A network intermediary in enterprise environments that routes and inspects outbound internet traffic. Can require special configuration for CLI tools that make HTTPS requests.→ Pricing page |
| /cost | A Claude Code command. Shows token usage and estimated dollar cost for the current session.→ Chapter 15.8 |
| curl | Transfers data from/to URLs via command line. Used to test APIs: curl https://api.example.com/users. |
| Cursor | An AI-integrated code editor built as a fork of VS Code. Has AI coding assistance built directly into the editor rather than as a CLI tool. Competes with Claude Code for the same “AI-assisted coding” workflow.→ Other AI CLIs |
| Dependency | A package/library your project requires. Managed by npm (JavaScript), pip (Python), NuGet (.NET). |
| DI | Dependency Injection. A design pattern where objects receive their dependencies rather than creating them. Part of SOLID. Overkill in scripts. |
| DLP (Data Loss Prevention) | Software that monitors and blocks sensitive data from leaving a corporate network. Can interfere with CLI tools that make outbound HTTPS requests, even legitimate ones like Claude Code.→ Pricing page |
| .editorconfig | A configuration file that tells code editors how to format code: tab size, line endings, indentation style. Works across different editors so the whole team uses the same formatting rules automatically. |
| Environment Variable | A named value set in the operating system, accessible by programs. Used for credentials: DATABASE_URL=postgres://... |
| Extended context model | Anthropic's term for the same Opus or Sonnet model with the 1M context window enabled instead of 200K. Not a different AI. Not a new model. Just more Awareness turned on. Plain term: 1M Awareness (available on Max/Team/Enterprise). |
| Extended Thinking | Internal reasoning Claude does before responding. Generates invisible “thinking” tokens. Costs more but produces better answers for complex problems.→ Chapter 15.10 |
| Frontmatter | A metadata block at the top of a markdown file, delimited by --- lines. Contains key-value pairs like name: and description:. Used in Claude Code skills and slash commands so the system can discover and describe them. Written in YAML format. → Making a Plugin |
| Forward pass | One complete computation through all layers of a neural network. Each output token requires one full forward pass. For a model with 80+ layers, this is expensive — hence output tokens cost 5x input tokens. |
| git | The version control system that tracks changes to files over time. It lets you save snapshots (commits), try experiments on separate branches, and go back to any previous state. git itself is a command-line tool. Products that provide git hosting and collaboration: GitHub (most popular, owned by Microsoft), GitLab (popular in enterprises, can be self-hosted), Bitbucket (popular with Atlassian/Jira shops), Azure DevOps (Microsoft's enterprise git platform), and Gitea/Forgejo (self-hosted open source options). GitHub is the public face of git for most developers.→ Glossary |
| GitHub | A website (github.com) where developers store and share git repositories. Think of it as Google Drive for code, with version history and collaboration tools built in. Free for public projects. Owned by Microsoft.→ Other AI CLIs |
| .gitignore | A text file in a git repository that lists files and folders git should ignore — never track, never commit. Put credentials, build output, and personal settings here. Like a “do not touch” list for git. |
| glob | Short for “global”. A file-matching pattern. *.js matches all JavaScript files. src/**/*.ts matches all TypeScript files in src/ recursively. |
| Governance (features) | Controls that let an organization manage, monitor, and restrict how employees use a tool. In Claude Enterprise: audit logs (who sent what), SSO (employees log in with company accounts), RBAC (which employees can use which features), SCIM (auto-add/remove users when they join or leave). None of it makes Claude smarter — it makes IT and legal teams happy.→ Pricing page |
| GQA (Grouped Query Attention) | A modern AI architecture technique where the model uses far fewer KV heads than total attention heads. Instead of one Key-Value pair per attention head, multiple heads share a single KV head. A model with 64 total heads might have only 8 KV heads — reducing KV cache size by 8x. The main reason modern models are far more memory-efficient than older ones. Claude almost certainly uses GQA. |
| grep | Searches for text patterns in files. grep "function" *.js finds all lines containing “function” in JavaScript files. Claude Code has a built-in Grep tool. |
| Group Policy | A Windows feature in corporate environments that lets IT centrally control what employees can install and run on their computers. If Group Policy is restricting your machine, you may need IT to whitelist Claude Code before you can install it.→ Pricing page |
| gRPC | Google Remote Procedure Call. A high-performance RPC protocol using Protocol Buffers. MCP does NOT use gRPC — it uses JSON-RPC. |
| GUI | Graphical User Interface. A visual interface with windows, buttons, and mouse interaction. What most people think of as “an application.”→ Introduction |
| Haiku (Claude Haiku) | Claude's fastest and cheapest model. Fine for simple edits, quick lookups, and boilerplate. Not recommended for complex coding — it will struggle and produce lower-quality results.→ IQ/Awareness chart |
| HBM (High Bandwidth Memory) | The memory type used in AI GPUs (H100, H200). Stacked directly on the GPU for maximum bandwidth (~3.35 TB/s on H100). Essential because the KV cache must be read on every forward pass — bandwidth is the bottleneck, not compute. |
| Headless | Running software without any visual interface — no windows, no mouse, no screen. Servers run headless. Claude Code can run headless in automated pipelines, executing tasks without a human watching.→ Other AI CLIs |
| /help | A Claude Code command. Lists all available commands, built-in tools, and current session information.→ Chapter XV |
| HIPAA | Health Insurance Portability and Accountability Act. US federal law regulating how health data must be protected. If your organization handles patient data, you need HIPAA-compliant tools — meaning the vendor must sign a Business Associate Agreement and meet specific security requirements.→ Pricing page |
| Hook | A command or program that executes automatically in response to a Claude Code event (before/after tool use, session start, session stop). Can be any executable — a bash script, a PowerShell script, a Python script, or a compiled binary. Unlike skills (which are instructions Claude reads), hooks actually run code.→ Chapter VII |
| HTTP / HTTPS | HyperText Transfer Protocol (Secure). The protocol web browsers use. HTTPS adds encryption (TLS). APIs, web pages, and remote MCP servers use HTTPS. |
| IDE | Integrated Development Environment. A code editor with built-in tools (debugger, compiler, etc.). Visual Studio, VS Code, JetBrains products.→ Introduction |
| /init | A Claude Code command. Analyzes your project and generates an initial CLAUDE.md automatically. Run it when starting a new project, or to improve an existing CLAUDE.md.→ Chapter III |
| IQ (AI) | The AI model's reasoning capability — how well it thinks, writes code, solves hard problems. Opus has higher IQ than Haiku. Distinct from Awareness (context window size). Both matter; they measure different things.→ IQ/Awareness chart |
| JetBrains | A software company that makes professional IDEs: IntelliJ IDEA (Java), Rider (.NET/C#), PyCharm (Python), WebStorm (JavaScript), and others. Known for smart code completion and refactoring tools.→ Other AI CLIs |
| JIRA | A project management and issue tracking tool by Atlassian, widely used in software teams. Developers track bugs, stories, and tasks as "tickets" (e.g., PROJ-12345). Has a REST API and CLI tools (acli) for automation. Claude Code plugins can integrate with JIRA to look up tickets, transition statuses, and update fields. → Making a Plugin |
| JSON | JavaScript Object Notation. A data format: {"name": "Steve", "age": 42}. Used everywhere for configuration files and API communication. |
| JSON-RPC | JSON Remote Procedure Call. A protocol for calling functions on another process using JSON messages. How MCP servers communicate with Claude. |
| .jsonl (JSON Lines) | A file format where each line is a separate, complete JSON object. Claude Code stores session history in .jsonl files — one event per line, making them easy to read one record at a time without loading the whole file. |
| JWT | JSON Web Token. A compact token format for authentication. Sometimes used by MCP servers or APIs for auth. |
| keytar | A Node.js library for securely storing credentials in your operating system's native keychain (Windows Credential Manager, macOS Keychain, Linux Secret Service). Used by apps to store API tokens without exposing them in plain text files. → Making a Plugin |
| KV | Key-Value. In AI: the “Key-Value” cache — a GPU memory structure that stores attention computations for every token in the context window. The K and V stand for Key and Value vectors — internal mathematical representations the model needs to “remember” what it has read. For every token, at every model layer, two vectors are stored. This is what the context window physically occupies on Anthropic's servers.→ Chapter 15 (Context Window) |
| KV Cache | The GPU memory structure where every token's attention data is stored during a conversation. Grows linearly with context size. Rough estimate: ~64 KB to ~0.5 MB per token depending on architecture (modern GQA models at the low end, older full multi-head attention models toward the high end). 200K tokens ≈ 100 GB. 1M tokens ≈ 500 GB. Bigger context windows are not a software feature — they reflect real infrastructure cost.→ Chapter 15 (Context Window) |
| KV heads | The number of Key-Value head pairs in a model. In full multi-head attention: KV heads = total heads. In GQA models: KV heads is much smaller (e.g. 8 vs 64 total). KV heads directly determine KV cache size per token. The correct KV cache formula is: 2 × layers × KV_heads × head_dim × bytes — not total heads. |
| Latency | The delay between sending a request and receiving the first response. In AI tools, this usually means the time between submitting your message and seeing the first word appear — sometimes called time-to-first-token (TTFT). A longer context window increases latency because the model must process all prior tokens before it can begin generating output.→ Context Window chapter |
| Lint / Linter | A tool that checks code for style violations, potential bugs, and anti-patterns. ESLint (JavaScript), PSScriptAnalyzer (PowerShell). |
| LLM | Large Language Model. The AI model that powers Claude, GPT, Gemini, etc. Trained on massive text datasets to understand and generate language and code. |
| ls | Lists files in a directory. The Unix equivalent of dir on Windows. |
| LSP | Language Server Protocol. A protocol for code intelligence (autocomplete, go-to-definition). Powers VS Code's IntelliSense. MCP is modeled after LSP. |
| Markdown (.md) | A lightweight text formatting language. You write plain text with simple symbols (* for bold, # for headings) and it renders as formatted text. CLAUDE.md, README.md, and SKILL.md files are all Markdown. GitHub displays .md files as formatted pages. |
| Marketplace (Plugin) | A distribution system for Claude Code plugins. A marketplace is a GitHub repository that hosts plugin packages. Users register it with /plugin marketplace add org/repo and then install plugins from it. → Making a Plugin |
| MCP | Model Context Protocol. Anthropic's protocol for connecting external tools (plugins) to Claude. JSON-RPC over stdin/stdout.→ Chapter XI |
| MCP Server (Plugin) | An external process that gives Claude new tools (database queries, API calls). The closest thing to a plugin system.→ Chapter XI |
| Memory | Persistent notes in ~/.claude/projects/*/memory/. Survives across sessions. Like a developer's project wiki.→ Chapter X |
| Merge | Combining changes from one branch into another. Like accepting tracked changes in a Word document. |
| Meta-skill | A regular skill (SKILL.md file) whose special job is listing all other available skills in a plugin and telling Claude when to use each one. Auto-injected at session start via a hook. Despite the "meta-" prefix, it's structurally identical to any other skill — just a table of contents for the plugin. → Making a Plugin |
| mkdir | Make Directory. Creates a new folder. |
| /model | Switches between AI models (Opus, Sonnet, Haiku) mid-session without losing conversation history. |
| Monorepo | A single repository containing multiple projects or teams' code. Requires careful configuration (e.g., subdirectory CLAUDE.md files). |
| MQA (Multi-Query Attention) | The extreme version of GQA where all attention heads share exactly one KV head. Maximum memory efficiency. Slight quality trade-off at very long contexts. Both GQA and MQA exist specifically to shrink the KV cache and make long-context inference affordable. |
| node / Node.js | JavaScript runtime that runs outside the browser. Required by npm. Most AI CLI tools are Node.js applications. |
| npm | Node Package Manager. Installs JavaScript packages from the npm registry. Used to install Claude Code, Codex, Copilot, Gemini CLIs. Like NuGet for the JavaScript world. |
| npx | Runs an npm package without permanently installing it. npx -y @example/mcp-server downloads and runs the package in one step. Like dotnet tool run. |
| NVLink | Nvidia's GPU-to-GPU interconnect (~900 GB/s on H100s). Required when a session's KV cache exceeds one GPU's VRAM. Allows 12–25 GPUs to share a single massive KV cache. This interconnect advantage, combined with CUDA, is why Nvidia dominates AI infrastructure. |
| Opus (Claude Opus) | Claude's most capable AI model. Best for complex reasoning, architecture decisions, and hard bugs. Slowest and most expensive. Use it when the problem actually needs deep thinking.→ IQ/Awareness chart |
| OSS | Open Source Software. Software whose source code is publicly available. OpenCode is OSS. Claude Code is not. |
| Personal Access Token (PAT) | A security credential string you generate on a site like GitHub to prove your identity to an API. Starts with ghp_ on GitHub. Has nothing to do with AI text tokens — a PAT is an auth secret, not a billing unit or a unit of text.→ Confusing Terms |
| Plugin (Claude Code) | A program installed into Claude Code that provides slash commands, skills, hooks, and/or agents. Lives on your local disk at ~/.claude/plugins/cache/. Dormant between sessions; wakes up via hooks at session start. Not the same as an MCP server — plugins provide workflows and methodologies; MCP servers expose API tools. → Making a Plugin |
| Plan / Plan Mode | A Claude Code mode activated with Shift+Tab. Claude describes its intended approach before making any changes. You review and approve the plan first. Same discipline as an Architecture Decision Record — describe your approach before touching files.→ Chapter IX |
| PowerShell | Microsoft's command-line shell and scripting language for Windows. More powerful than the old Command Prompt. The recommended terminal for Windows developers. Like bash, but for Windows. |
| Prefill phase | The fast phase where all input tokens are processed in parallel as one batch. What you pay input token prices for ($3/M on Sonnet). Contrast with the generation phase (output tokens), which is slow and sequential. |
| Premium requests | Copilot's term for using more capable AI models. Not a Claude term — but you'll see it when comparing platforms. Plain term: higher-IQ model requests. |
| Pull Request (PR) | A request to merge your branch into main. Other team members can review the changes before they're accepted. The name is from the maintainer's perspective: they pull your proposed changes into the main branch after review. |
| Push / Pull | Push = upload your local commits to the remote server (GitHub). Pull = download other people's commits from the server to your machine. |
| python / pip | Python programming language and its package manager. Some MCP servers and tools (like SQLite access) use Python. pip install is Python's version of npm install. |
| QR Code | Quick Response Code. A 2D barcode scannable by phone cameras. Used by Remote Control to quickly connect your phone to a session. |
| RBAC (Role-Based Access Control) | A security model where what you can do is determined by your role (admin, developer, viewer, etc.) rather than by individual permissions. “Developers can use Claude Code; viewers can only use the chat.” Enterprise feature.→ Pricing page |
| /rc | Short for /remote-control. Enables phone/remote access to the current session. |
| Red-green testing | Write a test that FAILS first (red), then write code that makes it PASS (green). Proves the test actually detects the problem. A core practice in the Practitioner's Guide.→ Practitioner's Guide |
| Refactor | Restructuring code without changing its behavior. Improving organization, readability, or performance. |
| regex | Regular Expression. A pattern-matching syntax for text. ^\d{3}-\d{4}$ matches phone numbers like 555-1234. Used by Claude's Grep tool. |
| Registry (in code) | A data structure that maps keys to behaviors. Avoids if-else chains. Add a new entry to support a new variant. |
| Ralph Loop | An autonomous AI agent technique where an AI coding assistant is wrapped in a bash loop that feeds its own output back into itself, running unattended until a goal is complete. Pioneered by Geoffrey Huntley. Most effective when combined with a disciplined execution prompt (RUN_PLAN.md) and strong test coverage. → How to Run a Plan |
| Remote Control | Feature that lets you control a local Claude Code session from your phone or another device via QR code or URL.→ Remote Control tab |
| Repository (repo / repos) | It is a directory. That's it. A repository is just a folder — on your machine, or on a server like GitHub — that has version history tracking turned on. Your local copy is a directory on your hard drive (e.g., C:\repos\MyProject). Repository = versioned directory. Repo = short for repository. Repos = more than one. |
| REST | Representational State Transfer. A common style for web APIs. Uses HTTP methods (GET, POST, PUT, DELETE) on URLs. Most web services expose REST APIs. |
| --resume | CLI flag to resume a previous session. claude --resume shows a list; claude --resume <id> resumes a specific one. |
| Runbook | A step-by-step procedure document for completing a specific task. “When the server goes down, follow the runbook.” Skills in Claude Code are essentially runbooks — instructions Claude reads and follows when doing that type of task.→ Chapter V |
| SAML | Security Assertion Markup Language. The technical standard that SSO systems use to pass login credentials between the identity provider and applications. You'll see it mentioned alongside SSO — they go together.→ Pricing page |
| Sandbox | An isolated environment that restricts what a program can do. Prevents untrusted code from accessing files or network. |
| SCIM | System for Cross-domain Identity Management. Automatically adds users to applications when they join an organization and removes them when they leave — based on your HR system. Eliminates manual account management. Enterprise feature.→ Pricing page |
| SDK | Software Development Kit. A library/package that makes it easier to use an API or protocol. Anthropic provides MCP SDKs for TypeScript and Python. |
| Seat fee (Enterprise) | Anthropic's term for the per-person monthly charge for access to Enterprise features (SSO, audit logs, etc.). Does NOT include token consumption — that's billed on top. Plain term: Access fee (tokens extra). |
| sed | Stream editor. Transforms text via commands (find/replace, delete lines). Used in scripts for automated text manipulation. |
| Session | A conversation with Claude Code. Has a unique ID, persists as a .jsonl file, can be resumed later.→ Chapter II |
| settings.json | Claude Code's configuration file. Controls permissions, MCP servers, environment variables, model preferences.→ Chapter 15.6 |
| settings.local.json | Repo-level settings override that is NOT checked into git. For personal credentials and local-only config.→ Chapter 15.6 |
| Shell | The program that interprets your typed commands. On Unix/macOS it's bash or zsh. On Windows it's Command Prompt (cmd.exe) or PowerShell. When someone says "open a shell," they mean open a terminal window. The terms "shell," "terminal," and "command line" are often used interchangeably in casual conversation, though technically the terminal is the window and the shell is the program running inside it. |
| Shell script | A text file containing commands that a command-line interpreter executes. On Unix/macOS, these are bash or sh scripts (no file extension or .sh). On Windows, the equivalent is a batch file (.bat or .cmd) or a PowerShell script (.ps1). Claude Code plugins use shell scripts for hooks. Cross-platform plugins often use a polyglot wrapper — a single file that works as both a batch file and a bash script. → Building a Plugin |
| Skill | A SKILL.md instruction file in .claude/skills/. Loaded into context when relevant. Not a running process — a document that informs behavior.→ Chapter V |
| Slash Command | A custom command defined as a file in .claude/commands/. Typing /foo loads the prompt from that file.→ Chapter VI |
| SOLID | Single responsibility, Open/closed, Liskov substitution, Interface segregation, Dependency inversion. Five design principles for maintainable object-oriented code. Useful guidelines, not commandments.→ Practitioner's Guide |
| Sonnet (Claude Sonnet) | Claude's balanced AI model. Good speed, good capability. The recommended default for daily coding work. Better value than Opus for most tasks.→ IQ/Awareness chart |
| SOP (Standard Operating Procedure) | A documented set of instructions for performing a routine task consistently. Similar to a runbook. When you write a Skill for Claude, you're writing an SOP for it to follow.→ Chapter V |
| sqlite / SQLite | A lightweight database stored as a single file. Codex and OpenCode store session data in SQLite databases. No server needed. |
| SSH | Secure Shell. Lets you connect to a remote machine's command line over an encrypted connection. How you'd access Claude Code running on a remote server. Also the ssh command-line tool. |
| SSO (Single Sign-On) | A system where employees log into one central identity provider (like Microsoft Active Directory or Okta) and that login grants access to all approved applications. No separate username/password for each app. Enterprise feature.→ Pricing page |
| Static Analysis | Examining code without running it. Linters, type checkers, and pattern scanners are static analysis tools. |
| stdin / stdout | Standard Input / Standard Output. The default input and output streams of a process. MCP servers communicate by reading stdin and writing stdout. |
| sudo | Runs a command as administrator (root) on Unix/Mac. Like “Run as Administrator” on Windows. |
| Task Tool | Built-in todo/task tracker within a Claude Code conversation. Creates, tracks, and completes tasks.→ Chapter 15.2 |
| TLS | Transport Layer Security. Encryption for network traffic. The “S” in HTTPS. Remote Control uses TLS to secure messages between your phone and machine. |
| tmux | Terminal multiplexer. Lets you run a CLI session, detach from it, and reattach later — even from a different machine. Like keeping a program running after you close the window. |
| TOML | Tom's Obvious Minimal Language. Another config file format. Used by Codex for configuration (config.toml). |
| Token | The word token has four different meanings in AI discussions, and only three of them are related: 1. (text unit) A chunk of text the model reads and writes. In English, a rough rule of thumb is about 4 characters or ¾ of a word, though the real number varies by language and formatting. 100 words ≈ about 130 tokens as a rough estimate. 2. (billing unit) The unit AI companies charge for. Input and output tokens are often priced separately, because processing them can impose different costs. On Sonnet: $3/MTok input, $15/MTok output. 3. (inference / KV-cache position) During generation, each token in the active context contributes to memory use inside the model’s attention machinery, commonly described in terms of the KV cache. The memory cost per token depends on model architecture — a rough estimate is ~64 KB to ~0.5 MB per token. This is one reason large context windows are expensive to serve. The raw text itself is tiny; what matters is the model’s internal mathematical representation of that text. 4. (security credential) A string used to prove identity to an API — a Personal Access Token (PAT), API key, or JWT. This meaning is completely separate from the other three.→ Pricing page |
| TypeScript | A superset of JavaScript that adds static type checking. Files end in .ts instead of .js. Catches type errors at compile time rather than at runtime. Used heavily in modern web development and many Claude Code plugins. Like JavaScript with guardrails. |
| UAC | User Account Control. Windows security feature that prompts before elevated actions. Claude Code's permission model works similarly.→ Chapter 15.5 |
| URI / URL | Uniform Resource Identifier / Locator. A web address. https://github.com/srives/AIManifesto is a URL. |
| Usage (Anthropic term) | Anthropic's term with two unrelated meanings depending on context: (a) how many messages you've sent in the rolling 5-hour window (subscription limits), OR (b) tokens consumed through the API. Plain terms: Messages (subscription) or Tokens (API). |
| Usage limits (Anthropic term) | Anthropic's term for how many messages you can send in a 5-hour rolling window before Claude slows down or stops. Separate from token pricing. Plain term: Message budget. |
| UTF-8 | Unicode Transformation Format (8-bit). The standard text encoding. Supports all languages and emoji. What your files should be saved as. |
| UUID | Universally Unique Identifier. A random ID like dd8ab84a-a52e-467b-bcb6-0ad3a44a5db6. Used for session IDs. Practically guaranteed to be unique. |
| Visual Studio | Microsoft's full-featured IDE for Windows. Used primarily for C#, C++, and .NET development. Not the same as VS Code — Visual Studio is the big one, VS Code is the lightweight one.→ Other AI CLIs |
| VS Code | Visual Studio Code. A free, lightweight code editor by Microsoft. Works on Windows, Mac, Linux. Popular for almost every language. Not the same as Visual Studio — it's smaller and more general-purpose.→ Other AI CLIs |
| WinGet | Windows Package Manager, built into Windows 10/11. Install software with winget install PackageName. Like apt (Linux) or Homebrew (Mac), but for Windows. Use winget install Anthropic.ClaudeCode to install Claude Code.→ Introduction |
| Working tree | Git's name for the folder where your actual files live — the checked-out copy you edit. When you clone a repo, the folder you get is the working tree. Distinct from the hidden .git folder, which stores git's internal history and objects. You work in the working tree; git manages the .git folder. |
| Worktree | A git feature that lets you check out the same repository into multiple folders simultaneously, each on its own branch. Normally one repo = one folder = one branch at a time. A worktree creates a second (or third) folder from the same repo — same git history, different branch, different files on disk, all live at once. Claude Code uses this to give subagents their own isolated folder to work in, so the agent's changes never touch your files until you choose to merge them.→ Chapter 15.1 |
| WSL | Windows Subsystem for Linux. Runs a real Linux environment inside Windows. Many CLI tools work better under WSL. Available via the wsl command. |
| YAML | YAML Ain't Markup Language. A human-readable data format (like JSON but with indentation instead of braces). Used by Copilot for session config. |
Case Studies
Real projects, real mistakes, real lessons. Each case study documents what actually happened when AI coding tools were used in production — what worked, what collapsed, and what was learned.
Engineering with AI: From Concept to Product
Building from the ground up with AI in mind — and documenting every step of the way
The Idea
I want to use the occasion of a new product as a case study in AI-engineering — as opposed to AI-coding — documenting my steps along the way. The product: a tool that lets you draw your UI on a drawing tablet — screens, block diagrams, workflow sketches — and then feed those drawings directly into an AI so it can generate screen layout designs. The specific use case was adding screens to an existing product, but it would also be nice to draw the screens for a new product from scratch, with hand-drawn designs as the AI's input.
The insight that makes this worth a case study is not the product itself. It is the discipline of how it was started. Everything in this story was meant to be done before a single line of code was written (excepting any POC work for proving the technological capabilities).
Pitfall Avoided: Turning Concept Into Product Too Fast
AI could have a working product built before you finish writing the design document. That is the trap. Doing the design work first doubles the time to the first exciting demo, and accepting that takes willpower. But it pays off in every subsequent week.
Vibe coding is a strong pull. It is a dopamine hit. It makes you want to have something running tonight. The pattern in Case Study 1 is exactly this — a Saturday morning of pure velocity that built up a debt that took many change-sets to pay down.
AI Virtue #1: Patience
Patience is not waiting. It is choosing a slower path now to walk a faster path later. The steps below are all "slow" in the sense that none of them produce a running product. They all produce something more durable: a shared understanding between you and the AI of exactly what you are building and why.
The Process, Step by Step
1. GitHub Repo — Early
Don't talk about it. Do it. Before architecture docs, before POC, before any serious thinking — create the repo. Every decision made after this point is recorded somewhere. This is not optional.
2. Narrate the Idea in a Document
Write down what the product is. Not code — prose. Describe it the way you would describe it to a colleague over lunch. In this case: "I want to draw screens on my drawing tablet, photograph or export them, and feed those drawings into Claude so it can produce screen layout designs I can start coding from." That is the idea. Write it down before anything else.
This step is the design process. The act of writing forces you to make decisions you would otherwise defer. Vague ideas become concrete questions. Concrete questions reveal scope. Scope reveals what you actually need to build.
3. POC — Small Vibe Coding to Prove the Idea Works
At this stage, and only at this stage, vibe coding is appropriate. Write the smallest possible amount of code to answer one question: does this actually work? In this case: can Claude receive a tablet drawing and produce a useful screen layout? If yes, proceed. If no, pivot now before you have written a design document for a product that does not work.
The POC is not production code. It is a disposable experiment. Treat it that way.
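A POC this small can start as a single throwaway script. Below is a minimal sketch of the "package a drawing for the AI" half, assuming the Anthropic Messages API's content-block format (base64 image plus text); the function name and instruction text are illustrative, and the actual API call is left as a comment rather than invented.

```python
import base64

def build_drawing_prompt(png_bytes: bytes, instruction: str) -> list:
    """Package a tablet drawing plus an instruction as a messages payload,
    using the image/text content-block shape of the Anthropic Messages API."""
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png_bytes).decode("ascii")}},
            {"type": "text", "text": instruction},
        ],
    }]

# In the real POC you would send this payload with the anthropic SDK, e.g.:
#   client.messages.create(model=..., max_tokens=...,
#       messages=build_drawing_prompt(png, "Propose a screen layout."))
msgs = build_drawing_prompt(b"\x89PNG...", "Propose a screen layout for this sketch.")
print(msgs[0]["content"][0]["type"])  # image
```

If a payload this simple produces a useful layout, the idea is proven; if not, you have spent an afternoon, not a quarter.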
4. Draw Your Screens and Block Diagrams
Before writing a single technical requirement, draw every screen. This discipline comes from a long-standing principle: designing from the screens drives every other decision. As the screen designs iterate, you will discover interfaces. You will find data that needs to move between screens. You will find menus and flows you did not know you needed.
Draw on paper if you have to. Then photograph or export and feed them into the AI. You are not just documenting — you are doing design work in the medium that AI can consume.
5. Write the Full Technical Design Document
This step is critical. Document every layer of the system:
- How many types of menus exist? What is in each one? Can they share a generic menu system?
- How many screens are there? What data does each screen take as input and produce as output?
- What is the physical path from tablet drawing to AI consumption? Every hop documented.
Writing this document is not overhead. It is architecture. Everything that would otherwise be decided ad-hoc during coding is decided here, where decisions are cheap and reversible.
6. Multiple AIs Consume the Design and Compete on Architecture
Feed the design document to more than one AI and ask each for an architecture plan. Then have them go back and forth until you reach a consensus. This is competitive planning: two AIs producing proposals, critiquing each other's work, and converging on something better than either would have produced alone.
A practical note: ask each AI which of the two is better suited to implement a given component, and which is better suited to review it. They will give you honest answers, and those answers are useful.
The tool for this is RUN_PLAN.md — an execution-control prompt you place in your repository alongside the plan. It forces the AI through a mandatory preflight before touching a single file: restate the task, read the actual source (not summaries), find every caller of every function it will change, map the blast radius, identify the test surface, and build a risk map. Only then does it implement — one bounded change at a time — followed by targeted tests, a build, and a hostile self-review. It is the difference between an AI that codes from the plan document and one that codes from the codebase.
The full workflow — producing PLAN_FINAL.md, executing it with RUN_PLAN.md, and using Ralph Loops for unattended runs — is documented in one place.
How to Run a Plan →
6b. Extract and Approve Constraints Before Code
Before the AI touches a single line of code, there is a critical gate: constraint extraction and approval.
Have the AI write a .constraint-extraction.md file that extracts every architectural constraint, decision, boundary, and assumption from the final plan. This file is not code. It is the explicit statement of "what must be true for this code to be correct."
You review this file. You look for gaps, missing cases, forgotten boundaries. You ask questions. You push back if constraints are unclear. Only when this file is complete and correct do you approve it explicitly.
Only after approval does code begin. If code appears without approved constraint extraction, the commit is rejected and pointed back to the constraint file. This is not optional.
Why does this matter? Architectural work fails when constraints are implicit. They get embedded in code, discovered in review, and require rework. Making them explicit first prevents the entire class of architectural rework. Constraints extracted and approved before code almost never surface as bugs later.
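A hypothetical excerpt of what such a file might contain (the project details, IDs, and reviewer line are invented for illustration):

```markdown
# .constraint-extraction.md (excerpt)

## Boundaries
- C-01: UI code may call the session service; the session service never calls UI code.
- C-02: Temporary launch data must never be written into canonical session records.

## Contracts
- C-03: Launching twice with identical parameters returns the same session key.

## Assumptions
- C-04: All persisted files are UTF-8; writes are atomic (write temp file, then rename).

Status: APPROVED — code may begin.
```

Each constraint gets an ID so reviews and commit messages can point at the exact rule a change satisfies or violates.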
7. Bootstrap from Your Prior Work
Have the AI consume your CLAUDE.md and AGENTS.md from previous projects, and update them to become this project's founding documents. This is a critical step. It takes your engineering discoveries from prior work — every rule that came from something that actually broke — and puts this project on a good footing before a line of code is written.
You are not starting from scratch. You are transferring institutional knowledge.
No prior CLAUDE.md or AGENTS.md? The Day-1 CLAUDE.md / AGENTS.md Builder →
8. AI Designs the Directory Structure
Before any code moves anywhere, have the AI propose a directory structure for both docs and code. Then have a second AI critique it. Directory structure is not a detail — it determines how the project grows. A poor structure that gets committed early becomes a migration cost later.
9. Design for Scale: Local vs. Cloud
Have the AI model both paths: a local-server architecture and a cloud-hosted architecture. This is not about picking one today. It is about making sure the design is not accidentally incompatible with the path you will want later. Decisions made here are cheap. The same decisions made at v5 cost retrofits.
10. Move POC Code Into the Proper Structure
Now, and only now, the POC code moves from its throwaway location into the real directory structure. The AI does the move. The move is mechanical — the structure was designed in step 8. Nothing is rewritten yet. The goal is to have working code in the right place before any new features are added.
11. Testing Plan, Tests, and a Test Launcher — Before Features
Have both AIs come to a testing plan together. Then have them write the tests. Then have them write a test launcher so that tests run automatically with every change.
Insist on red-green testing from this point forward. Every step of the way. Every new feature must come with new tests. This is not optional. Case Study 1 documents what it costs when test infrastructure is added after the fact. Do not repeat that.
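As a toy illustration of what a test launcher does, here is a minimal sketch in Python (all names hypothetical): it discovers test_* functions, reports red/green per test, and returns an overall pass/fail. A real launcher would shell out to your actual framework (pytest, dotnet test, etc.) and be wired to run on every change.

```python
import traceback
import types

def run_tests(module) -> bool:
    """Minimal test launcher: run every test_* function in a module,
    print red/green per test, and return overall pass/fail."""
    failures = 0
    for name in sorted(dir(module)):
        if not name.startswith("test_"):
            continue
        try:
            getattr(module, name)()
            print(f"GREEN  {name}")
        except AssertionError:
            failures += 1
            print(f"RED    {name}")
            traceback.print_exc()
    return failures == 0

# Red-green in miniature: the test exists before the feature does.
suite = types.ModuleType("suite")

def test_addition():
    assert 1 + 1 == 2

suite.test_addition = test_addition
print(run_tests(suite))  # True
```

The point is not the runner itself but the contract: one command, run after every change, with an unambiguous pass/fail answer.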
12. Implement One Small Feature, Test, Check In
Now you are ready to build. Take the smallest complete feature from the master plan. Implement it. Run the tests. Check in. Repeat.
The velocity from here feels slower than pure vibe coding. But it compounds. Features land without breaking previous features. The AI has a shared understanding of the architecture. Additions fit because the structure was designed to accept them.
Execute each feature using RUN_PLAN.md as your constraint. Write a feature-specific plan (e.g., add-user-authentication.md), then hand it to the AI with this command:
Execute add-user-authentication.md using RUN_PLAN.md as your
rule and constraint. Complete Phase 0 preflight first, then
implement one bounded change at a time. Run tests after each change.
After each feature lands, do a hostile code review to catch local bugs, and a tough review using PR_TOUGH.md to catch architectural drift:
Do a hostile review of add-user-authentication work, then a
tough architectural review using PR_TOUGH.md. Output both to
PR_AUTH_FINDINGS.md. Block merge if CRITICAL.
What This Avoids
Case Study 1 is the counter-example. Every step above corresponds to something that was skipped there and had to be paid for in v9:
| Skipped in Case Study 1 | Paid for in v9 | Done upfront here |
|---|---|---|
| No registry-driven architecture | Scattered variant logic across 7 files | Steps 6–8: AI designs architecture before code |
| No canonical helpers | Duplicated logic in every entry path | Step 8: structure designed for shared utilities |
| No atomic writes | Silent data corruption on crash | Step 5: design doc captures all I/O contracts |
| No test infrastructure | False-passing tests, latent bugs | Step 11: testing plan before first feature |
| No directory structure plan | POC code scattered permanently | Steps 8–10: structure designed and enforced first |
The One-Sentence Version
Do all the design work first, let the AI do it with you, and then build — because the AI will build exactly what you designed, and if you did not design it, the AI will invent it.
Code Review Methodology: The Hostile Review
Once your product is being built, features are landing, and tests are passing — how do you know the code is actually correct? Passing tests and successful builds are evidence, not proof. This is where a rigorous, AI-driven code review process becomes essential.
The methodology below is called a hostile code review. The word "hostile" is deliberate. You are not asking the AI to summarize what the code does. You are telling it to assume there are bugs and hunt for them. This is the difference between a review that confirms your expectations and a review that challenges them.
The Structure of a Hostile Review Prompt
A hostile review prompt has five parts, each serving a distinct purpose. Skip any one of them and the AI will fill the gap with generic advice instead of real findings.
1. Mindset. Tell the AI to assume there are bugs. Tell it not to waste time on style, naming, or refactoring unless they hide a real defect. Tell it to be skeptical of passing tests. This sets the tone for the entire review.
2. Scope. Define exactly which slice of the codebase to review. Name the feature, the files, and the boundaries. If you do not scope the review, the AI will wander into unrelated code and dilute its findings with irrelevant observations.
3. Known Incompletes. Tell the AI what is deliberately unfinished. Without this, half the findings will be "this feature is not complete" — which you already know. The rule is: do NOT report these as bugs unless the current code incorrectly pretends they are complete.
4. Trace Paths. Tell the AI exactly what to trace. A code review that reads files in isolation misses wiring bugs. The AI must follow the full path: UI event → handler → HTTP request → controller → service → data layer → response. Every hop. Every assumption.
5. Output Format. Require structured output: findings ordered by severity, each with a file/line reference, a description of the actual failure mode, and an explicit confidence level. Require a separate section for residual risks and test gaps. If there are no findings, require the AI to say exactly "No findings" — no padding, no summaries of what the code does.
Sample Prompt: Hostile Code Review
Below is a real prompt used to review a feature slice in a production codebase. It follows all five parts of the structure above. You can adapt this template to any feature in any codebase.
You are doing a hostile code review of the current [feature] slice.
Mindset:
- Assume there are bugs.
- Hunt for correctness issues, regressions, bad assumptions,
broken wiring, hidden runtime failures, invalid semantics,
missing guards, and missing tests.
- Be skeptical of passing tests and successful builds.
They are evidence, not proof.
- Do not waste time on style, naming, or refactoring
unless they hide a real defect.
- Do not review unrelated repo churn. Stay on the [feature]
slice only.
Scope:
Review only the current [feature] work:
- [component 1]
- [component 2]
- [component 3]
- tests added for this slice
Known incomplete by design (do NOT report as bugs unless
the current code incorrectly pretends they are complete):
- [known incomplete item 1]
- [known incomplete item 2]
- [known incomplete item 3]
Files to inspect at minimum:
- [path/to/file1]
- [path/to/file2]
- [path/to/file3]
Context files to compare against:
- [path/to/spec_or_design_doc]
- [path/to/sample_data]
You must manually trace the full path:
1. UI event -> handler -> HTTP request shape
2. Controller action -> service call -> response
3. Service method -> data builder -> output shape
4. Data source -> field mapping -> fallback behavior
5. Tests -> what is covered vs. what is not
You must actively look for:
- endpoint/route mismatches
- wrong HTTP verb or response type assumptions
- null-path runtime exceptions
- invalid output shape relative to spec/samples
- unsafe assumptions about data always existing
- UI errors when server returns error responses
- cases where malformed input slips through
- missing tests for real failure paths
Run these commands:
- [test command for this slice]
- [build command]
If useful, inspect relevant diffs and current file contents,
but do not modify code.
Output format:
1. Findings
- Findings first, ordered by severity.
- Each finding must include:
- severity: Critical / High / Medium / Low
- a one-line title
- exact file and line reference
- why it is a real bug/risk now
- the concrete failure mode or reproduction path
- If you are unsure, say so explicitly and explain
what evidence would confirm it.
2. Residual Risks
- Only risks that are real and current.
- Separate these from confirmed findings.
3. Test Gaps
- Only high-value missing tests for this slice.
Rules:
- If there are no findings, say exactly "No findings."
- Do not pad the answer with summaries of what the code does.
- Do not recommend future roadmap work unless it exposes
a current defect.
- Do not confuse "unfinished by design" with "broken now."
- Be strict.
Why This Works
This prompt structure works because it eliminates the three failure modes of AI code review:
- Vague scope — "review my code" produces a book report. Named files and traced paths produce findings.
- False positives from known incompletes — without the "known incomplete" section, half the findings are things you already know. This wastes your time and trains you to ignore the review.
- Unstructured output — a wall of prose buries the critical finding on page 3. Severity-ordered, file-referenced findings let you act immediately.
Architectural Code Review with PR_TOUGH.md
The hostile review above catches bugs in individual features. But code can be locally correct and architecturally wrong: canonical truth split across files, boundaries violated, caches treated as authoritative, old patterns preserved under new names.
PR_TOUGH.md is a framework for catching that class of drift. It defines nine categories of architectural violation specific to your documented architecture, and structures reviews to hunt for them systematically. Copy PR_TOUGH.md from the root of this repository into your project, customize the nine categories to match your architecture, and use it to review changes that touch core boundaries.
When to Use PR_TOUGH.md
Use PR_TOUGH.md instead of (or in addition to) hostile review when:
- Your commit touches core architectural seams — data storage, canonical authority, namespace boundaries
- Your commit modifies tests, build manifests, or docs claiming that architectural cleanup is finished
- Your AI-generated code lands and you need to verify it respects documented boundaries
- The diff is large enough to risk old architectural assumptions sneaking in under new names
Sample Prompts
For a specific feature review:
Do an architectural PR review using PR_TOUGH.md as your constraint.
Review the current state of the [feature-name] work.
Put the output in PR_TOUGH_FINDINGS.md, overwriting if it exists.
Start with findings ordered by severity.
For review immediately after code execution:
You just executed my-feature.md. Now do a tough review using PR_TOUGH.md.
Review [file/directory] for architectural drift against AGENTS.md.
Output to PR_FINDINGS.md.
If CRITICAL, surface it immediately.
For cleanup verification:
Using PR_TOUGH.md, review whether the cleanup claimed in the
commit message actually happened in the code. Check that old
patterns are really gone, not just renamed.
Output to PR_CLEANUP_VERIFICATION.md.
Falling Between the Cracks: The Engineering Terms That Will Kill You
Why boundary violations spread while algorithm bugs stay local, and the five terms that separate muddy architectures from clean ones
The Real Cost of Long-Lived Codebases
Most software pain is not "the loop was wrong" or "the API call failed." Long-lived codebase suffering is almost always:
- Code in the wrong namespace
- Broken boundaries
- Muddy contracts
- Too many entry points for the same behavior
- No clean handoff between responsibilities
Algorithm bugs are local. You fix a loop and one feature works. Boundary bugs spread. You fix canonical truth in one place and six other places still have the wrong copy. That is why the pain feels sticky.
Five Terms That Matter More Than You Think
These are not jargon for its own sake. They are labels for the failure modes that keep costing you time. Understand them and you can prevent entire categories of problems. Ignore them and AI will write code that violates each one, and you will spend months cleaning it up.
1. Namespace — Where Code Lives
Definition: The directory or module where a piece of code belongs. Namespace is about ownership and responsibility, not just file location.
Example: src/session/ owns session lifecycle. src/ui/ owns rendering. src/wt/ owns Windows Terminal integration.
The violation: UI code writing canonical session records. WT code mutating session state directly. Session code doing rendering.
Why it matters: When responsibilities leak across namespaces, you cannot change one part without breaking another. A small refactor becomes an excavation.
2. Boundary — Who May Do What
Definition: The rule about what is allowed to cross between namespaces. A boundary says "this side may call that side, but not vice versa" or "data may flow this direction, but not that one."
Example: UI may call session service. Session service must not call UI. Temporary launch data may not become canonical session state.
The violation: Session code calling back into UI. Temporary state persisting as if it were durable. A boundary that exists in docs but not in code.
Why it matters: Boundaries prevent circular dependencies and make it safe to refactor one side without touching the other. A muddy boundary means you cannot refactor either side without risking silent breakage.
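To make the rule concrete, here is a minimal sketch of a one-way boundary. The class and method names are invented for illustration, not taken from any real codebase: the UI may call the session service, but the service never reaches back into the UI.

```python
# Illustrative sketch of a one-way boundary. The UI layer may call the
# session service; the session service has no knowledge of the UI, so
# the dependency arrow points in exactly one direction.

class SessionService:
    """Owns canonical session records. Never imports or calls UI code."""

    def __init__(self):
        self._sessions = {}                  # canonical state lives here only

    def create(self, name):
        key = f"sess-{len(self._sessions) + 1}"
        self._sessions[key] = {"name": name}
        return key                           # callers get a key, not raw state


class UI:
    """May call the service; must never write service state directly."""

    def __init__(self, service):
        self._service = service

    def on_launch_clicked(self, name):
        return self._service.create(name)    # the only legal crossing point


ui = UI(SessionService())
print(ui.on_launch_clicked("demo"))          # → sess-1
```

The violation described above, UI code writing canonical records, would be the UI reaching into `_sessions` directly: it compiles, works today, and silently couples the two layers.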
3. Contract — What Is Promised at the Boundary
Definition: The explicit agreement about what will happen when you call a function or cross a boundary. Shape of data, behavior on success and failure, invariants that remain true, side effects that will not happen.
Example: Resolve-Launch returns success only after uniqueness is proven. Launching twice with the same parameters returns the same session key. Canonical records cannot contain temporary fields.
The violation: A function that sometimes writes state, sometimes doesn't, depending on context. Data shapes that change between callers. Success that does not guarantee the promised invariant actually holds.
Why it matters: A broken contract means the caller cannot trust the result. If the contract is ambiguous, tests pass but code breaks in production. If contracts are enforced everywhere, you can reason about the system.
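A contract like the Resolve-Launch example can be sketched in a few lines. The function and variable names here are hypothetical stand-ins, not the real API:

```python
# Hypothetical sketch of an enforced contract: the same parameters
# always yield the same session key, and the function never returns a
# partial result -- it either satisfies the invariant or raises.

_sessions_by_params = {}

def resolve_launch(params):
    """Contract: returns a key only after uniqueness is proven; calling
    twice with the same params returns the SAME key; failures raise."""
    if not params:
        raise ValueError("params required")   # no silent partial success
    if params in _sessions_by_params:
        return _sessions_by_params[params]    # idempotent by construction
    key = f"key-{len(_sessions_by_params) + 1}"
    _sessions_by_params[params] = key         # invariant holds before return
    return key
```

A caller can now rely on two calls with the same parameters returning the same key, with no defensive code on the calling side.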
4. Entry Point — Where a Flow Starts
Definition: The specific function or file where a behavior begins. Not just "we handle launches" but "launches begin at Start-ManagedSessionLaunch, and that is the only place where a launch can be initiated."
Example: User launching a session always goes through Start-ManagedSessionLaunch. Platform discovery always goes through Resolve-PlatformSession. Canonical records always created by New-CanonicalRecord.
The violation: Multiple functions that both launch sessions. Three different ways to discover platforms. Canonical records created inline in five places.
Why it matters: If there are multiple entry points, each one becomes a place where assumptions can diverge. One path creates records with field A, another with field B. One path validates, another does not. The more entry points, the more places the contract can break.
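The single-entry-point idea can be sketched as one constructor function that every code path must use. The field names are illustrative, not from the source project:

```python
# Illustrative sketch: new_canonical_record is the ONLY place canonical
# records are built. Inline dict literals elsewhere would be a second
# entry point whose assumptions can silently diverge from this one.

REQUIRED_FIELDS = ("id", "name", "validated")

def new_canonical_record(record_id, name):
    record = {"id": record_id, "name": name, "validated": True}
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:                      # validation lives in exactly one place
        raise ValueError(f"missing fields: {missing}")
    return record
```

With one entry point, "one path creates records with field A, another with field B" becomes impossible by construction.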
5. Seam — The Narrow Handoff Where Behavior Can Be Swapped
Definition: The intentionally narrow place where one responsibility hands off to another, or where behavior can be swapped out, tested in isolation, or refactored without touching anything else.
Example: "Platform-targeted launch resolution" is a seam. The overall launch flow is the same for all platforms. Each platform implements its own targeted query behind the same flow. You can swap the platform query logic without touching the flow.
The violation: The handoff between session behavior and WT transport is not cleanly isolated. Some behavior lives in the wrong place. The old code and new code coexist, and you cannot tell which path runs when.
Why it matters: A good seam lets you replace one part without corrupting others. A muddy seam means every change carries the risk of invisible breakage. Clean seams make testing easier (you can test each side independently) and refactoring safer (you know the boundary will hold).
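The platform-launch seam described above might look like this in miniature. The platform names and query functions are invented for illustration:

```python
# Sketch of a narrow seam: the launch flow is identical for every
# platform; only the targeted-query step is swappable. Each platform's
# query can be tested in isolation without touching the flow.

def launch(platform_query, session_name):
    target = platform_query(session_name)   # the seam: one narrow handoff
    return f"launched {session_name} via {target}"

def wt_query(name):
    return "windows-terminal"               # platform-specific behavior

def tmux_query(name):
    return "tmux"                           # swappable without touching launch()

print(launch(wt_query, "demo"))             # → launched demo via windows-terminal
```

Swapping `wt_query` for `tmux_query` changes the platform behavior while the flow itself stays untouched, which is exactly what a clean seam buys you.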
The Damage Pattern: How These Terms Break Down
In a real codebase with real pain, the failure pattern is almost always this sequence:
UI code creates session records because it is faster than calling the session service. Boundary violation.
A launch record that was meant to be transient gets persisted. Now it exists in two places: the canonical location and the temp location. Contract violation.
A helper function that should be in namespace A gets placed in namespace B because that is where it is called. Now nobody knows who owns it. Namespace violation.
You try to fix the mess by adding a new path that does it right. But the old path is still there. Now the same behavior has two entry points. Neither is definitive. Both encode different assumptions. Entry point violation.
You want to retire the old path. But you cannot tell if other code still depends on it. The seam is too muddy. So you leave it. The mess compounds.
This is the cycle that keeps killing long-lived codebases. It is not that the code is badly written. It is that these five boundaries eroded, and now you cannot tell what is safe to change.
How AI Makes This Worse
AI violates these categories with remarkable consistency, especially when unaware of them:
- Namespace violation: "I'll put this helper right where it is called, in the UI file, because it is easier." Now the helper is in the wrong namespace and you cannot reuse it from session code without pulling UI code with it.
- Boundary violation: "UI code needs to update the session, so I'll call the write function directly from the UI handler." Now UI is writing canonical state and you cannot change the canonical storage without breaking UI.
- Contract violation: "This function sometimes returns null, sometimes a partial object, sometimes a full object. The caller will figure it out." Now the caller writes defensive code everywhere and contracts become unreadable.
- Entry point violation: "There are three places that need to launch sessions, so I'll add launch logic in each one." Now launches behave differently depending on which entry point is called, and testing one does not prove the others work.
- Seam violation: "The old implementation and the new one should coexist for now, for safety." Now nobody knows which one runs when, and you cannot retire either one without risking invisible breakage.
What To Do About It
Before AI writes code: Tell it the terms. Define your namespaces, boundaries, and contracts explicitly in CLAUDE.md. Name your entry points. Identify your seams. The clearer you are, the better the AI respects them.
During code review: Use these terms in your review. Do not just say "this looks wrong." Say "this violates the boundary between session and UI" or "this creates a second entry point that will diverge." Be specific about which term is broken.
When the AI proposes helper placement: Ask whether it belongs in the namespace where it is defined, or whether it is masquerading as something it is not.
When cleanup is claimed: Verify that old paths are actually gone, not just renamed. Use the five terms as your checklist.
Future Possibility: Better Than Faster
Today's AI behaves like a very fast implementer, a mediocre architect, and an inconsistent custodian of invariants. It is strong at local tasks: code generation, pattern matching, mechanical refactors, filling in implementation once the shape is clear. It is much weaker at the architectural work: preserving intent across many files, killing old paths instead of extending them, noticing that a "working" shortcut violates a boundary, maintaining discipline when a hack is locally convenient.
The result is useful—faster coding than humans can do by hand. But it also creates a cleanup tax: after implementation, you spend weeks catching boundary violations that were locally convenient at the time, removing old paths that the AI preserved "just in case," and enforcing invariants that the AI ignored when they would have slowed down the code generation.
What Current AI Excels At
- Local code generation within a single file or function
- Pattern matching — recognizing similar patterns in existing code and repeating them
- Mechanical refactors — moving code, renaming variables, restructuring without changing behavior
- Filling in implementation details once the shape is clear — given a function signature and docstring, producing the body
What Current AI Struggles With
- Preserving architectural intent across many files — knowing that a decision made in one namespace should prevent a certain pattern in another
- Killing old paths instead of extending them — actively removing code instead of preserving both old and new ways
- Noticing that a "working" shortcut violates a boundary — recognizing when code that passes tests is architecturally wrong
- Maintaining discipline when a hack is locally convenient — resisting the temptation to violate a boundary because it would make the current task faster
What Would Change Everything
The jump in usefulness would not come from being faster at coding. It would come from being better at architecture:
- Stronger architectural memory. The AI would hold the full model of your namespaces, boundaries, contracts, and seams in active memory throughout implementation. Not as a checklist to read once, but as a constraint that guides every decision.
- Stricter contract obedience. When a contract says "this function must guarantee uniqueness," the AI enforces it everywhere. When a boundary says "this side may not call that side," the AI refuses to cross it, even if crossing would make the code faster.
- Better detection of boundary drift. The AI notices when a shortcut being taken is actually a boundary violation in disguise. It actively surfaces these, rather than letting them pass as "working code."
- Refusal to preserve legacy paths unless explicitly told. When refactoring, the AI removes old code, not extends it. It does not keep the old way "just in case." You get a clean transition, not a muddy coexistence.
- Active protection of invariants even when a shortcut would be faster. The AI sacrifices speed for correctness. It takes the longer path that preserves an invariant, instead of the short path that violates it but lets the tests pass.
With these shifts, the cleanup tax would drop dramatically. Right now, 30–40% of the work after AI implementation is catching and fixing boundary violations, removing preserved old paths, and enforcing forgotten invariants. If AI actively protected those boundaries instead of just respecting them when told, the post-implementation work would shrink to testing, optimization, and integration. That would be a real jump in usefulness—not just faster coding, but cleaner coding that requires less cleanup.
That is not here yet. But it is the gap to watch for in future systems. When you evaluate a new AI tool, ask not "how fast can it code?" but "how well does it understand and enforce my architecture?" Speed is commoditized. Architectural discipline is still rare.
Retrospective: The Real Skill Is Not Writing Rules
The problem, it turns out, is not a shortage of guardrails and `.md` rules. The real problem is that you yourself need to know these ideas. The rules help, but only if you can recognize when the model is violating the underlying concept. Otherwise, the model can appear compliant while still drifting.
This is the hard truth: the durable skill is not "write more rules." It is being able to spot the violation yourself.
Can you see when code is in the wrong namespace? Can you spot a boundary leaking across layers? Can you tell when temporary state is being treated as canonical authority? Can you recognize when a contract has gone ambiguous? Can you tell which old paths should have died instead of being preserved?
Once you develop that eye, the documentation becomes a sharper tool instead of fragile hope. Instead of hoping the AI internalized the rules, you actively catch violations as they happen and call them by their true name:
- "No, this is dual authority. Temporary state cannot be treated as canonical."
- "No, this crosses the boundary. UI code must not mutate session records."
- "No, this scratch artifact does not belong in canonical state."
- "Delete the old path. We are not preserving both ways."
- "This violates the contract. The function promised uniqueness. Now it does not."
That feedback is much stronger than handing the AI a rule file and trusting it to internalize the philosophy. It is real time, specific, and tied to the actual code.
The documents matter. They give you a language and a framework. They help you think clearly. But they are not a substitute for developing your own ability to see the patterns. The moment you can recognize a namespace violation or a boundary leak without consulting a rule, the documents stop being your guardrails and become your tools.
That is why these five terms—namespace, boundary, contract, entry point, seam—are worth internalizing. Not because they are clever terminology, but because once you see them, you cannot unsee them. And once you cannot unsee them, you can protect your architecture in real time instead of cleaning up drift afterward.
How to Run a Plan
From competitive planning through disciplined execution — a repeatable process that works
Phase 1: Competitive Planning with Two AIs
The insight behind competitive planning is simple: two independent perspectives on the same problem produce a better plan than one. When you describe the same problem to two different AIs, their proposals will diverge on assumptions, trade-offs, and architecture choices — and those divergences tell you exactly where the hard decisions are.
Write one problem statement — a clear, scoped description of what you need to build or change. Feed it to both AIs in separate sessions with no shared context. You are looking for two independent proposals, not a collaboration. Keep it concrete: what is the goal, what are the constraints, what must not change.
Ask each AI to write a structured implementation plan: phases, files to touch, functions to create or modify, test strategy, risks. The output should be specific enough that another AI could execute it without guessing. Save the results as PLAN_A.md and PLAN_B.md.
Cross-pollinate. Show AI A the plan that AI B produced, and vice versa. Ask each to identify wrong assumptions, missed risks, and where their own approach is stronger — then produce a revised plan incorporating what was right in both. Save as PLAN_A2.md and PLAN_B2.md.
Pick one AI to do the merge — the one whose second-round plan was stronger. Hand it both PLAN_A2.md and PLAN_B2.md and ask it to produce PLAN_FINAL.md: a single, definitive plan that incorporates the best elements of both and resolves any remaining conflicts.
Phase 2: Executing the Plan with Engineering Rigor
Writing a good plan is half the work. The other half is handing it to an AI with an execution protocol that prevents the failure modes that kill AI implementations. Without a protocol, even a great plan gets ruined by an AI that codes from the document instead of the codebase.
The 6 Failure Modes of Unstructured Plan Execution
The RUN_PLAN.md template in Chapter III is an execution-control prompt that forces the AI to address each failure mode before writing a single line of code.
How to Use It
- Copy RUN_PLAN.md into your repository and adapt the project-specific sections (file paths, build commands, architecture boundaries).
- Write or generate your plan file, then kick off execution with a prompt that references RUN_PLAN.md, for example:
Read RUN_PLAN.md, then execute PLAN_FINAL.md.
Sample Execution Prompts
Basic plan execution:
Execute my-feature.md using RUN_PLAN.md as your constraint.
Complete the preflight first, then proceed with implementation.
Plan execution with architectural review:
Execute refactor-auth.md using RUN_PLAN.md.
After implementation, do a tough review using PR_TOUGH.md
and output findings to PR_REFACTOR_FINDINGS.md.
High-risk execution with mandatory preflight review:
Execute the plan using RUN_PLAN.md.
Write the full Phase 0 preflight to PREFLIGHT.md
and wait for my approval before touching any code.
Do not proceed without explicit go/no-go from me.
The RUN_PLAN.md Template
Below is a generic RUN_PLAN.md you can copy into your project and adapt. The project-specific sections (file paths, build commands, architecture boundaries) are clearly marked for replacement. Everything else applies universally.
Using RUN_PLAN.md for Unplanned Work
RUN_PLAN.md is not only for formal implementation plans. It works equally well for bug fixes, one-off tasks, and any work where a plan was never written. The key insight: the value is in the preflight and validation discipline, not in the plan file itself.
For planned work the prompt is:
Execute the plan in PLAN_FINAL.md
For an unplanned fix it becomes:
Task: fix [description].
No plan file. Use RUN_PLAN.md methodology.
Skip only the "confirm plan matches code" step from Phase 0 — that step requires a plan document to compare against. Everything else is identical: find the owner and seam, map the blast radius, identify the test surface, build a risk map, implement one bounded change at a time, validate, self-review.
A formal plan was written with the codebase in mind and has been reviewed. An unplanned bug fix is improvised — the scope is unclear, the blast radius is unknown, and there is pressure to move fast. That is exactly when the preflight step earns its keep.
You can fold this discipline into a slash command: define /fix in .claude/commands/fix.md with the contents: "We are fixing a bug. Before touching any file: restate the bug, find the owner and seam, list every caller of any function you will change, identify the test surface, and build a risk map. Then implement the smallest correct change. Validate with targeted tests and a build. Self-review for silent failures, data shape drift, and test gaps. Do not commit without explicit instruction." One command loads the full discipline for any unplanned fix — no template to remember, no steps to skip.
Ralph Loops: Autonomous Unattended Execution
Everything covered so far assumes a human is in the loop — reviewing the preflight, approving each phase, reading the completion report. A Ralph Loop removes that assumption. It is a technique for wrapping an AI coding assistant in a bash loop that feeds its own output back into itself, running unattended until a goal is complete.
The name comes from Geoffrey Huntley, who pioneered the pattern with Claude Code. The loop runs something like this: feed the AI a PRD (product requirements document) or a plan file, let it work, capture the output, feed it back in as the next prompt, repeat until the task list is exhausted or the AI reports completion. No human sitting there approving each step.
while ! task_complete; do
    output=$(claude --print "$(cat PROMPT.md)" 2>&1)
    echo "$output" >> session.log
    update_prompt "$output"
done
The AI's output on each iteration becomes input for the next. The loop drives itself forward until done.
Where Ralph Loops Fit in This Page's Framework
A Ralph Loop is the execution layer taken to its logical extreme. RUN_PLAN.md is the discipline that makes a Ralph Loop safe rather than chaotic. Without it, the loop will happily execute the wrong seam, miss callers, skip tests, and declare completion on each iteration — with no human noticing until the damage accumulates. With it, each iteration begins with a preflight, implements one bounded change, validates, and reports before the loop feeds the next iteration.
The PRD or plan file plays the role of PLAN_FINAL.md: it defines the goal. RUN_PLAN.md defines how each iteration of the loop executes toward that goal. Together they turn an unattended loop into something with engineering discipline baked in.
When to Use a Ralph Loop
Ralph Loops are most effective when:
- The task is well-defined in a PRD and the acceptance criteria are unambiguous
- The codebase has strong test coverage — tests are the only feedback mechanism when no human is watching
- The work decomposes into independent steps that each produce a verifiable, committable result
- You have a build and test command that exits non-zero on failure, giving the loop a clean stop signal
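Those conditions can be wired into a small harness. This is a hedged Python sketch, not Huntley's original bash: `run_agent` and `tests_pass` are stand-ins you would replace with your real CLI invocation and test command.

```python
# Hypothetical Ralph-loop harness: run the agent, log its output, and
# stop when the test command succeeds (the "clean stop signal" above)
# or a hard iteration cap is hit so the loop cannot spin forever.

def ralph_loop(run_agent, tests_pass, max_iters=50):
    log = []
    prompt = "start"
    for _ in range(max_iters):
        if tests_pass():                 # exit-code-style stop condition
            return log
        output = run_agent(prompt)       # e.g. claude --print "$prompt"
        log.append(output)
        prompt = output                  # output feeds the next iteration
    raise RuntimeError("hit max_iters without tests passing")


# Toy demonstration with fakes standing in for the real commands:
state = {"fixed_after": 3, "runs": 0}

def fake_agent(prompt):
    state["runs"] += 1
    return f"iteration {state['runs']}"

def fake_tests():
    return state["runs"] >= state["fixed_after"]

print(len(ralph_loop(fake_agent, fake_tests)))   # → 3
```

The `max_iters` cap and the test-driven stop condition are the two safety rails: without them, an unattended loop has no way to know it is stuck.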
The Human's Role in a Ralph Loop
Unattended does not mean unsupervised. The human sets up the loop, writes the PRD with precision, defines the stop condition, and reviews the session log after each run. The loop is an accelerator, not an abdication. Think of it as overnight compilation: you set it going, sleep, and review what it built in the morning — but you designed what it was supposed to build, and you decide whether it actually did.
Embarrassing Case Study
how not to do it, so you can learn how to do it
What Happened
One Saturday morning, coding away, I had too many AI console windows open and couldn't tell which was which. So I decided to put watermarks on the console backgrounds — when I switched between consoles, I'd see the context at a glance. The rest of that day went into that program. By the end of it I had v1 and v2 fully coded and was already using it. I shared it around at work, made a GitHub repo for it, and built an installer. This is the story of that program.
When you see v1 through v8 in the graph below, keep in mind: each check-in was a coding session with its own version bump, and I could ship multiple versions in a day. This is a real story.
The tool was built fast. v1 through v8 added platforms, features, and integrations at high velocity: Claude Code, then Codex, then Copilot, then OpenCode, then Gemini, then Kiro. A Jira integration. GitHub PR review. Cost tracking. Windows Terminal profile management. A companion maintenance utility. A build pipeline. IP-protection obfuscation. All of it in roughly 10 weekends of active development.
By v8.3.0 the product worked. It had users. It had real commercial value. It also had the accumulated debt of every shortcut taken in the name of shipping the next feature.
The v9 arc was the reckoning. Not a rewrite — a systematic repair of every pattern that had been done wrong or duplicated across files. It ran from v9.0.0 through v9.9.8, across more than a dozen focused increments, and it cost more lines than all prior development combined.
The Graph That Started This Conversation
All 98 commits aggregated by major version. Each block represents a relative change wave. v8.x excludes the obfuscated build artifact.
| Version | Commits | Relative Change Size |
|---|---|---|
| v1.x | 14 | ████ |
| v2 | 13 | █ |
| v3.x | 7 | █ |
| v4.x | 2 | █ |
| v5.x | 7 | █████████ |
| v6.x | 2 | █ |
| v7.x | 1 | █ |
| v8.x | 7 | ████████ |
| v9.x | 45 | ████████████████████████████████████████ |
v9 is not even close: scores of change sets across 45 commits dwarf all prior eras combined. That bar is the architecture investment made visible.
The Architecture Tax
Project: A multi-platform AI session manager for Windows Terminal
Language: PowerShell 5.1
Timeline: 10 weekends
Commits: 98 total
The question this case study answers: What does it cost when you skip architecture on day 1?
The Specific Patterns That Were Retrofitted
Each v9 task addressed a class of problem that was avoidable on day 1:
Variant logic scattered across files. Every supported platform had its behavior hardcoded in whichever file happened to need it first. Adding a new platform required touching seven files. v9 introduced a central registry: one structure, all variant behavior, zero hardcoding elsewhere.
Raw I/O everywhere. Owned data files were read and written directly with no atomicity, no backup, no error handling. A crash mid-write meant silent data corruption. v9 introduced canonical read/write helpers that all managed files route through — the only place in the codebase allowed to touch owned data files directly.
External system mutations without transactions. Operations that modified an external system's config read it, mutated it in memory, and wrote it back directly. A failure mid-operation left the external system in a broken state invisible to the user. v9 wrapped all such mutations in a transaction pattern: read, modify in memory, write atomically, validate, restore on failure.
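The transaction pattern v9 adopted can be sketched generically: read, modify in memory, write atomically, validate, restore on failure. This is a simplified Python illustration, not the project's actual PowerShell code, and the file format is assumed to be JSON for brevity.

```python
# Sketch of a transactional config mutation: a crash or failed
# validation can never leave the file half-written, because the new
# content lands via an atomic rename and the backup is restored on error.

import json
import os
import tempfile

def mutate_config(path, mutate, validate):
    with open(path) as f:
        backup = f.read()                         # 1. read current state
    data = json.loads(backup)
    mutate(data)                                  # 2. change in memory only
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)                     # 3. atomic swap into place
        with open(path) as f:
            if not validate(json.load(f)):        # 4. validate the result
                raise ValueError("post-write validation failed")
    except Exception:
        if os.path.exists(tmp):
            os.remove(tmp)                        # never leak the temp file
        with open(path, "w") as f:
            f.write(backup)                       # 5. restore on any failure
        raise
```

The key property is that the destination file only ever holds either the complete old content or the complete new content, never a partial write.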
Entity identity with no documented precedence. A key entity's display name could be derived from multiple sources, each with its own priority. Each code path had its own inference logic. When they disagreed, the entity showed the wrong name. v9 established a single canonical function with a documented precedence order called by all display, repair, and rename code. The bug had been latent since v1.
Test infrastructure that tested the wrong code. The test suite was stitched into the same artifact as the production code. Several tests matched by string search in a way that found their own test definitions before the actual production functions. Tests were silently passing while the code they were supposed to protect had real violations. v9 fixed the matching logic, revealing six tests that had been producing false results for months.
Orphaned artifacts from failed operations. When a multi-step operation failed partway through — after some artifacts were created but before the operation completed — the partial artifacts remained permanently. Users would find ghost entries they never created. v9 introduced artifact cleanup tracking: record each creation, clean up everything created so far in the catch block before rethrowing.
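The cleanup-tracking fix can be sketched as a generic pattern: record every artifact at creation time, and on failure undo them in reverse before rethrowing. Again a Python stand-in for the project's PowerShell, with invented names:

```python
# Sketch of orphan-proof multi-step operations: each created artifact
# is recorded immediately; if a later step throws, everything created
# so far is destroyed in reverse order, then the error propagates.

def run_with_cleanup(steps, destroy):
    created = []
    try:
        for create in steps:
            created.append(create())     # record the artifact right away
        return created
    except Exception:
        for artifact in reversed(created):
            destroy(artifact)            # no ghost entries left behind
        raise
```

The essential discipline is appending to `created` before anything else can fail, so the catch block always knows exactly what exists.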
Duplicate ceremony in every entry path. Multiple code paths each had their own copy of the same 15–20 step sequence. When one copy got a bug fix, the others did not. v9 extracted all shared steps into helper functions called by every entry path.
"A Rewrite Would Have Been Easier"
Looking at the graph, that thought is natural. Here is the direct answer.
What the v9 change sets actually contain: A significant portion are tests. A rewrite needs tests too — written without the knowledge of which edge cases fail in production. Many are planning documents. The largest individual commits are largely refactoring churn: the same lines deleted from one location and added to another. A function that moves between files costs double in the diff. The real net change is zero.
What a rewrite would actually cost: Five Windows Terminal integration surfaces, each with non-obvious quirks. Five platform data formats reverse-engineered from live data. All discovered knowledge. A rewrite starts from zero. Most importantly: a rewrite is done under the same pressure that created the original architecture problems. The impulse to ship features beats the impulse to do it right. That is exactly what happened in v1 through v8.
Where the observation is correct: The v9 arc was more expensive than it needed to be because it happened after the product shipped instead of before. The canonical helpers, registry-driven dispatch, atomic writes, and red-green testing that v9 established would have cost a fraction of what they cost at v9 if they had been the founding architecture. The graph is the receipt. The question it should prompt is not "should we have rewritten?" but "what would we have said on day 1?"
The Actual Cost
At standard software engineering rates ($150–200/hr senior developer), the v9 arc represents:
- 45 commits over a concentrated sprint of focused work
- Estimated real developer hours: 60–100 hours
- At $175/hr: $10,500–$17,500 to retrofit architecture that cost nothing to establish
The same 12 rules, put in place on day 1, would have taken perhaps 2 hours to write and would have been enforced automatically by the AI assistant with every subsequent code generation. The entire v9 arc — every refactor, every bug it revealed, every test it required — would not have been necessary.
That is the cost of skipping architecture on day 1.
What the Product Looked Like After
At v9.9.8, the codebase passed a zero-violation code review against 29 mandatory architectural rules. The test suite had 640+ tests across 30 files, with named validation bundles for CI/CD integration. Adding a new AI platform required zero changes to rendering, dispatch, WT profile management, or image generation — one registry entry and the platform-specific discovery function. Session lifecycle errors left no orphaned artifacts. Windows Terminal mutations rolled back automatically on failure.
The v9 arc was not a mistake. It built a product that can grow without collapsing. The mistake was not having the architecture from the start.
The Day-1 Prompt
This is a template CLAUDE.md — a complete architectural ruleset you would paste at the start of your project, before the first line of code. Every rule comes from something that actually broke in production. Fill in the <placeholders> with your own project details.
The full template:
You are building <YourProduct> -- <one sentence description of what it does>.
Before writing any code, the following architectural rules are non-negotiable. Every
decision you make must be consistent with them. When in doubt, ask before deviating.
---
RULE 1: <YOUR VARIANT DIMENSION> REGISTRY IS THE ONLY SOURCE OF TRUTH FOR VARIANT BEHAVIOR
All <variant>-specific values live in a single central registry structure in one file.
No other file may hardcode a <variant> name, identifier, or behavior as a literal.
When you need <variant>-specific behavior, read it from the registry.
When adding a new <variant>, add one registry entry. Zero other files change.
This rule exists because <variants> will be added continuously. If variant logic is
scattered, every new <variant> is a surgery. If it is in the registry, every new
<variant> is a data entry.
---
RULE 2: ALL READS AND WRITES TO OWNED DATA GO THROUGH CANONICAL HELPERS
<YourProduct> owns several data files. Every read goes through Read-DataSafe.
Every write goes through Write-DataAtomic, which writes to a temp file and renames
atomically so a crash mid-write cannot corrupt data.
Never read or write owned data files directly anywhere else in the codebase.
External data owned by other systems may use raw reads, but any writes to external
config must still use an atomic write helper. Document the reason at each raw-read site.
---
RULE 3: ALL MUTATIONS TO <EXTERNAL SYSTEM> GO THROUGH A SERVICE LAYER
Any operation that modifies <external system state> must call a function in the
<system> service layer. No file outside that layer may read or write <external system
state> directly.
The service layer must wrap mutations in a transaction: read current state, make the
change in memory, write atomically, validate the result. On any exception, restore the
backup. This is not optional even for "simple" changes. Silent corruption is invisible
until users hit it.
---
RULE 4: <KEY ENTITY> IDENTITY HAS ONE CANONICAL PRECEDENCE ORDER, ENFORCED IN ONE PLACE
<Entity> display name resolution:
<source A> > <source B> > <fallback>
These rules are implemented in exactly one function each. All display code, all repair
code, and all rename code calls those functions. No file infers <entity> identity
through its own logic. When this rule is violated, <entity> shows the wrong name or
links to the wrong record.
---
RULE 5: SOURCE CODE IS ORGANIZED INTO NAMESPACES. EACH NAMESPACE OWNS ITS CONCERNS.
src/core/ -- global state, persistence helpers, <variant> registry
src/<domain A>/ -- <domain A> lifecycle operations
src/<domain B>/ -- <domain B> operations
src/ui/ -- rendering, display, navigation
src/integration/ -- external service clients
src/testing/ -- all test code (never included in production builds)
A file in one namespace must not implement another namespace's logic. Cross-namespace
dependencies create phantom coupling: changes in one area break unrelated areas silently.
If a function's home is not obvious, it belongs in core/ as a shared helper.
---
RULE 6: THE REPO ROOT STAYS CLEAN
The repo root contains only:
- Primary entrypoints and launchers
- Primary compiled or stitched artifacts
- Canonical instruction and index documents (CLAUDE.md, AGENTS.md, README.md, <docs-index>)
- Major source and support directories
Build machinery belongs in build\ or equivalent.
Launcher wrappers belong in Install\ or equivalent.
Planning docs, one-off notes, and screenshots do not belong at the root.
Temporary files placed at the root during development must be cleaned up before commit.
---
RULE 7: DOCUMENTATION HAS DEDICATED NAMESPACES
docs/ is organized into stable subfolders. Every durable document belongs in exactly one:
docs/<product-docs>/ -- user-facing docs and release history
docs/<planning-docs>/ -- backlog, plans, and future ideas
docs/<reference-docs>/ -- architecture, data models, and reference material
docs/<process-docs>/ -- AI workflow, review standards, and comment standards
docs/<assets>/ -- images and media used by docs
Do not place new documents loose in docs/ without a namespace.
Any durable new document must be placed in a namespace and linked from <docs-index>.
---
RULE 8: NEW TOP-LEVEL DIRECTORIES REQUIRE JUSTIFICATION
Create a new top-level directory only when all three are true:
- The content does not fit any existing namespace
- It represents a real subsystem or product boundary
- Top-level placement improves clarity more than nesting would
Default to using an existing namespace. The burden of proof is on the new directory.
---
RULE 9: THE BUILD SYSTEM IS MULTI-PRODUCT FROM DAY ONE
If you ship more than one artifact from this codebase, each has a manifest listing
the source files to include in order.
When you add a source file: update all manifests or document why it belongs to only one.
When you move a function: check every manifest.
When you add a shared helper: include it in all manifests that need it.
Failing to maintain manifests silently breaks one product while the other works.
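One way to keep manifests honest is a mechanical coverage check, sketched here under the assumption that each manifest is an ordered list of source paths (all file and product names are invented):

```python
# Hypothetical manifests: each product lists its source files in build order.
MANIFESTS = {
    "product_a": ["src/core/state.py", "src/ui/render.py"],
    "product_b": ["src/core/state.py", "src/integration/client.py"],
}
EXEMPT = {"src/testing/helpers.py"}  # test-only code, never shipped


def unclaimed_sources(all_sources):
    """Return source files claimed by no manifest and not explicitly exempt."""
    claimed = {f for files in MANIFESTS.values() for f in files} | EXEMPT
    return sorted(set(all_sources) - claimed)
```

Run as a test, this turns "I forgot to update a manifest" from a silent product break into an immediate failure.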
---
RULE 10: EVERY BUG FIX IS PRECEDED BY A FAILING TEST
Before fixing any bug: write a test that reproduces it and fails. Then fix the bug.
Then confirm the test passes. The test is not optional.
Bugs fixed without tests return. A test that explicitly reproduces a bug is
documentation that the bug was real, proof that the fix is correct, and insurance
that the fix stays correct.
Use string-literal test identifiers, not sequential numbers. Sequential numbers
require renumbering when tests are inserted. String literals survive reordering.
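A sketch of the workflow: name the test after the bug with a string-literal identifier rather than a sequence number. The function and the bug itself are hypothetical:

```python
def normalize_path(p):
    """Join path segments, collapsing empty parts (the fix under test)."""
    return "/".join(part for part in p.split("/") if part)


# String-literal identifier names the bug; nothing needs renumbering
# when tests are later inserted or reordered around it.
def test_bug_double_slash_collapses_segments():
    assert normalize_path("a//b") == "a/b"
    assert normalize_path("a/b/c") == "a/b/c"
```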
---
RULE 11: EVERY FUNCTION THAT CREATES ARTIFACTS MUST CLEAN THEM UP ON FAILURE
If a function creates any persistent artifact -- a record, a config entry, an external
system object -- and anything goes wrong before the function completes, all created
artifacts must be removed in the catch block before rethrowing.
Track artifact creation with boolean flags:
artifactCreated = false
... create artifact ...
artifactCreated = true
... on exception ...
if (artifactCreated) { remove artifact }
Artifacts left behind by failed operations accumulate silently and confuse users.
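The flag pattern above, sketched in Python. `provision` and its two artifacts are invented for illustration, with a `fail` switch standing in for a real mid-operation exception:

```python
def provision(store, name, fail=False):
    """Create two artifacts; on any failure, remove whatever was created."""
    record_created = False
    config_created = False
    try:
        store.setdefault("records", {})[name] = {"status": "new"}
        record_created = True
        if fail:
            raise RuntimeError("simulated mid-operation failure")
        store.setdefault("config", {})[name] = {"enabled": True}
        config_created = True
    except Exception:
        # Undo in reverse order, then rethrow so the caller sees the failure.
        if config_created:
            del store["config"][name]
        if record_created:
            del store["records"][name]
        raise
```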
---
RULE 12: EXTRACT SHARED HELPERS AT THE SECOND INSTANCE, NOT THE THIRD
The first time you write a pattern inline, that is acceptable. The second time you
write the same pattern in a different file, extract it into a shared helper first.
Do not wait for three instances.
When duplication reaches three or four instances, the fix requires touching every
instance. When caught at two, the fix is cheap.
---
RULE 13: NO EMPTY CATCH BLOCKS. NO SWALLOWED EXCEPTIONS.
Every catch block must either:
a) Log the exception to a debug/error log, or
b) Re-throw after performing cleanup, or
c) Return a structured error result that the caller checks
An empty catch is a bug. Swallowed exceptions cause functions to return success when
they have failed and users to see wrong state with no indication of what went wrong.
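Options (a) and (c) can be combined in one sketch: log the exception, then return a structured result the caller must check. `parse_port` is an invented example:

```python
import logging


def parse_port(text):
    """Return (ok, value_or_message) instead of swallowing the error."""
    try:
        port = int(text)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        return (True, port)
    except ValueError as e:
        # (a) the failure is recorded, and (c) the caller gets a checkable result
        logging.getLogger(__name__).error("parse_port failed: %s", e)
        return (False, str(e))
```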
---
RULE 14: RUNTIME COMPATIBILITY IS A HARD CONSTRAINT, NOT AN AFTERTHOUGHT
This tool runs on the user's machine. Specify your minimum runtime and test against it.
Do not use language features or library calls that require a newer runtime than your
stated minimum. New syntax that silently degrades on older runtimes is the hardest
class of bug to diagnose.
---
RULE 15: THE <KEY OPERATION> LIFECYCLE HAS ONE CANONICAL SEQUENCE
Every path that triggers <key operation> follows the same sequence of shared steps:
1. Resolve <variant> and configuration from registry
2. Resolve or generate <entity> identity
3. Create or acquire required resources
4. Execute the operation
5. Confirm and record the result
6. On any exception: release/remove all resources acquired in steps 3-4
These steps are implemented as shared helper functions called by all entry paths.
No entry path owns its own version. When steps are implemented multiple times, each
copy drifts. When one copy gets a fix, the others do not.
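The six steps can be sketched as one shared entry function. Every body here is a placeholder; only the ordering and the cleanup contract in the `except` block matter:

```python
def run_operation(registry, request, journal):
    """Canonical lifecycle: all entry paths call this, none owns its own copy."""
    acquired = []
    config = registry[request["variant"]]                 # 1. resolve variant/config
    identity = request.get("id") or f"gen-{len(journal)}" # 2. resolve identity
    try:
        acquired.append(("resource", identity))           # 3. create/acquire resources
        result = {"id": identity, "cfg": config}          # 4. execute (placeholder)
        journal.append(result)                            # 5. confirm and record
        return result
    except Exception:
        for kind, rid in reversed(acquired):              # 6. release on any failure
            pass  # release(kind, rid) would go here in a real implementation
        raise
```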
---
RULE 16: COMMENTS DESCRIBE CURRENT TRUTH, NOT HISTORY OR INVENTORY
File headers must contain: File, Namespace, Purpose, and optional Notes.
File headers describe ownership and invariants -- not a list of every function.
Function inventories in file headers are forbidden. They go stale immediately and
mislead both humans and AI about what the file actually does.
When a refactor changes a file's responsibility, update the file header in the same
commit. Not later. Not in a cleanup pass. In the same commit.
Use function-level documentation blocks (.SYNOPSIS, docstrings, JSDoc, xmldoc --
whatever your language provides) for detailed behavior. File headers describe
boundaries. Function blocks describe contracts. Neither does the other's job.
Comments must describe current truth, not historical intent. If a comment describes
something that was true in v3 but is not true now, delete it. History belongs in
git log and CHANGELOG -- not in source files.
A stale comment is worse than no comment. It will mislead the next developer -- and
it will mislead your AI, which reads comments as instructions.
---
WHY THESE RULES EXIST
Every rule above addresses a specific class of production bug or refactoring cost
that compounds over time. None are style preferences. Each rule was written because
something broke in production without it.
These rules do not slow development. They slow the first implementation of each
pattern by a few minutes. They prevent the class of incident where a single missing
cleanup call costs two engineering weeks to diagnose six months later.
Hold to them from the first commit.
Tests Are the Exoskeleton
The single most important observation about AI-developed software.
Tests aren't quality hygiene in this context — they're the only thing that gives the AI memory of what the system is supposed to do.
A human developer carries architectural intent in their head. When they change module A, they remember module B depends on it. An AI has no such memory between sessions. It only knows what it can see — and in a large codebase, it can't see everything. Tests are the exoskeleton that holds the shape of the system while the AI works inside it.
- Every session starts from scratch conceptually
- Fixes break prior fixes invisibly
- The AI confidently builds forward on a crumbling foundation
- "Done" means "compiles and looks right" — not "works correctly"
A codebase at 11% test coverage has no exoskeleton. Each change may be correct in isolation and wrong in the system. There's no way to know.
A production codebase with 800+ tests caught 400+ bugs before they shipped — that isn't just a quality metric, it's proof that the bugs existed and would have reached users. That's 400 production failures that didn't happen.
Tests That Know What the Architecture Is
The corollary worth noting: the best test architecture is itself registry-aware. Static analysis tests scan for hardcoding violations — meaning the tests don't just verify output, they enforce architectural contracts. When the AI tries to shortcut the registry pattern, a test fails. The discipline is self-reinforcing in a way that no amount of code review can replicate.
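A minimal sketch of such an architecture-enforcing test: scan source files for direct file access outside the allowed layer. The forbidden pattern and directory names are assumptions to adapt to your own codebase:

```python
import re
from pathlib import Path

FORBIDDEN = re.compile(r"\bopen\(")  # marker for direct I/O (illustrative)
ALLOWED_DIRS = {"src/core"}          # only the service layer may do I/O


def hardcoding_violations(root):
    """Return source files that perform direct I/O outside the service layer."""
    bad = []
    for path in Path(root).rglob("*.py"):
        rel = path.relative_to(root).as_posix()
        if any(rel.startswith(d) for d in ALLOWED_DIRS):
            continue  # the service layer is allowed to touch files
        if FORBIDDEN.search(path.read_text()):
            bad.append(rel)
    return sorted(bad)
```

Wired into the test suite as `assert hardcoding_violations(ROOT) == []`, this makes the architectural rule fail loudly the moment the AI shortcuts it.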
From Vibe Coding to Engineering
This website is a personal discovery journey. I was amazed by how much I learned just from the Advanced: Building a Plugin section I added, and how much further it demystified Claude's architecture. I also realized, as I wrote and took the quizzes, that I had been using AI incorrectly. This site was changing my view of AI.
I admit, I used to think that building apps through vibe coding with Claude Code made me an AI engineer — the proof was in the amazing apps I wrote (one in the Chrome store, and loads of tools written for work). Then I tried to fix an app I had made, and realized I had unwittingly been doing it wrong. I wasn't coding with Claude correctly, and yet I was getting great results. By the time I reached version 5 of one of my projects, it started collapsing. I am on version 10 now, mostly refactors — which led to the creation of this Manifesto. I have spent more time refactoring that app (still broken) than I did building it. That is why I stepped back and rethought how to avoid that trap. I had to learn how to do this right, and I am still learning.
Calling myself an AI Engineer (as I did) came from a false sense of power born of early success with AI prompting. And I saw non-engineers having the same successes (even greater!), with no engineering discipline. As I wrote this manifesto, however, I realized that real engineering is possible through AI tools — if the hooks, agents, skills, memory, and plugins are used properly. If a person is vibe coding with good results, I want them to join me in rethinking the engineering aspects. I could have saved myself so much time had I followed the patterns learned over my career as a software engineer.
Claude Code is like a loaded gun: you can point it at an animal and have a meal, or you can shoot your own foot. Consider this site to be a gun safety course and weapons training. The quizzes are important, as they reinforce learning. I'll keep vibe coding (I didn't engineer this site, I vibe coded it), but there is a time to engineer.
In all honesty, engineering is not my goal. Immediate and powerful solutions are what drew me to AI (and Claude Code in particular). I can have a great app in 10 hours? Yes! But in that fever-pitched drive to a solution, I had a flaw, and the flaw was in my thinking: I thought Claude was something it isn't. That is why I stepped back, studied Claude more closely, and wrote this site. I hope the manifesto does its work on you too. My problem was not bad prompting but bad thinking, and I can't ignore engineering, because I needed to re-engineer my own thoughts. Ironically, our underlying subject is software that emulates human thinking, and as you correct your thoughts about AI, you are engineering the only biological brain you can change.
For more, I point you to this interactive tutorial on the early history of AI, with videos: github.com/srives/Perceptron
The New Epistemology
A philosophical retrospective on what AI programming is doing to the programmer.
I am building up a mental model of what programming is becoming. The black CLI screen, the text prompt, the running agent inside the repository: these are not incidental surfaces. They change the way the work feels. The terminal is spare, verbal, procedural, and immediate. It is less like arranging objects in an IDE and more like addressing a machine intelligence in its own workshop.
That changes the programmer. The tool affects the thing it is designed to affect, but it also affects the practitioner. A hoe cuts the soil and raises blisters on the hand. LLM/CLI programming cuts through boilerplate and raises blisters in the mind.
At first there is pain. Then the calluses form. The callus is not numbness. It is trained sensitivity.
The Medium Works Back
McLuhan's line was that the medium is the message. The related lesson is that we shape our tools, and then our tools shape us. AI-assisted programming makes that literal. I prompt the model, but the model's habits teach me what I must become more precise about.
When I build now, I often do not see code first. I see blocks of intention: stores, boundaries, contracts, lifecycle flows, entry points, invariants, and seams. Handwritten code can feel small because it is local. The mental object is larger than the line. The line is only one visible trace of the object.
From Files to Stores
The word "file" names a physical artifact. The word "store" names a responsibility. A store is durable state with rules: where it lives, who may read it, who may mutate it, how writes are made atomic, how failure rolls back, and what shape must remain true afterward.
That shift matters because AI is often locally helpful and globally careless. It will write the direct file access if the direct file access solves the immediate problem. It will place a helper where it is convenient. It will preserve an old path "for safety." It will let two sources of truth coexist if nobody forces the question of ownership.
The new discipline is to ask, before the code appears: who owns this state? What boundary is being crossed? What is the one canonical path? What invariant must survive the edit?
Seams and Boundaries
A seam is the narrow handoff where one responsibility gives way to another. It is the place where behavior can be swapped, tested, or refactored without corrupting the rest of the system. A good seam preserves future freedom. A muddy seam destroys it.
This is why boundary bugs feel different from algorithm bugs. An algorithm bug is local. A boundary bug spreads. When canonical truth exists in two places, fixing one copy does not fix the system. When three entry points perform the same lifecycle differently, testing one path proves almost nothing about the others.
The Mental Blisters
| Blister | What Hurts First | The Callus That Forms |
|---|---|---|
| Loss of tactile contact | The AI wrote code I did not personally touch. | Inspection through diffs, tests, logs, contracts, and invariants. |
| Boundary pain | The feature works, but the responsibility landed in the wrong place. | The reflex to ask who owns the behavior and what may cross the boundary. |
| Contract hunger | Ambiguous returns, partial objects, silent failure, and caller guesswork. | Explicit shapes, failure modes, cleanup duties, and canonical precedence. |
| Professional suspicion | The thing appears complete before it has proven itself. | Hostile review, preflight, targeted tests, and refusal to trust plausible output. |
| Memory externalization | The same hard-won lesson is lost between sessions. | CLAUDE.md, AGENTS.md, memory files, plans, skills, hooks, and tests. |
| Token-cost consciousness | Context fills, attention thins, and old detail compacts away. | Compressed rules, durable documents, named patterns, and load-on-demand procedures. |
The New Unit of Thought
| Before | After |
|---|---|
| Line / function / file | Owner / boundary / contract / seam / invariant |
| Compiler error / bug | Drift / duplicated truth / muddy ownership / invisible second path |
| Developer remembers intent | Tests, rules files, architecture docs, prompts, hooks, and review protocols preserve intent |
| Write correct code | Shape a system so an AI can safely modify it |
The Blind Potter
I feel like a blind potter. The clay is no longer directly under my hand. It is behind a curtain, and I feel it through delayed signals: diffs, test failures, logs, runtime behavior, and architectural drift. Yet the cup can still be made. The bowl can still be shaped.
The millions and billions of tokens are the pounds of clay wasted while learning where the form is. They are not just cost. They are apprenticeship. The waste teaches touch.
The Token Apprenticeship
Tokens are not the real unit of learning, but they are a useful metaphor. The real unit is repeated, painful failure classes.
- Awe: The output appears. The speed feels like proof.
- Pain: The product works, but repair becomes confusing. "Working" and "engineered" separate.
- Vocabulary: Failures get names: boundary violation, stale path, duplicated source of truth, missing test, wrong owner.
- Callus: Durable controls appear: AGENTS.md, CLAUDE.md, RUN_PLAN.md, tests, hooks, review prompts, architecture rules.
- Second order: The programmer stops merely prompting the AI and starts designing the environment that shapes the AI's behavior.
One brutal project can teach more than millions of pleasant tokens. The fast build gives power. The refactor gives epistemology.
A Practical Discipline
The next leap is not only to model what the AI builds. It is to model how the AI tends to fail.
- Name the failure class.
- Decide whether it is local or architectural.
- Add a rule only if the failure class will recur.
- Add a test if the rule can be enforced mechanically.
- Add a seam if future change needs protected freedom.
- Add memory only for facts the next session must inherit.
Code Commenting Practices to Reduce Drift
Comments lie. Not intentionally — they were true when written. But refactors move functions, responsibilities shift, and the comments stay behind describing a structure that no longer exists. The result is code drift: the gap between what comments say the code does and what the code actually does.
AI makes this worse before it makes it better. AI reads comments as instructions. If a file header says "Contains: session launch, fork handling, profile creation" but the refactor moved two of those elsewhere, AI will confidently generate code in the wrong place — because the comment said so.
Why Drift Happens
The drift in a real production project happened for predictable reasons: refactors moved functions without moving their comments, responsibilities shifted between files, and headers were written once and never revisited.
The Fix: Contract-Oriented File Headers
Replace function inventories with stable contract-style headers. The right format for any file is:
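Based on the header fields named in the rules earlier (File, Namespace, Purpose, optional Notes), a contract-style header might look like this; the file name and wording are illustrative:

```text
# File:      src/core/state.py
# Namespace: core
# Purpose:   Owns all persisted state. Every read and write of external
#            state goes through this file; no other file touches it directly.
# Notes:     Writes are transactional: backup, mutate in memory, atomic
#            write, validate, restore on failure.
```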
This format is searchable, durable, and meaningful to both humans and AI. It survives refactors because it describes ownership, not contents. Contents change. Ownership changes far less often.
Example: Before and After
Before (inventory style — drifts immediately):
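An inventory-style header (all names invented) typically reads:

```text
# File: session.py
# Contains: launch_session, fork_session, create_profile, delete_profile,
#           rename_profile, validate_profile, repair_links
```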
After (contract style — stays true):
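A contract-style version of the same header (again with invented names) reads:

```text
# File:      src/session/session.py
# Namespace: session
# Purpose:   Owns the session lifecycle. All launch, fork, and teardown
#            paths go through this file.
# Notes:     Profile identity is resolved by core/, never inferred here.
```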
The second version will still be accurate after you move three functions out of the file. The first won't.
Function-Level Comments
Detailed behavior belongs on functions, not file headers. Use your language's native documentation block:
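For example, a function-level block in Python; the function and its contract are hypothetical:

```python
def archive_session(session_id, dest_dir):
    """Archive a completed session into dest_dir.

    Contract (illustrative):
      - Returns the path of the created archive on success.
      - Raises KeyError if session_id does not exist.
      - On any failure, removes the partially written archive before
        re-raising, per the artifact-cleanup rule.
    """
    raise NotImplementedError  # behavior elided; only the doc block matters here
```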
This way: the file header describes the boundary. The function comment describes the contract. Neither tries to do the other's job.
Rules to Add to CLAUDE.md
Add these to your CLAUDE.md to prevent drift from the start. AI will enforce them with every subsequent code generation:
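A starting set, distilled from the comment rules stated earlier in this document (adapt the wording to your project):

```text
- File headers contain File, Namespace, Purpose, and optional Notes. Nothing else.
- Never add a function inventory to a file header.
- When a refactor changes a file's responsibility, update its header in the same commit.
- Delete any comment that describes past behavior; history lives in git log and CHANGELOG.
- Detailed behavior goes in function-level doc blocks, not file headers.
```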
The Short Version