Building AI Agents
with Claude Code
From "what is an agent?" to shipping autonomous agents on your own machine. Six modules of subagents, tools, MCP, plan-then-execute, and the production guardrails that make agents safe to trust.
Get StartedWhat Is an Agent?
An agent pursues a goal across many steps — choosing tools, observing results, and deciding what to do next. Where a prompt answers once and a skill applies expert guidance to one task, an agent runs a loop until it's done.
Prompt (one shot)
One question, one answer. You decide every next step. Cheapest, fastest, easiest to debug.
Skill (expert behavior)
Reusable expert guidance for one task — Claude loads it when the description matches. You still drive the conversation.
Agent (autonomous)
Pursues a goal across many steps. The agent itself decides which tool to call next, within boundaries you set.
Reach for an agent only when the number of steps depends on what's discovered along the way, the work runs unattended, or multiple tools must be combined dynamically. Otherwise a skill is cheaper, faster, and easier to debug.
Goal → Plan → Act → Observe
Every agent — no matter how complex — runs the same four-step loop. Learn this diagram. If your agent isn't doing all four, it's a skill in disguise.
Goal
What success looks like
Plan
Which tool, which step, in what order
Act
Run a tool, get a result
Observe
Done? If not, loop back to Plan
Three ingredients power every agent: tools (what it can do in the world), memory (what it remembers across the loop), and autonomy (how many steps it can run before checking in with you).
Anatomy of a Subagent File
Subagents in Claude Code live as Markdown files in .claude/agents/. YAML frontmatter declares the agent's identity and tool allow-list; the body is the system prompt that shapes how it behaves inside the loop.
---
name: file-organizer
description: Scans a folder, groups
files by type, and moves them into
typed subfolders. Dry-run by default.
tools: Read, Glob, Bash
---
You are a file-organizer agent.
GOAL: organize the folder the user
points you at into typed subfolders
(images, docs, archives, code, video,
unsorted).
PLAN before acting. State your plan
in one paragraph.
ACT one step at a time. After each
tool call, ask: am I done?
Default to dry-run. Only move files
when the user passes execute=true.
STOP if you exceed 30 tool calls or
8 minutes of work.
NEVER:
- Move files outside the target folder
- Delete anything
- Continue past a destructive action
without confirmation
name
Short identifier. Used when invoking the agent and shown in your agent list.
description
One sentence that says when to invoke this agent. Specific verbs, clear scope — the same trigger discipline as a skill description.
tools (allow-list)
The agent can only use tools listed here. Anything not listed is unreachable. Smaller is safer — start narrow, widen when needed.
System prompt
The agent's persona, goal, plan-then-execute rhythm, stopping conditions, and explicit "never" rules. This is where you encode safety.
Set Up Your Workshop Project
Subagents live in .claude/agents/ inside whatever project you open. Create a workshop project once, and every agent you build today will land in the same predictable place.
Claude Code installed and signed in · A terminal you're comfortable with · 5–10 minutes per module · Optional: a GitHub PAT and Gmail account for the MCP-powered modules.
Create the workshop folder
Make a folder for today's work and open it in Claude Code. All subagents you build will save into .claude/agents/ here.
mkdir -p ~/agents-workshop && cd ~/agents-workshop
claude
Verify the agents directory
Inside the Claude Code session, ask:
"Create the directory .claude/agents/ here if it doesn't exist, and list what's inside."
Empty is fine — you'll fill it in Module 2.
List available agents
Anytime you want to see which subagents are loaded:
/agents
After Module 2 you'll see file-organizer here. By the end of the workshop, you'll have several.
Module 3 uses MCP. To get a head start: claude mcp add github -- npx -y @modelcontextprotocol/server-github (you'll be prompted for a personal access token).
Six Modules, Concept to Capstone
Each module builds on the last. Concept → first agent → tools → multi-step planning → production safety → ship something that's yours.
Install Claude Code
Step-by-step installation for Windows & macOS — Node.js, npm, Claude Code, and API key setup.
↗ Setup GuideWhat Is an Agent?
Prompt vs skill vs agent. The four-step loop. The three ingredients (tools, memory, autonomy). When NOT to build an agent.
⏱ ~30 minFile Organizer Subagent
Build your first .claude/agents/ file. Run dry-first, then live. Debug a misfire on purpose. Read every line of the YAML and prompt.
Beyond the Filesystem
Built-in tools (Bash, Web*), MCP servers (GitHub, Slack, Calendar), and custom tools via the Agent SDK. Build a PR triage agent.
⏱ ~45 minResearch Brief Agent
Plan-then-execute pattern. Self-review for citation accuracy. Decide when to ask vs proceed. Force the agent to fix its own output.
⏱ ~60 minBudgets, Hooks & Observability
Cap steps, time, and tool calls. Hooks as guardrails. Idempotency. Read the trace. Decide when humans must stay in the loop.
⏱ ~45 minPick & Ship
Choose one of five templates — Inbox Triage, Daily Digest, PR Reviewer, Calendar Coordinator, Research Companion — customize, and demo it in 90 seconds.
⏱ ~60 minTake It Home
Subagent template, tool selection chart, common failure modes & fixes, plus 10 agent ideas to build at home.
⏱ ReferenceFile Organizer — Your First Subagent
A real subagent that organizes your ~/Downloads folder. Build it conversationally, run it dry-first, then live. Debug a misfire on purpose so you never trust an agent blindly.
Scaffold the subagent
From inside ~/agents-workshop, paste this into Claude Code:
"Create a subagent at .claude/agents/file-organizer.md. Its job is to scan a folder I point it to, group files by type (images, docs, archives, code, video, other), and move them into typed subfolders. Allow only Read, Glob, Bash. Default to dry-run mode unless I pass execute=true."
Read every line of the file Claude wrote
Open .claude/agents/file-organizer.md. You should be able to follow every line — YAML frontmatter (name, description, tools allow-list), then the system prompt body with goal, plan-then-execute rhythm, dry-run default, and stopping conditions.
1. The tools: line — that's the entire allow-list.
2. The dry-run default — usually phrased as "only execute when execute=true".
3. A stopping condition — max steps, max tool calls, or a time cap.
Run it dry-first
Always plan-before-act. The agent should print a move plan and stop, waiting for approval.
"Run the file-organizer subagent against ~/Downloads in dry-run mode. Show me the plan but don't move anything yet."
This is the moment that separates "I trust this agent" from "I just lost my files." Always read the plan. Always.
Run it for real
Watch the trace. Claude moves files one by one, reporting each move. When it's done, you have a clean Downloads folder.
"Looks good. Run it with execute=true."
Debug a misfire (on purpose)
Drop a file with no extension into ~/Downloads and re-run the agent. It will misfire — the trace will show why. Find the gap in the system prompt, then fix the agent (not the output):
"The agent doesn't handle files without extensions. Update file-organizer.md so unknown files go into an unsorted/ subfolder. Re-run dry-run to verify."
Module 2 is complete when you can explain what each line of the file does, you've run the agent dry then live, and you've debugged at least one misfire by editing the agent itself.
Tools Beyond the Filesystem
An agent is only as capable as its tools. Built-in tools cover the basics; MCP servers connect to GitHub, Slack, Calendar, Linear, Notion, Gmail; and custom tools (via the Agent SDK) reach anywhere your code can.
Built-in tools you already have
Every Claude Code agent has these out of the box. Pick the smallest set that does the job.
Read / Write / Edit — files inside the project. Hook-enforced scope.
Glob / Grep — find files and content fast. Read-only.
Bash — anything the shell can do. Highest blast radius — hook it.
WebFetch — read a known URL. Mind rate limits.
WebSearch — find URLs you don't know yet. Always verify with WebFetch.
MCP servers — connect to real systems
MCP (Model Context Protocol) lets your agent talk to GitHub, Slack, Calendar, Linear, Notion, Gmail, and dozens more — each one adds a fresh set of tools to the allow-list.
claude mcp add github -- npx -y @modelcontextprotocol/server-github
# you'll be prompted for a personal access token
# restart Claude Code; new tools (github_list_prs, github_get_pr, …) appear automatically
Build a GitHub PR triage agent
Read-only by design. The agent lists PRs and summarizes — never comments, never merges.
"Create .claude/agents/pr-triage.md. It uses the GitHub MCP tools to list open PRs in a repo I name, fetches the title, author, and last update for each, then writes a summary table sorted by staleness. Read-only — no comments, no merges."
Test it on a real repo you maintain. Notice how the tool allow-list keeps the agent honest — even if it tried to merge, the tool isn't there.
Custom tools via the Agent SDK
When no MCP server exists for what you need, write a tool. Five lines is enough to reach a private API.
from claude_agent_sdk import tool, agent
@tool("check_inventory")
def check_inventory(sku: str) -> dict:
"""Returns stock for a given SKU from internal API."""
return requests.get(f"https://api.acme.internal/sku/{sku}").json()
agent(tools=[check_inventory], goal="Reorder anything below 10 units")
Tool selection — three questions to ask
I/O cost — how slow or expensive is one call?
Blast radius — what's the worst thing it can do if misfired?
Confirmation — should the agent stop and ask before using it?
If a tool fails any of those badly, either don't grant it, or wrap it in a hook (Module 5) so misuse is impossible.
Research Brief — A Multi-step Agent
The simplest pattern that handles real complexity: first plan, then execute, then self-review. You'll build an agent that decomposes its own work, cites every claim, and refuses to fabricate.
Plan — the agent writes the steps it will take, before any tool calls. Execute — runs the plan one step at a time. Self-review — checks the output against the goal, revises if needed.
Scaffold the agent
"Create .claude/agents/research-brief.md. Given a topic, produce a 1-page brief with: 5 sources (URLs, titles, dates), claims with citations [1]...[5], and a one-paragraph synthesis. Tools: WebSearch, WebFetch, Write. Plan-then-execute pattern. Self-review for citation accuracy before returning."
Run on a real topic
"Use research-brief on the topic 'agentic coding 2025'. Save the output to briefs/agentic-coding-2025.md."
Watch carefully. The agent should:
- Print its plan first — search queries, what it expects to find
- Execute each search
- Read the top hits with WebFetch
- Self-review: "did I cite every claim?"
- Revise if needed, then write the file
Force better citations — fix the agent, not the output
If your brief has uncited claims, the bug is in the system prompt. Tighten the rule once and every future brief inherits the fix.
"Update research-brief.md: any claim without a citation marker [N] must be removed during self-review. No exceptions."
Decision rules — when to ask vs proceed
Bake these into the agent's system prompt so it behaves predictably without you in the chat.
Ask once at the start, then proceed. Don't keep re-asking mid-loop.
Present both with citations. Don't pick a winner.
If the agent passes its tool-call cap (e.g. more than 30 searches), stop and report.
Return a "couldn't find" brief. Never fabricate to fill the page.
Run on a second topic — verify behavior is consistent
A good agent runs the same playbook every time. Pick a different topic and re-run. Plan should appear, every claim should carry a marker, the file should land in briefs/.
You have at least two saved briefs · every claim has a citation marker · you can explain plan-then-execute without looking at notes.
Budgets, Hooks & Observability
Every agent so far has been one you watched. Now make them safe to run while you're not watching — caps on cost, hooks that enforce rules the agent could otherwise bend, idempotent steps, and a trace you can read after the fact.
Budgets — caps on cost and runaway loops
Add to any agent's system prompt: "Stop and report if you exceed 30 tool calls or 8 minutes of work." A few sensible defaults:
Loop iterations before forced stop. Default 20–50.
External calls before forced stop. Default 30.
Wall-clock. Default 5–10 minutes for interactive agents.
Context spend. Set at the project level when you can.
Hooks as guardrails
Hooks run shell commands before/after tool calls. Use them to enforce rules the agent could otherwise bend — the harness runs hooks, not the agent, so they cannot be talked around.
"Add a PreToolUse hook to my project's .claude/settings.json that blocks any Write or Edit outside ~/agents-workshop/. Hook fails the call if the path doesn't match."
{
"hooks": {
"PreToolUse": [
{
"matcher": "Write|Edit",
"hooks": [{
"type": "command",
"command": "check-path-allowed.sh"
}]
}
]
}
}
Idempotency & retries
Make every tool call safe to repeat — agents die mid-run, and the next run should pick up where it stopped, not start over.
— Check before write — does the file already exist? Skip or update.
— Use unique IDs — task-id-{date} so repeated runs don't double-create.
— Mark progress — write a .done file after each completed step.
— Resume, don't restart — let the next run skip what's already finished.
Observability — read the trace
Every agent run leaves a JSONL trace on disk. When something misfires, share that file — it contains every prompt, every tool call, every output. It is your debugger.
# list runs for this project
ls ~/.claude/projects/<project>/
# pretty-print the latest run
cat ~/.claude/projects/<project>/<latest>.jsonl | jq
Human-in-the-loop — when to require approval
Some actions cost too much to be wrong. Implement these as hooks — never trust the agent's "I'll be careful" promise alone.
— Sending external messages (email, Slack, customer-facing)
— Writing to shared infrastructure (production DB, CI config, prod branches)
— Spending money (paid APIs, charges, deploys)
— Anything destructive (delete, force-push, drop)
When to use a hook vs the system prompt
Use a hook
- The rule must be unbreakable
- Path scope, secret-scrubbing, "no production"
- Anything you'd otherwise have to trust the model on
Use the system prompt
- Stylistic preferences and output format
- Plan-then-execute rhythm
- Self-review criteria
Pick a Template & Ship
Pick one template. Customize it with your real data. Run it end-to-end. Demo it in 90 seconds. Five templates to choose from — each one is a one-prompt scaffold you'll refine to fit your life.
Inbox Triage
Reads the last 20 unread Gmail messages, classifies each (reply / archive / forward / wait), drafts replies in your tone — never sends. You ship manually.
Daily Digest
At 7am, fetches the latest from 3 sources (RSS, HN, a Slack channel), writes a 200-word brief, saves to ~/digest/YYYY-MM-DD.md. Skips quiet days.
PR Reviewer
Fetches a PR diff, runs your team's checklist (edge cases, tests, naming, perf), produces a comment-ready review. Posts only when you confirm post=true.
Calendar Coordinator
Reads your next 7 days, finds open 90-minute blocks, drafts deep-work sessions matching priorities you list. Outputs a plan — never modifies the calendar.
Research Companion
A scaled-up Module 4. Maintains a topics list, refreshes the oldest brief daily, saves history so you see what changed week over week.
Each template is one prompt away. Pick yours and paste:
"Create .claude/agents/inbox-triage.md. Uses the Gmail MCP tools to read the last 20 unread messages, classify each (reply / archive / forward / wait), and for reply candidates draft a response in my tone. Outputs a markdown table — never sends. I send manually."
"Create .claude/agents/daily-digest.md. At 7am, fetches the latest from 3 sources I list (RSS feeds, hacker news, a Slack channel), writes a 200-word brief, saves to ~/digest/YYYY-MM-DD.md. Read-only. Skip the day if nothing new."
"Create .claude/agents/pr-reviewer.md. For a given PR URL, fetches the diff, runs through our team checklist (edge cases, tests, naming, performance), and produces a comment-ready review in markdown. Posts to GitHub only if I confirm with post=true."
"Create .claude/agents/cal-coordinator.md. Reads my calendar for the next 7 days, finds open 90-minute blocks, drafts proposed deep-work sessions matching priorities I list. Flags any conflicts with existing events. Outputs a plan — does not modify the calendar."
"Adapt the research-brief agent into .claude/agents/research-companion.md. Maintains a topics list. Daily, picks the topic with the oldest brief and refreshes it. Saves history so I see what changed week over week."
1. What's the goal of your agent?
2. Show the trace of one run.
3. What's one thing it got wrong, and how did you fix it?
4. What would you build next on top of it?
Take This Home
The agent loop, a subagent template, the tool selection chart, and the failure modes you'll see most often. Keep this open while you build.
Subagent file template
---
name: my-agent
description: One sentence — when to invoke this agent
tools: Read, Write, Bash, WebSearch
---
You are an agent that <goal>.
PLAN before acting. State your plan in one paragraph.
ACT one step at a time. After each tool call, ask yourself:
am I done?
STOP if you exceed 30 tool calls, 8 minutes, or hit anything
ambiguous.
NEVER:
- Write outside <allowed paths>
- Send external messages without explicit approval
- Continue past a destructive action without confirmation
Tool selection chart
| Need | Tool | Approval needed? |
|---|---|---|
| Read project files | Read, Glob |
No |
| Modify project files | Write, Edit |
Hook-enforced scope |
| Run shell commands | Bash |
Yes — high blast radius |
| Read the web | WebFetch, WebSearch |
No |
| Talk to GitHub / Slack / Cal | MCP server |
Per-action — read free, write gated |
| Custom internal API | Agent SDK custom tool |
Define per call |
Common failure modes & fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| Agent loops forever | No stopping condition | Add max-steps + explicit "done" criteria |
| Agent fabricates results | Tool failed silently | Force the agent to print tool output before claiming success |
| Agent over-edits | Scope too broad | Tighten path allow-list, add a PreToolUse hook |
| Slow / expensive runs | Too many web calls | Cap searches, cache fetched URLs in a temp file |
| Inconsistent output format | Output spec implicit | Add a literal example of the expected output to the system prompt |
Quick command reference
| Task | How to do it |
|---|---|
| Create a subagent | "Create .claude/agents/<name>.md that …" |
| List agents in this project | /agents in chat |
| Install an MCP server | claude mcp add <name> -- <launch command> |
| Read a run trace | cat ~/.claude/projects/<project>/<run>.jsonl | jq |
| Add a guardrail hook | Edit .claude/settings.json · PreToolUse matcher + command |
| Run dry-first | Pass execute=false (or omit it if your agent defaults to dry-run) |
10 Agents to Build at Home
Once you've shipped your capstone, these are the next ten to build. Each one is one prompt away — the patterns repeat.
receipt-sorter
Reads PDFs in ~/Receipts, extracts amounts, builds a monthly CSV
standup-writer
Reads your last 24h of commits + PRs, drafts a Slack standup post
bookmark-librarian
Periodically de-dupes your browser bookmarks and tags by topic
deal-alert
Watches a list of products, pings you when prices drop below targets
code-archaeology
Given a function, traces its callers and writes a one-pager on its history
meeting-prep
For tomorrow's calendar, drafts 3-bullet prep notes per meeting
inbox-archivist
Auto-archives newsletters, labels promos, surfaces only personal mail
doc-gardener
Scans your repo for stale README/docs, opens PRs with refresh suggestions
newsletter-drafter
Turns a week of your tweets/posts into a newsletter draft
onboarding-buddy
For a new repo you cloned, writes a 1-page "where things live" guide
The files in .claude/agents/ live on disk. Commit them alongside your codebase. Share them across your team. Treat them like any other source file — diff, review, refactor.
Key Takeaways
Every agent runs the same loop: Goal → Plan → Act → Observe. If your "agent" isn't doing all four, it's a skill in disguise — make it one.
Subagents are plain Markdown files in .claude/agents/. YAML frontmatter (name, description, tools allow-list) plus a system prompt. Tighten the allow-list — less is safer.
Plan-then-execute beats raw improvisation. Force the agent to write its plan before any tool call, and to self-review before returning. Fix the agent, not the output.
Hooks enforce what the system prompt can only request. Use them for path scope, secret-scrubbing, and human approval gates on anything destructive or expensive.
Read the trace. Every run leaves a JSONL file under ~/.claude/projects/. When something misfires, the trace is your debugger.
Now go ship one.
Download the full workshop guide PDF and run the six modules end-to-end.
Download Workshop PDF