A subagent in the Claude Agent SDK is a separate agent instance your main agent can spawn to handle a focused subtask in an isolated context window. You define subagents through the agents parameter on query(), give each one a description, system prompt, and tool whitelist, then let Claude delegate via the Agent tool. Used right, subagents cut parent context usage by 60-80% on research-heavy workflows. Used wrong, they 15x your token bill for no quality gain. This guide covers when to spawn one, the exact SDK API, how to pass state, and how to handle failures.
## What is a subagent in the Claude Agent SDK?
A subagent is a separate agent instance with its own fresh conversation history, its own system prompt, and its own tool whitelist. The parent invokes it through the built-in Agent tool, the subagent runs to completion, and only its final message returns to the parent context.
From the official Claude Agent SDK docs: "Each subagent runs in its own fresh conversation. Intermediate tool calls and results stay inside the subagent; only its final message returns to the parent."
Three properties define a subagent:
- Isolated context -- a clean window with no parent conversation history
- Specialized prompt -- a tailored system prompt with task-specific expertise
- Restricted tools -- a whitelist that limits what the subagent can do
This isolation is the whole point. A research subagent can read 40 files, evaluate them, and return a 200-word summary. The parent never sees the 40 files. That is how you keep long-running agents under the 200K (or 1M with Sonnet 4.6) context limit.
## When should you spawn a subagent vs handle it inline?
Spawn a subagent when the task involves sifting through more information than the parent needs to remember, when two tasks can run in parallel, or when the work needs a different system prompt than the parent. Handle it inline when the task is short, sequential, or produces output the parent will reference repeatedly.
Anthropic's own multi-agent research system beat single-agent Claude Opus 4 by 90.2% on their internal research eval -- but at roughly 15x the token cost. Subagents are not free. They are a quality-and-speed lever you pull when the task value justifies the spend.
Spawn a subagent when:
- The task will read or generate >5K tokens of intermediate work the parent does not need
- You can run 2+ independent tasks at the same time (style check, security scan, test coverage)
- The subtask needs a different model (cheap Haiku for triage, Opus for final synthesis)
- You want to enforce tool restrictions (read-only research agent that physically cannot write)
Keep it inline when:
- The task is one or two tool calls
- The parent needs the full intermediate output for downstream reasoning
- Latency matters more than context cleanliness (subagent invocation adds round-trip overhead)
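The decision criteria above can be distilled into a rough heuristic. This is a hypothetical helper, not part of the SDK; the 5K-token threshold and the parallelism test come straight from the bullets above.

```python
# Hypothetical spawn-vs-inline heuristic -- not an SDK API.
def should_spawn_subagent(
    est_intermediate_tokens: int,
    parallel_siblings: int,
    needs_own_prompt_or_tools: bool,
    parent_needs_intermediates: bool,
) -> bool:
    """Return True when delegating to a subagent is likely worth the extra tokens."""
    if parent_needs_intermediates:
        # The parent must see the raw work downstream, so keep it inline.
        return False
    if needs_own_prompt_or_tools or parallel_siblings >= 2:
        return True
    # The >5K-token rule of thumb for intermediate work the parent doesn't need.
    return est_intermediate_tokens > 5_000
```

The point of encoding this as a function is not to run it literally, but that each branch maps to one bullet above: parent-needed intermediates veto delegation, parallelism and specialized prompts favor it, and token volume breaks ties.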
## How do you define a subagent in the SDK?
Define subagents through the `agents` parameter on `ClaudeAgentOptions`. Each entry maps a name to an `AgentDefinition` with, at minimum, a `description` and a `prompt`. The `Agent` tool must be in `allowed_tools`, or Claude cannot invoke any subagent.
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    async for message in query(
        prompt="Research the top 5 AEO tools cited by ChatGPT in 2026",
        options=ClaudeAgentOptions(
            allowed_tools=["WebSearch", "WebFetch", "Agent"],
            agents={
                "content-researcher": AgentDefinition(
                    description=(
                        "Searches the web, reads sources, and returns a structured "
                        "summary with citations. Use for any topic research before writing."
                    ),
                    prompt="""You are a research specialist. For each query:
1. Run 3-5 web searches with varied phrasing
2. Fetch the top 2-3 results per search
3. Extract: claim, source URL, publish date, supporting numbers
4. Return ONLY a JSON array of {claim, source, date, stat} objects.
Do not editorialize. Do not summarize the topic. Return raw, citable facts.""",
                    tools=["WebSearch", "WebFetch"],
                    model="sonnet",
                    max_turns=15,
                )
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```
Key AgentDefinition fields from the SDK reference:
| Field | Required | Purpose |
|---|---|---|
| `description` | Yes | Tells Claude when to invoke this subagent |
| `prompt` | Yes | The subagent's system prompt |
| `tools` | No | Tool whitelist. Omit to inherit all parent tools |
| `model` | No | `"sonnet"`, `"opus"`, `"haiku"`, `"inherit"`, or a full model ID |
| `max_turns` | No | Hard cap on agentic turns before the subagent stops |
| `permission_mode` | No | Controls tool execution permissions inside the subagent |
Constraint to remember: the docs are explicit -- "Subagents cannot spawn their own subagents. Don't include Agent in a subagent's tools array." Keep the orchestration tree one level deep.
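One way to enforce this constraint mechanically is a small validation pass over your `agents` mapping before calling `query()`. A minimal sketch, using plain dicts standing in for `AgentDefinition` entries; adapt the field access to your SDK version.

```python
# Guard for the one-level-deep rule: no subagent may list the Agent tool.
# Plain dicts stand in for AgentDefinition entries in this sketch.
def check_no_nested_agents(agents: dict) -> None:
    """Raise ValueError if any subagent's tool whitelist includes Agent."""
    for name, definition in agents.items():
        tools = definition.get("tools") or []
        if "Agent" in tools:
            raise ValueError(
                f"Subagent '{name}' lists the Agent tool; "
                "subagents cannot spawn their own subagents."
            )
```

Running this at startup turns a silent misconfiguration into a loud failure before any tokens are spent.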
## How do you pass data between parent agent and subagent?
You pass data through one channel only: the prompt string the parent sends when invoking the Agent tool. The subagent does not inherit parent conversation history, parent tool results, or the parent's system prompt. Whatever the subagent needs -- file paths, error messages, schema, prior decisions -- goes inline in that prompt.
From the Claude SDK docs: "A subagent's context window starts fresh (no parent conversation) but isn't empty. The only channel from parent to subagent is the Agent tool's prompt string."
What the subagent receives:
- Its own system prompt (`AgentDefinition.prompt`)
- The parent's invocation prompt (the `Agent` tool input)
- Project `CLAUDE.md` if `settingSources` includes it
- Tool definitions for tools listed in `tools`
What the subagent does NOT receive:
- Parent conversation history or tool results
- Parent system prompt
- Skills (unless explicitly listed in `AgentDefinition.skills`)
Returning data the other direction: the subagent's final message comes back to the parent verbatim as the Agent tool result. To structure that return, instruct the subagent to output JSON. Example: "Return ONLY a JSON array of {claim, source, date, stat} objects." The parent then parses it like any other tool result.
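On the parent side, that structured return can be validated defensively before use. A minimal sketch, assuming the `{claim, source, date, stat}` shape from the researcher prompt above; the empty-list fallback is a design choice for this example, not SDK behavior.

```python
import json

def parse_citations(agent_result: str) -> list[dict]:
    """Parse a subagent's JSON return into citation records.

    Returns [] on malformed output so the caller can re-invoke
    the subagent with a stricter formatting instruction.
    """
    try:
        data = json.loads(agent_result)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    required = {"claim", "source", "date", "stat"}
    # Keep only well-formed records; drop anything missing required keys.
    return [row for row in data if isinstance(row, dict) and required <= row.keys()]
```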
If you need verbatim subagent output to reach the end user without parent paraphrasing, add an explicit instruction in the main query() system prompt: "When you receive Agent tool results, pass them through unchanged."
## How do subagents save context window tokens?
Subagents save tokens by keeping intermediate work out of the parent context. A research task might involve 12 web searches and 30 page fetches. Inline, that fills 40K+ tokens of parent context. As a subagent, all 40K stays in the subagent's window, and the parent receives only the final 200-word summary.
Community benchmarks report 40-70% token reductions on focused tasks when subagents replace inline tool loops, with higher savings on workflows that touch many tools or MCP servers.
A concrete trace -- a content-writer agent researching a 2,000-word article:
| Approach | Parent tokens used | Total tokens | Quality |
|---|---|---|---|
| Inline research + writing | 42,000 | 42,000 | Baseline |
| Subagent research + parent writing | 8,500 | 51,000 | Equal or better |
The parent context drops 80%, but total tokens go up because the subagent run is additive. This is the trade. You pay more dollars to keep the parent window clean for downstream reasoning. On a long-running orchestrator that calls the writer 20 times, the clean parent context is what prevents the agent from hitting the 200K context ceiling and forcing a compaction mid-task.
When the math breaks down: for one-shot tasks where the parent will not run again, inline is cheaper. Subagents pay off in long-lived orchestrators and parallel workloads.
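The trade-off can be sanity-checked with back-of-envelope arithmetic using the trace numbers above (42,000 vs 8,500 parent tokens per research pass, against a 200K ceiling). The base-token figure is taken from the example trace later in this guide; treat all numbers as illustrative.

```python
def calls_before_ceiling(per_call_parent_tokens: int,
                         base_tokens: int = 3_200,
                         ceiling: int = 200_000) -> int:
    """How many research passes fit in the parent window before compaction."""
    return (ceiling - base_tokens) // per_call_parent_tokens

inline_calls = calls_before_ceiling(42_000)   # inline research per pass
subagent_calls = calls_before_ceiling(8_500)  # subagent research per pass
```

With these figures the inline approach forces compaction after roughly 4 passes, while the subagent approach sustains about 23 — the difference between a one-shot script and a long-lived orchestrator.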
## What happens when a subagent fails?
When a subagent fails, the parent receives an Agent tool result containing the failure -- usually an error message, a partial output, or a max_turns exceeded notice. The parent then decides what to do: retry with a different prompt, fall back to inline handling, or surface the failure to the user. The SDK does not retry automatically.
Common failure modes and how to handle them:
- `max_turns` exceeded -- the subagent ran out of agentic turns before finishing. Either raise `max_turns`, narrow the task, or split it across two subagent calls.
- Tool denied -- the subagent tried to use a tool not in its `tools` whitelist. Audit the whitelist. Read-only agents commonly hit this when prompted to write.
- Model returned malformed output -- if you required JSON, the subagent may return prose. Wrap the parent's parsing in a try/except and re-invoke with a stricter prompt.
- Context window overflow inside the subagent -- rare with 200K+ models, but possible on heavy file-reading tasks. Use `compact` or reduce the input set.
Defensive patterns:
```python
import json

try:
    result = json.loads(subagent_output)
except json.JSONDecodeError:
    # Re-invoke with a stricter formatting instruction
    result = await retry_with_format_reminder(subagent_output)
```
For critical paths, wrap subagent invocations in your orchestrator's retry logic. The SDK exposes parent_tool_use_id on every message, so you can detect and log failures inside a subagent's execution context. From the docs: "Messages from within a subagent's context include a parent_tool_use_id field." Use it for tracing.
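A fuller version of that retry loop might look like the following sketch. `reinvoke` is a hypothetical callable that re-runs the subagent with a stricter formatting reminder, and the `parent_tool_use_id` logging mirrors the tracing advice above.

```python
import json
import logging

logger = logging.getLogger("subagent")

def parse_with_retries(raw: str, reinvoke, parent_tool_use_id=None,
                       max_retries: int = 2):
    """Parse subagent JSON output, re-invoking on malformed returns.

    `reinvoke` is a hypothetical hook that re-runs the subagent with a
    reminder like "Return ONLY valid JSON." and returns the new output.
    """
    for attempt in range(max_retries + 1):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Log which tool-use context produced the bad output for tracing.
            logger.warning("malformed subagent output (attempt %d, tool_use=%s)",
                           attempt, parent_tool_use_id)
            if attempt == max_retries:
                raise
            raw = reinvoke(raw)
```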
## How do you keep the parent context window clean?
Keep the parent context clean by routing every research, exploration, or large-file task to a subagent, returning only structured summaries to the parent, and using compact for the parent itself when the conversation is unavoidably long.
Five rules from production agents:
- One subagent per discrete subtask. Do not stuff three jobs into one subagent prompt -- split them so each runs in its own clean window.
- Force structured returns. Tell the subagent to return JSON, a markdown table, or a fixed-format summary. Free-form prose bloats the parent.
- Cap `max_turns`. A research agent should rarely need more than 15 turns. Cap it so a runaway loop cannot drain budget.
- Use cheaper models for triage subagents. A Haiku-powered classifier subagent costs 1/10th of an Opus parent and rejects 80% of irrelevant work before it touches the main agent.
- Log `parent_tool_use_id`. When something goes wrong, you need to know whether the bad output came from the parent or a subagent. Log it from day one.
The Anthropic engineering team's principle on this is direct: "Subagents are ideal for tasks that require sifting through large amounts of information where most of it won't be useful." If most of the intermediate work will not be useful to downstream reasoning, do not let it touch the parent context.
## Real example: a content-research subagent spawned from a writer agent
Here is the exact pattern used in a production content-writer agent: the parent orchestrates the article, the content-researcher subagent does the web work, and the parent only sees a citations list.
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

WRITER_OPTIONS = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Agent"],
    agents={
        "content-researcher": AgentDefinition(
            description=(
                "Use BEFORE writing to gather citable facts on a topic. "
                "Returns a JSON array of {claim, source_url, date, stat}."
            ),
            prompt=(
                "You are a research specialist. Run 3-5 web searches with "
                "varied phrasing. Fetch the top results. Extract claims with "
                "source URL, publish date, and a supporting statistic where "
                "available. Return ONLY a JSON array. Do not write prose."
            ),
            tools=["WebSearch", "WebFetch"],
            model="sonnet",
            max_turns=15,
        ),
        "fact-checker": AgentDefinition(
            description="Use AFTER drafting to verify each statistic against its cited URL.",
            prompt=(
                "For each claim, fetch the source URL and confirm the stat "
                "appears verbatim. Return a JSON list of {claim, verified: bool, note}."
            ),
            tools=["WebFetch"],
            model="haiku",
            max_turns=10,
        ),
    },
)

async def main():
    async for message in query(
        prompt=(
            "Write a 1500-word article on AEO citation rates. Use the "
            "content-researcher agent first, then draft, then run "
            "fact-checker on the draft."
        ),
        options=WRITER_OPTIONS,
    ):
        ...

asyncio.run(main())
```
What the trace shows:
- Parent prompt + system: ~3,200 tokens
- `content-researcher` subagent: 38,000 tokens internal, returns 1,800-token JSON
- Parent drafts the article using only the JSON: +6,500 tokens
- `fact-checker` subagent: 12,000 tokens internal, returns 400-token verdict
- Final parent context: ~12,000 tokens instead of ~50,000+
The parent stays well under context limits across many article drafts. The two subagents handle the heavy lifting in disposable windows. This is the pattern Anthropic's multi-agent research system uses, just scaled down for a single-author content workflow.
| Pattern | Parent context cost | Total token cost | Best for |
|---|---|---|---|
| Inline tool loop | High (all tool results stay) | Lowest | Short, sequential tasks |
| Single subagent | Low (only final message) | +15-30% | Research, exploration, large file reads |
| Parallel subagents | Low (only final messages) | +30-100% | Independent checks (style, security, tests) |
| Triage subagent (Haiku) | Very low | Often net negative | Filtering irrelevant work before main agent |