A subagent in the Claude Agent SDK is a separate agent instance your main agent can spawn to handle a focused subtask in an isolated context window. You define subagents through the agents parameter on query(), give each one a description, system prompt, and tool whitelist, then let Claude delegate via the Agent tool. Used right, subagents cut parent context usage by 60-80% on research-heavy workflows. Used wrong, they 15x your token bill for no quality gain. This guide covers when to spawn one, the exact SDK API, how to pass state, and how to handle failures.
## What is a subagent in the Claude Agent SDK?
A subagent is a separate agent instance with its own fresh conversation history, its own system prompt, and its own tool whitelist. The parent invokes it through the built-in Agent tool, the subagent runs to completion, and only its final message returns to the parent context.
From the official Claude Agent SDK docs: "Each subagent runs in its own fresh conversation. Intermediate tool calls and results stay inside the subagent; only its final message returns to the parent."
Three properties define a subagent:
- Isolated context -- a clean window with no parent conversation history
- Specialized prompt -- a tailored system prompt with task-specific expertise
- Restricted tools -- a whitelist that limits what the subagent can do
This isolation is the whole point. A research subagent can read 40 files, evaluate them, and return a 200-word summary. The parent never sees the 40 files. That is how you keep long-running agents under the 200K (or 1M with Sonnet 4.6) context limit.
## When should you spawn a subagent vs handle it inline?
Spawn a subagent when the task involves sifting through more information than the parent needs to remember, when two tasks can run in parallel, or when the work needs a different system prompt than the parent. Handle it inline when the task is short, sequential, or produces output the parent will reference repeatedly.
Anthropic's own multi-agent research system beat single-agent Claude Opus 4 by 90.2% on their internal research eval -- but at roughly 15x the token cost. Subagents are not free. They are a quality-and-speed lever you pull when the task value justifies the spend.
Spawn a subagent when:
- The task will read or generate >5K tokens of intermediate work the parent does not need
- You can run 2+ independent tasks at the same time (style check, security scan, test coverage)
- The subtask needs a different model (cheap Haiku for triage, Opus for final synthesis)
- You want to enforce tool restrictions (read-only research agent that physically cannot write)
Keep it inline when:
- The task is one or two tool calls
- The parent needs the full intermediate output for downstream reasoning
- Latency matters more than context cleanliness (subagent invocation adds round-trip overhead)
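The decision criteria above can be distilled into a rough heuristic. This is a hypothetical helper, not part of the SDK; the 5K-token threshold and the parallelism test come straight from the bullets above.

```python
# Hypothetical spawn-vs-inline heuristic -- not an SDK API.
def should_spawn_subagent(
    est_intermediate_tokens: int,
    parallel_siblings: int,
    needs_own_prompt_or_tools: bool,
    parent_needs_intermediates: bool,
) -> bool:
    """Return True when delegating to a subagent is likely worth the extra tokens."""
    if parent_needs_intermediates:
        # The parent must see the raw work downstream, so keep it inline.
        return False
    if needs_own_prompt_or_tools or parallel_siblings >= 2:
        return True
    # The >5K-token rule of thumb for intermediate work the parent doesn't need.
    return est_intermediate_tokens > 5_000
```

The point of encoding this as a function is not to run it literally, but that each branch maps to one bullet above: parent-needed intermediates veto delegation, parallelism and specialized prompts favor it, and token volume breaks ties.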
## How do you define a subagent in the SDK?
Define subagents through the `agents` parameter on `ClaudeAgentOptions`. Each entry maps a name to an `AgentDefinition` with, at minimum, a `description` and a `prompt`. The `Agent` tool must be in `allowed_tools`, or Claude cannot invoke any subagent.
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    async for message in query(
        prompt="Research the top 5 AEO tools cited by ChatGPT in 2026",
        options=ClaudeAgentOptions(
            allowed_tools=["WebSearch", "WebFetch", "Agent"],
            agents={
                "content-researcher": AgentDefinition(
                    description=(
                        "Searches the web, reads sources, and returns a structured "
                        "summary with citations. Use for any topic research before writing."
                    ),
                    prompt="""You are a research specialist. For each query:
1. Run 3-5 web searches with varied phrasing
2. Fetch the top 2-3 results per search
3. Extract: claim, source URL, publish date, supporting numbers
4. Return ONLY a JSON array of {claim, source, date, stat} objects.
Do not editorialize. Do not summarize the topic. Return raw, citable facts.""",
                    tools=["WebSearch", "WebFetch"],
                    model="sonnet",
                    max_turns=15,
                )
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```
Key AgentDefinition fields from the SDK reference:
| Field | Required | Purpose |
|---|---|---|
| `description` | Yes | Tells Claude when to invoke this subagent |
| `prompt` | Yes | The subagent's system prompt |
| `tools` | No | Tool whitelist. Omit to inherit all parent tools |
| `model` | No | `"sonnet"`, `"opus"`, `"haiku"`, `"inherit"`, or a full model ID |
| `max_turns` | No | Hard cap on agentic turns before the subagent stops |
| `permission_mode` | No | Controls tool execution permissions inside the subagent |
Constraint to remember: the docs are explicit -- "Subagents cannot spawn their own subagents. Don't include Agent in a subagent's tools array." Keep the orchestration tree one level deep.
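One way to enforce this constraint mechanically is a small validation pass over your `agents` mapping before calling `query()`. A minimal sketch, using plain dicts standing in for `AgentDefinition` entries; adapt the field access to your SDK version.

```python
# Guard for the one-level-deep rule: no subagent may list the Agent tool.
# Plain dicts stand in for AgentDefinition entries in this sketch.
def check_no_nested_agents(agents: dict) -> None:
    """Raise ValueError if any subagent's tool whitelist includes Agent."""
    for name, definition in agents.items():
        tools = definition.get("tools") or []
        if "Agent" in tools:
            raise ValueError(
                f"Subagent '{name}' lists the Agent tool; "
                "subagents cannot spawn their own subagents."
            )
```

Running this at startup turns a silent misconfiguration into a loud failure before any tokens are spent.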
## How do you pass data between parent agent and subagent?
You pass data through one channel only: the prompt string the parent sends when invoking the Agent tool. The subagent does not inherit parent conversation history, parent tool results, or the parent's system prompt. Whatever the subagent needs -- file paths, error messages, schema, prior decisions -- goes inline in that prompt.
From the Claude SDK docs: "A subagent's context window starts fresh (no parent conversation) but isn't empty. The only channel from parent to subagent is the Agent tool's prompt string."
What the subagent receives:
- Its own system prompt (`AgentDefinition.prompt`)
- The parent's invocation prompt (the `Agent` tool input)
- Project `CLAUDE.md` if `settingSources` includes it
- Tool definitions for tools listed in `tools`
What the subagent does NOT receive:
- Parent conversation history or tool results
- Parent system prompt
- Skills (unless explicitly listed in `AgentDefinition.skills`)
Returning data the other direction: the subagent's final message comes back to the parent verbatim as the Agent tool result. To structure that return, instruct the subagent to output JSON. Example: "Return ONLY a JSON array of {claim, source, date, stat} objects." The parent then parses it like any other tool result.
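On the parent side, that structured return can be validated defensively before use. A minimal sketch, assuming the `{claim, source, date, stat}` shape from the researcher prompt above; the empty-list fallback is a design choice for this example, not SDK behavior.

```python
import json

def parse_citations(agent_result: str) -> list[dict]:
    """Parse a subagent's JSON return into citation records.

    Returns [] on malformed output so the caller can re-invoke
    the subagent with a stricter formatting instruction.
    """
    try:
        data = json.loads(agent_result)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    required = {"claim", "source", "date", "stat"}
    # Keep only well-formed records; drop anything missing required keys.
    return [row for row in data if isinstance(row, dict) and required <= row.keys()]
```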
If you need verbatim subagent output to reach the end user without parent paraphrasing, add an explicit instruction in the main query() system prompt: "When you receive Agent tool results, pass them through unchanged."
## How do subagents save context window tokens?
Subagents save tokens by keeping intermediate work out of the parent context. A research task might involve 12 web searches and 30 page fetches. Inline, that fills 40K+ tokens of parent context. As a subagent, all 40K stays in the subagent's window, and the parent receives only the final 200-word summary.
Community benchmarks report 40-70% token reductions on focused tasks when subagents replace inline tool loops, with higher savings on workflows that touch many tools or MCP servers.
A concrete trace -- a content-writer agent researching a 2,000-word article:
| Approach | Parent tokens used | Total tokens | Quality |
|---|---|---|---|
| Inline research + writing | 42,000 | 42,000 | Baseline |
| Subagent research + parent writing | 8,500 | 51,000 | Equal or better |
The parent context drops 80%, but total tokens go up because the subagent run is additive. This is the trade. You pay more dollars to keep the parent window clean for downstream reasoning. On a long-running orchestrator that calls the writer 20 times, the clean parent context is what prevents the agent from hitting the 200K context ceiling and forcing a compaction mid-task.
When the math breaks down: for one-shot tasks where the parent will not run again, inline is cheaper. Subagents pay off in long-lived orchestrators and parallel workloads.
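The trade-off can be sanity-checked with back-of-envelope arithmetic using the trace numbers above (42,000 vs 8,500 parent tokens per research pass, against a 200K ceiling). The base-token figure is taken from the example trace later in this guide; treat all numbers as illustrative.

```python
def calls_before_ceiling(per_call_parent_tokens: int,
                         base_tokens: int = 3_200,
                         ceiling: int = 200_000) -> int:
    """How many research passes fit in the parent window before compaction."""
    return (ceiling - base_tokens) // per_call_parent_tokens

inline_calls = calls_before_ceiling(42_000)   # inline research per pass
subagent_calls = calls_before_ceiling(8_500)  # subagent research per pass
```

With these figures the inline approach forces compaction after roughly 4 passes, while the subagent approach sustains about 23 — the difference between a one-shot script and a long-lived orchestrator.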
## What happens when a subagent fails?
When a subagent fails, the parent receives an Agent tool result containing the failure -- usually an error message, a partial output, or a max_turns exceeded notice. The parent then decides what to do: retry with a different prompt, fall back to inline handling, or surface the failure to the user. The SDK does not retry automatically.
Common failure modes and how to handle them:
- `max_turns` exceeded -- the subagent ran out of agentic turns before finishing. Either raise `max_turns`, narrow the task, or split it across two subagent calls.
- Tool denied -- the subagent tried to use a tool not in its `tools` whitelist. Audit the whitelist. Read-only agents commonly hit this when prompted to write.
- Model returned malformed output -- if you required JSON, the subagent may return prose. Wrap the parent's parsing in a try/except and re-invoke with a stricter prompt.
- Context window overflow inside the subagent -- rare with 200K+ models, but possible on heavy file-reading tasks. Use `compact` or reduce the input set.
Defensive patterns:
```python
import json

try:
    result = json.loads(subagent_output)
except json.JSONDecodeError:
    # Re-invoke with a stricter formatting instruction
    result = await retry_with_format_reminder(subagent_output)
```
For critical paths, wrap subagent invocations in your orchestrator's retry logic. The SDK exposes parent_tool_use_id on every message, so you can detect and log failures inside a subagent's execution context. From the docs: "Messages from within a subagent's context include a parent_tool_use_id field." Use it for tracing.
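A fuller version of that retry loop might look like the following sketch. `reinvoke` is a hypothetical callable that re-runs the subagent with a stricter formatting reminder, and the `parent_tool_use_id` logging mirrors the tracing advice above.

```python
import json
import logging

logger = logging.getLogger("subagent")

def parse_with_retries(raw: str, reinvoke, parent_tool_use_id=None,
                       max_retries: int = 2):
    """Parse subagent JSON output, re-invoking on malformed returns.

    `reinvoke` is a hypothetical hook that re-runs the subagent with a
    reminder like "Return ONLY valid JSON." and returns the new output.
    """
    for attempt in range(max_retries + 1):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Log which tool-use context produced the bad output for tracing.
            logger.warning("malformed subagent output (attempt %d, tool_use=%s)",
                           attempt, parent_tool_use_id)
            if attempt == max_retries:
                raise
            raw = reinvoke(raw)
```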
## How do you keep the parent context window clean?
Keep the parent context clean by routing every research, exploration, or large-file task to a subagent, returning only structured summaries to the parent, and using compact for the parent itself when the conversation is unavoidably long.
Five rules from production agents:
- One subagent per discrete subtask. Do not stuff three jobs into one subagent prompt -- split them so each runs in its own clean window.
- Force structured returns. Tell the subagent to return JSON, a markdown table, or a fixed-format summary. Free-form prose bloats the parent.
- Cap `max_turns`. A research agent should rarely need more than 15 turns. Cap it so a runaway loop cannot drain budget.
- Use cheaper models for triage subagents. A Haiku-powered classifier subagent costs 1/10th of an Opus parent and rejects 80% of irrelevant work before it touches the main agent.
- Log `parent_tool_use_id`. When something goes wrong, you need to know whether the bad output came from the parent or a subagent. Log it from day one.
The Anthropic engineering team's principle on this is direct: "Subagents are ideal for tasks that require sifting through large amounts of information where most of it won't be useful." If most of the intermediate work will not be useful to downstream reasoning, do not let it touch the parent context.
## Real example: a content-research subagent spawned from a writer agent
Here is the exact pattern used in a production content-writer agent: the parent orchestrates the article, the content-researcher subagent does the web work, and the parent only sees a citations list.
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

WRITER_OPTIONS = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Agent"],
    agents={
        "content-researcher": AgentDefinition(
            description=(
                "Use BEFORE writing to gather citable facts on a topic. "
                "Returns a JSON array of {claim, source_url, date, stat}."
            ),
            prompt=(
                "You are a research specialist. Run 3-5 web searches with "
                "varied phrasing. Fetch the top results. Extract claims with "
                "source URL, publish date, and a supporting statistic where "
                "available. Return ONLY a JSON array. Do not write prose."
            ),
            tools=["WebSearch", "WebFetch"],
            model="sonnet",
            max_turns=15,
        ),
        "fact-checker": AgentDefinition(
            description="Use AFTER drafting to verify each statistic against its cited URL.",
            prompt=(
                "For each claim, fetch the source URL and confirm the stat "
                "appears verbatim. Return a JSON list of {claim, verified: bool, note}."
            ),
            tools=["WebFetch"],
            model="haiku",
            max_turns=10,
        ),
    },
)

async def main():
    async for message in query(
        prompt=(
            "Write a 1500-word article on AEO citation rates. Use the "
            "content-researcher agent first, then draft, then run "
            "fact-checker on the draft."
        ),
        options=WRITER_OPTIONS,
    ):
        ...

asyncio.run(main())
```
What the trace shows:
- Parent prompt + system: ~3,200 tokens
- `content-researcher` subagent: 38,000 tokens internal, returns 1,800-token JSON
- Parent drafts the article using only the JSON: +6,500 tokens
- `fact-checker` subagent: 12,000 tokens internal, returns 400-token verdict
- Final parent context: ~12,000 tokens instead of ~50,000+
The parent stays well under context limits across many article drafts. The two subagents handle the heavy lifting in disposable windows. This is the pattern Anthropic's multi-agent research system uses, just scaled down for a single-author content workflow.
| Pattern | Parent context cost | Total token cost | Best for |
|---|---|---|---|
| Inline tool loop | High (all tool results stay) | Lowest | Short, sequential tasks |
| Single subagent | Low (only final message) | +15-30% | Research, exploration, large file reads |
| Parallel subagents | Low (only final messages) | +30-100% | Independent checks (style, security, tests) |
| Triage subagent (Haiku) | Very low | Often net negative | Filtering irrelevant work before main agent |