# Claude Agent SDK Basics
Anthropic provides two official agent SDKs: one for Python and one for TypeScript. Both wrap the Claude API with agent-specific abstractions: tool use, multi-turn conversation loops, agent handoffs, and guardrails. Use the SDK instead of raw API calls when building agents that need tool use, multi-step reasoning, or orchestration.
## When to Use the SDK vs Raw API
| Scenario | Use SDK | Use raw API |
|---|---|---|
| Agent with tools (search, CRM read/write, enrichment) | Yes | No |
| Multi-turn conversation loop (agent reasons, acts, observes, repeats) | Yes | No |
| Simple single-turn generation (write one email from a prompt) | No | Yes |
| Multi-agent orchestration with handoffs | Yes | No |
| Batch processing with no tool use (score 1,000 leads from CSV) | No | Yes (with Batch API) |
| Streaming responses to a UI | Either | Either |
Rule of thumb: If the agent needs to take actions (call tools, make decisions, loop), use the SDK. If you're generating a single output from a single prompt, the raw API is simpler.
## Python SDK Setup

### Installation

```bash
pip install anthropic-agent
```

### Minimum viable agent

```python
from anthropic_agent import Agent, tool

@tool
def search_company(company_name: str) -> str:
    """Search for company information by name."""
    # Your implementation: call Crunchbase, LinkedIn, etc.
    return f"Company data for {company_name}"

agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="""You are a B2B account research agent that produces
structured account briefs from company names. Accuracy matters more
than completeness. Never guess. If data is not found, say so.""",
    tools=[search_company],
)

result = agent.run("Research Acme Corp for outbound targeting")
print(result)
```
## TypeScript SDK Setup

```bash
npm install @anthropic-ai/agent
```

```typescript
import { Agent, tool } from "@anthropic-ai/agent";

const searchCompany = tool({
  name: "search_company",
  description: "Search for company information by name.",
  parameters: {
    company_name: { type: "string", description: "The target company name" },
  },
  execute: async ({ company_name }) => {
    // Your implementation
    return `Company data for ${company_name}`;
  },
});

const agent = new Agent({
  model: "claude-sonnet-4-6",
  systemPrompt: `You are a B2B account research agent that produces
structured account briefs from company names. Accuracy matters more
than completeness. Never guess. If data is not found, say so.`,
  tools: [searchCompany],
});

const result = await agent.run("Research Acme Corp for outbound targeting");
console.log(result);
```
## Core Concepts

### The Agent Loop

The SDK runs an agentic loop automatically:

```
User message → Model thinks → Model calls tool → Tool returns result →
Model thinks again → Model calls another tool (or responds) → ... → Final response
```
The loop continues until the model produces a final text response without calling any more tools. The SDK manages the conversation history, tool call/result threading, and retry logic.
Loop rules:
- Set a `max_turns` limit to prevent infinite loops. Default to 10-15 for research agents, 5 for simple agents
- Each turn consumes tokens. A 10-turn research loop on Opus costs roughly 10x a single-turn call. Budget accordingly
- The model decides when to stop looping. If it's stopping too early (not using all available tools), add explicit instructions in the system prompt: "Use all available tools before producing the final output"
- If the model loops without converging, the system prompt is usually too vague. Add clearer completion criteria
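Stripped of SDK details, the loop above can be sketched in a few lines of plain Python. This is an illustrative sketch, not SDK code: `run_loop` and `fake_model` are made-up names, and the real SDK also handles retries, streaming, and token accounting.

```python
# Toy sketch of the agentic loop. `fake_model` stands in for the real model.
def fake_model(history):
    """Stub model: calls the search tool once, then gives a final answer."""
    if not any(m.get("role") == "tool" for m in history):
        return {"type": "tool_call", "name": "search_company",
                "args": {"company_name": "Acme Corp"}}
    return {"type": "text", "content": "Acme Corp brief: ..."}

def run_loop(model, tools, user_message, max_turns=10):
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # hard cap: prevents runaway loops
        action = model(history)
        if action["type"] == "text":  # no more tool calls -> final answer
            return action["content"]
        result = tools[action["name"]](**action["args"])
        history.append({"role": "tool", "name": action["name"], "content": result})
    raise RuntimeError(f"No final answer within {max_turns} turns")

tools = {"search_company": lambda company_name: f"Company data for {company_name}"}
answer = run_loop(fake_model, tools, "Research Acme Corp")
```

The `max_turns` cap is the last line of defense: when the model never converges to a text response, the loop raises instead of burning tokens forever.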
### Tools
Tools are functions the agent can call. The SDK handles the tool call protocol: the model requests a tool call, the SDK executes the function, and the result is sent back to the model.
Tool design rules for GTM agents:

- Name tools descriptively. `search_crunchbase` is better than `search`; `get_linkedin_contacts` is better than `get_contacts`. The model uses the tool name to decide when to call it
- Write clear descriptions. The description tells the model when and how to use the tool. "Search Crunchbase for company funding history, investors, and founding date" is actionable. "Search for stuff" is not
- Define parameters with types and descriptions. Every parameter should have a type, a description, and whether it's required or optional
- Return structured data. Return JSON strings or structured objects, not raw HTML or API responses. The model handles structured data better than parsing raw markup
- Handle errors in the tool, not the prompt. If a search returns no results, the tool should return `{"status": "no_results", "query": "..."}`, not throw an exception. The model can reason about a "no results" response. It can't reason about a crash
Example: well-designed tool

```python
@tool
def search_linkedin_contacts(
    company_name: str,
    title_filter: str = "",
    max_results: int = 5,
) -> str:
    """Search LinkedIn for contacts at a specific company.

    Args:
        company_name: The company to search for contacts at.
        title_filter: Optional title keyword to filter results
            (e.g., "VP Sales", "RevOps").
        max_results: Maximum number of contacts to return. Default 5.

    Returns:
        JSON array of contacts with name, title, linkedin_url, tenure_months.
        Returns {"status": "no_results"} if no contacts found.
    """
    # Implementation here
    ...
```
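The "handle errors in the tool" rule, applied to a tool body, might look like the sketch below. `fetch_contacts` is a hypothetical stand-in for your real data provider, and the `@tool` decorator is omitted so the function runs standalone.

```python
import json

def fetch_contacts(company_name: str) -> list[dict]:
    """Hypothetical data-provider call; returns nothing in this demo."""
    return []

# In the real agent this function would also carry the @tool decorator.
def search_linkedin_contacts(company_name: str) -> str:
    try:
        contacts = fetch_contacts(company_name)
    except Exception as exc:
        # Surface failures as data the model can reason about, not a crash
        return json.dumps({"status": "error", "detail": str(exc)})
    if not contacts:
        return json.dumps({"status": "no_results", "query": company_name})
    return json.dumps(contacts)
```

Every path returns a JSON string, so the model always gets something it can reason about: results, an explicit "no results", or a described error.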
### Structured Output

Force the model to return output in a specific schema. Critical for agents whose output feeds into other systems (CRM, downstream agents, dashboards).

```python
from typing import Optional

from pydantic import BaseModel

class AccountBrief(BaseModel):
    company_name: str
    founded: Optional[str]
    hq: Optional[str]
    employee_count: Optional[str]
    funding: Optional[str]
    industry: str
    signals: list[dict]
    problem_hypothesis: str
    confidence: str  # "high", "medium", "low"
    missing_fields: list[str]

agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="...",
    tools=[search_company, search_linkedin],
    output_schema=AccountBrief,
)

result = agent.run("Research Acme Corp")
# result is a validated AccountBrief instance
```
Structured output rules:

- Use structured output whenever the agent's result feeds into another system: CRM writes, downstream agents, dashboards, CSV exports
- Define all fields as Optional or provide defaults. The model may not find all data. Non-optional fields with no data cause validation errors
- Include a `missing_fields` or `confidence` field. The agent should communicate what it doesn't know
- Test the schema with edge cases: a company with no funding data, a company with no LinkedIn presence, a company that was acquired. Make sure the schema handles all of them
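The "Optional fields plus a `missing_fields` channel" pattern can be sketched without the SDK. This sketch uses stdlib dataclasses in place of Pydantic so it is dependency-free; the field names follow the AccountBrief schema above, and `finalize` is an illustrative helper, not an SDK function.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AccountBrief:
    company_name: str
    industry: str
    problem_hypothesis: str
    confidence: str                  # "high", "medium", "low"
    founded: Optional[str] = None    # optional: the model may not find it
    funding: Optional[str] = None    # optional: bootstrapped companies
    missing_fields: list[str] = field(default_factory=list)

def finalize(brief: AccountBrief) -> AccountBrief:
    """Record which optional fields came back empty."""
    for name in ("founded", "funding"):
        if getattr(brief, name) is None:
            brief.missing_fields.append(name)
    return brief

# Edge case: bootstrapped company, no funding or founding data found
brief = finalize(AccountBrief(
    company_name="Acme Corp",
    industry="Logistics",
    problem_hypothesis="Manual dispatch workflows slow fulfillment",
    confidence="medium",
))
```

The edge case validates cleanly instead of erroring, and the brief itself reports what is missing, which is exactly what a downstream agent or CRM writer needs to see.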
## Model Selection
| Agent type | Recommended model | Why |
|---|---|---|
| Research agent (multi-tool, synthesis) | claude-sonnet-4-6 | Good balance of reasoning, tool use, and cost |
| Email writer (creative, quality-critical) | claude-sonnet-4-6 or claude-opus-4-6 | Quality matters. Opus for highest-stakes output |
| Reply classifier (fast, high-volume) | claude-haiku-4-5 | Speed and cost. Classification doesn't need Opus |
| QA / critic agent | claude-sonnet-4-6 | Rule-checking needs accuracy, not creativity |
| Router agent | claude-haiku-4-5 | Fast classification. Cheap. Runs on every input |
| Data extraction / enrichment | claude-sonnet-4-6 | Accuracy matters. Moderate reasoning needed |
Model selection rules:
- Default to Sonnet for most agents. Move to Opus only for tasks where quality difference is measurable and worth the cost (roughly 5x more expensive)
- Use Haiku for classification, routing, and any high-volume low-complexity task
- Test the same prompt on two models before committing. Sometimes Sonnet matches Opus on a specific task, and sometimes Haiku matches Sonnet
- Cost per run matters at scale. A research agent that runs 1,000 times/day at $0.15/run on Opus costs $4,500/month. The same agent on Sonnet at $0.03/run costs $900/month. Measure quality difference before paying 5x
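The cost arithmetic above is worth making explicit before committing to a model. The per-run figures below are the illustrative numbers from the rule, not published pricing.

```python
def monthly_cost(runs_per_day: int, cost_per_run: float, days: int = 30) -> float:
    """Monthly spend for an agent from daily volume and per-run cost."""
    return runs_per_day * cost_per_run * days

# Illustrative per-run costs from the rule above, not published rates
opus_monthly = monthly_cost(1_000, 0.15)    # research agent on Opus
sonnet_monthly = monthly_cost(1_000, 0.03)  # same agent on Sonnet
```

At this volume the Sonnet version saves $3,600/month; whether Opus's quality delta is worth that gap is the thing to measure before paying it.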
## Multi-Agent with Handoffs
The SDK supports handing off between agents. One agent completes its work and passes context to the next.
```python
research_agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="Research agent prompt...",
    tools=[search_company, search_linkedin],
    output_schema=AccountBrief,
)

email_agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="Email writer prompt...",
    tools=[],
    output_schema=EmailSequence,
)

# Pipeline: research → email writing
account_brief = research_agent.run(f"Research {company_name}")
email_sequence = email_agent.run(
    f"Write a 3-email cold sequence for this account:\n\n"
    f"{account_brief.model_dump_json()}"
)
```
Handoff rules:
- Pass structured data between agents, not free-form text. The output schema of Agent A should match the expected input format of Agent B
- Don't pass the entire conversation history from Agent A to Agent B. Pass only the final output. Agent B doesn't need Agent A's reasoning process
- Each agent should be independently runnable. Test Agent B with a manually-crafted input before wiring it to Agent A. This isolates bugs to the specific agent
- Log the output at every handoff point. When the pipeline produces bad output, you need to identify which agent failed
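The handoff rules combine into a minimal pipeline sketch. The stub functions below stand in for the real `research_agent.run` and `email_agent.run` calls; the log lines are the per-handoff record the last rule asks for.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Stubs standing in for research_agent.run(...) and email_agent.run(...)
def run_research_agent(prompt: str) -> dict:
    return {"company_name": "Acme Corp", "confidence": "high"}

def run_email_agent(prompt: str) -> dict:
    return {"emails": ["email 1", "email 2", "email 3"]}

def pipeline(company_name: str) -> dict:
    brief = run_research_agent(f"Research {company_name}")
    # Log at the handoff point so a bad final output is traceable to one agent
    log.info("handoff research -> email: %s", json.dumps(brief))
    sequence = run_email_agent(
        f"Write a 3-email cold sequence for this account:\n{json.dumps(brief)}"
    )
    log.info("final output: %s", json.dumps(sequence))
    return sequence

result = pipeline("Acme Corp")
```

Because each stub takes a prompt and returns structured data, either agent can be tested in isolation by calling it with a hand-crafted input, per the third rule.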
## Guardrails
Guardrails prevent the agent from doing things it shouldn't: calling tools it doesn't have access to, producing output that violates rules, or running indefinitely.
### Turn limits

```python
agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="...",
    tools=[...],
    max_turns=10,  # Stop after 10 tool-use turns
)
```
Set limits based on agent type:
- Research agents: 10-15 turns (multiple tool calls needed)
- Email writers: 3-5 turns (generate, maybe self-critique, done)
- Classifiers: 1-2 turns (classify and respond)
- QA agents: 2-3 turns (check rules, report)
### Content filtering

Add a post-processing step to catch rule violations the model missed.

```python
BANNED_PHRASES = [
    "leveraging", "in today's fast-paced world", "best-in-class",
    "holistic", "synergies", "unlock", "streamline",
]

def check_email_rules(email_text: str) -> list[str]:
    """Return list of rule violations found."""
    violations = []
    for phrase in BANNED_PHRASES:
        if phrase.lower() in email_text.lower():
            violations.append(f"Banned phrase: '{phrase}'")
    if "—" in email_text:
        violations.append("Contains em-dash")
    word_count = len(email_text.split())
    if word_count > 80:
        violations.append(f"Over word limit: {word_count} words")
    return violations
```
Guardrail rules:
- Programmatic checks are more reliable than prompt-based rules. The model may violate a prompt rule 5% of the time. A regex check catches it 100% of the time. Use both
- Run guardrails on the final output, not intermediate steps. Intermediate tool calls may contain content that would violate output rules (e.g., a search result containing "best-in-class"). Only check the final customer-facing output
- When a guardrail catches a violation, feed it back to the agent for revision. Don't just flag it. Let the agent fix it (with a retry cap of 2-3)
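The feed-back-for-revision rule can be sketched as a capped retry loop. Everything here is illustrative: the stub `generate` replays canned drafts, and the miniature `check` stands in for a full checker like `check_email_rules`; in a real pipeline `generate` would re-run the email agent with the feedback appended.

```python
BANNED = ["best-in-class", "synergies"]

def check(email: str) -> list[str]:
    """Miniature stand-in for a full rule checker."""
    return [f"Banned phrase: '{p}'" for p in BANNED if p in email.lower()]

# Stub generator: the first draft violates a rule, the revision passes.
drafts = iter([
    "We offer a best-in-class platform.",
    "We cut dispatch time by 30%.",
])

def generate(feedback: str) -> str:
    return next(drafts)

def revise_with_guardrails(generate, check, max_retries: int = 3) -> str:
    feedback = ""
    for _ in range(max_retries):
        draft = generate(feedback)
        violations = check(draft)
        if not violations:
            return draft  # clean output, stop revising
        feedback = "Fix these violations: " + "; ".join(violations)
    raise ValueError("Guardrail violations persisted after retries")

final = revise_with_guardrails(generate, check)
```

The retry cap matters: an agent that can't satisfy the rules in 2-3 attempts should fail loudly for human review, not loop.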
## Cost Management
| Lever | How | Impact |
|---|---|---|
| Model selection | Use Haiku for classification, Sonnet for most agents | 3-5x cost reduction vs Opus everywhere |
| Turn limits | Cap max_turns per agent type | Prevents runaway loops |
| Prompt caching | Enable caching for system prompts (automatic with SDK) | 90% reduction on cached prompt tokens |
| Structured output | Use output_schema to prevent verbose responses | 20-40% fewer output tokens |
| Tool result truncation | Limit tool return size (e.g., first 1,000 chars of search results) | Fewer input tokens per turn |
| Batch processing | Use the Batch API for non-real-time tasks | 50% cost reduction on batch-eligible tasks |
Cost tracking rules:
- Log token usage per agent per run from Day 1. You can't optimize what you don't measure
- Calculate cost per unit of work: "cost per account brief", "cost per email sequence", "cost per lead scored." This makes cost legible to non-technical stakeholders
- Set budget alerts. A runaway agent calling tools in a loop can burn through $100+ in a single session. Max_turns prevents this, but budget alerts catch edge cases
- Compare agent cost to human cost. If an account brief costs $0.15 to generate and saves 45 minutes of SDR time, the economics are clear. If it costs $2.00 and saves 5 minutes, reconsider
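The first two rules reduce to a small accumulator: log tokens per agent per run, then divide total dollars by runs completed. The prices below are illustrative per-million-token figures, not published rates; substitute your actual ones.

```python
from collections import defaultdict

# Illustrative per-million-token prices; substitute your actual rates.
PRICES = {"claude-sonnet-4-6": {"input": 3.00, "output": 15.00}}

usage = defaultdict(lambda: {"input": 0, "output": 0, "runs": 0})

def log_run(agent: str, model: str, input_tokens: int, output_tokens: int) -> None:
    u = usage[agent]
    u["input"] += input_tokens
    u["output"] += output_tokens
    u["runs"] += 1
    u["model"] = model

def cost_per_unit(agent: str) -> float:
    """Dollar cost per completed run, e.g. 'cost per account brief'."""
    u = usage[agent]
    p = PRICES[u["model"]]
    total = (u["input"] * p["input"] + u["output"] * p["output"]) / 1_000_000
    return total / u["runs"]

# Two research runs, ~8k input / 1.5k output tokens each
log_run("account_research", "claude-sonnet-4-6", 8_000, 1_500)
log_run("account_research", "claude-sonnet-4-6", 8_000, 1_500)
```

"Cost per account brief" is the number to put in front of stakeholders; raw token counts are not legible to anyone outside engineering.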
## Common GTM Agent Recipes

### Recipe 1: Account research pipeline

```
Research Agent (Sonnet, 10 turns, 3 tools)
  → AccountBrief schema
  → QA Agent (Sonnet, 2 turns, no tools)
  → Validated AccountBrief
```

### Recipe 2: Cold email generation with QA loop

```
Email Writer (Sonnet, 5 turns, no tools)
  → EmailSequence schema
  → Email Critic (Sonnet, 2 turns, no tools)
    → Pass? → Output
    → Fail? → Feedback → Email Writer (retry, max 3)
```

### Recipe 3: Inbound reply router

```
Reply Classifier (Haiku, 1 turn, no tools)
  → Classification + confidence
  → If positive: Meeting Booker Agent (Sonnet)
  → If objection: Objection Handler Agent (Sonnet)
  → If OOO: OOO Parser Agent (Haiku)
  → If opt-out: CRM Updater (programmatic, no LLM)
  → If low confidence: Human review queue
```

### Recipe 4: Signal monitor (daily batch)

```
For each target account:
  Signal Scanner (Haiku, 3 turns, 2 tools: web search + news search)
  → SignalList schema
  → If new signals found: Alert via Slack + update CRM
  → If no signals: Skip
```
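Recipe 3's routing layer is mostly plain code around a single classifier call. Here is a hedged sketch: keyword rules stand in for the Haiku call, and the route names are illustrative, not SDK identifiers.

```python
def classify_reply(text: str) -> tuple[str, float]:
    """Keyword stub standing in for the Haiku classifier call."""
    lowered = text.lower()
    if "unsubscribe" in lowered:
        return "opt_out", 0.99
    if "out of office" in lowered:
        return "ooo", 0.95
    if "not interested" in lowered:
        return "objection", 0.9
    if "let's talk" in lowered:
        return "positive", 0.85
    return "other", 0.4

ROUTES = {
    "positive": "meeting_booker",
    "objection": "objection_handler",
    "ooo": "ooo_parser",
    "opt_out": "crm_updater",  # programmatic, no LLM needed
}

def route(reply: str, threshold: float = 0.7) -> str:
    label, confidence = classify_reply(reply)
    if confidence < threshold:
        return "human_review"  # low confidence -> human queue
    return ROUTES.get(label, "human_review")
```

The confidence threshold is the safety valve: anything the classifier is unsure about goes to a human rather than to an automated downstream agent.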
## Anti-Pattern Check
- Building agents without the SDK. Raw API tool-use loops are error-prone and require reimplementing conversation management, tool call threading, and retry logic. Use the SDK unless the agent has no tools
- No `max_turns`. A research agent without a turn limit can loop indefinitely, burning tokens and producing increasingly irrelevant output. Always set a cap
- Using Opus for every agent. Opus is 5x the cost of Sonnet and often produces equivalent output for non-creative tasks. Default to Sonnet, upgrade to Opus only where quality measurably improves
- No structured output for pipeline agents. If Agent A's output feeds Agent B, free-form text between them causes parsing failures. Use `output_schema`
- Tools that return raw HTML. The model will spend tokens parsing markup instead of reasoning about content. Parse in the tool, return structured data
- No cost tracking. Token costs are invisible until the invoice arrives. Log usage per agent per run from the start
- No guardrails on customer-facing output. Prompt rules are probabilistic. Programmatic checks are deterministic. Use both. A banned-phrase regex catches what the model misses
- Skipping prompt caching. System prompts for agents are often 1,000-2,000 tokens. Caching them reduces cost by 90% on those tokens. The SDK enables this automatically, but verify it's working