This skill should be used when the user asks to "build an agent with the Claude SDK", "use the Claude Agent SDK", "create a Claude agent", "get started with the Anthropic agent SDK", "build an agentic app with Claude", "set up the Claude agent SDK", "write an agent with tool use", "build an agent in Python with Claude", "build an agent in TypeScript with Claude", or any variation of getting started with Anthropic's agent SDKs to build AI agents for B2B SaaS use cases.

Claude Agent SDK Basics

Anthropic provides two official agent SDKs: one for Python and one for TypeScript. Both wrap the Claude API with agent-specific abstractions: tool use, multi-turn conversation loops, agent handoffs, and guardrails. Use the SDK instead of raw API calls when building agents that need tool use, multi-step reasoning, or orchestration.

When to Use the SDK vs Raw API

| Scenario | Use SDK | Use raw API |
|---|---|---|
| Agent with tools (search, CRM read/write, enrichment) | Yes | No |
| Multi-turn conversation loop (agent reasons, acts, observes, repeats) | Yes | No |
| Simple single-turn generation (write one email from a prompt) | No | Yes |
| Multi-agent orchestration with handoffs | Yes | No |
| Batch processing with no tool use (score 1,000 leads from CSV) | No | Yes (with Batch API) |
| Streaming responses to a UI | Either | Either |

Rule of thumb: If the agent needs to take actions (call tools, make decisions, loop), use the SDK. If you're generating a single output from a single prompt, the raw API is simpler.


Python SDK Setup

Installation

pip install anthropic-agent

Minimum viable agent

from anthropic_agent import Agent, tool

@tool
def search_company(company_name: str) -> str:
    """Search for company information by name."""
    # Your implementation: call Crunchbase, LinkedIn, etc.
    return f"Company data for {company_name}"

agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="""You are a B2B account research agent that produces
    structured account briefs from company names. Accuracy matters more
    than completeness. Never guess. If data is not found, say so.""",
    tools=[search_company],
)

result = agent.run("Research Acme Corp for outbound targeting")
print(result)

TypeScript SDK Setup

Installation

npm install @anthropic-ai/agent

Minimum viable agent

import { Agent, tool } from "@anthropic-ai/agent";

const searchCompany = tool({
  name: "search_company",
  description: "Search for company information by name.",
  parameters: {
    company_name: { type: "string", description: "The target company name" },
  },
  execute: async ({ company_name }) => {
    // Your implementation
    return `Company data for ${company_name}`;
  },
});

const agent = new Agent({
  model: "claude-sonnet-4-6",
  systemPrompt: `You are a B2B account research agent that produces
    structured account briefs from company names. Accuracy matters more
    than completeness. Never guess. If data is not found, say so.`,
  tools: [searchCompany],
});

const result = await agent.run("Research Acme Corp for outbound targeting");
console.log(result);

Core Concepts

The Agent Loop

The SDK runs an agentic loop automatically:

User message → Model thinks → Model calls tool → Tool returns result →
Model thinks again → Model calls another tool (or responds) → ... → Final response

The loop continues until the model produces a final text response without calling any more tools. The SDK manages the conversation history, tool call/result threading, and retry logic.
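For contrast, here is a rough sketch of the loop the SDK replaces, written against the raw anthropic Python client. tool_definitions and run_tool are placeholders you would supply:

import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user", "content": "Research Acme Corp for outbound targeting"}]

for _ in range(10):  # the max_turns cap you would otherwise hand-roll
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tool_definitions,  # placeholder: JSON Schema specs for your tools
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # final text response, loop is done
    # Thread the assistant turn and every tool result back into the history.
    messages.append({"role": "assistant", "content": response.content})
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_tool(block.name, block.input),  # placeholder dispatcher
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})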

Loop rules:

  • Set a max_turns limit to prevent infinite loops. Default to 10-15 for research agents, 5 for simple agents
  • Each turn consumes tokens. A 10-turn research loop on Opus costs roughly 10x a single-turn call. Budget accordingly
  • The model decides when to stop looping. If it's stopping too early (not using all available tools), add explicit instructions in the system prompt: "Use all available tools before producing the final output"
  • If the model loops without converging, the system prompt is usually too vague. Add clearer completion criteria

Tools

Tools are functions the agent can call. The SDK handles the tool call protocol: the model requests a tool call, the SDK executes the function, and the result is sent back to the model.

Tool design rules for GTM agents:

  • Name tools descriptively. search_crunchbase is better than search. get_linkedin_contacts is better than get_contacts. The model uses the tool name to decide when to call it
  • Write clear descriptions. The description tells the model when and how to use the tool. "Search Crunchbase for company funding history, investors, and founding date" is actionable. "Search for stuff" is not
  • Define parameters with types and descriptions. Every parameter should have a type, a description, and whether it's required or optional
  • Return structured data. Return JSON strings or structured objects, not raw HTML or API responses. The model handles structured data better than parsing raw markup
  • Handle errors in the tool, not the prompt. If a search returns no results, the tool should return {"status": "no_results", "query": "..."}, not throw an exception. The model can reason about a "no results" response. It can't reason about a crash (see the sketch after the example below)

Example: well-designed tool

@tool
def search_linkedin_contacts(
    company_name: str,
    title_filter: str = "",
    max_results: int = 5
) -> str:
    """Search LinkedIn for contacts at a specific company.

    Args:
        company_name: The company to search for contacts at.
        title_filter: Optional title keyword to filter results
            (e.g., "VP Sales", "RevOps").
        max_results: Maximum number of contacts to return. Default 5.

    Returns:
        JSON array of contacts with name, title, linkedin_url, tenure_months.
        Returns {"status": "no_results"} if no contacts found.
    """
    # Implementation here
    ...
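And a sketch of the error-handling rule in practice: return structured status objects instead of raising, so the model can reason about failures. The upstream_search client is a stand-in, not a real library call.

import json

def safe_search(query: str) -> str:
    """Wrap an upstream call so the agent always gets structured JSON back."""
    try:
        results = upstream_search(query)  # stand-in for your real API client
    except TimeoutError:
        return json.dumps({"status": "error", "reason": "timeout", "query": query})
    if not results:
        return json.dumps({"status": "no_results", "query": query})
    return json.dumps(results)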

Structured Output

Force the model to return output in a specific schema. Critical for agents whose output feeds into other systems (CRM, downstream agents, dashboards).

from pydantic import BaseModel
from typing import Optional

class AccountBrief(BaseModel):
    company_name: str
    founded: Optional[str]
    hq: Optional[str]
    employee_count: Optional[str]
    funding: Optional[str]
    industry: str
    signals: list[dict]
    problem_hypothesis: str
    confidence: str  # "high", "medium", "low"
    missing_fields: list[str]

agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="...",
    tools=[search_company, search_linkedin],
    output_schema=AccountBrief,
)

result = agent.run("Research Acme Corp")
# result is a validated AccountBrief instance

Structured output rules:

  • Use structured output whenever the agent's result feeds into another system. CRM writes, downstream agents, dashboards, CSV exports
  • Define all fields as Optional or provide defaults. The model may not find all data. Non-optional fields with no data cause validation errors
  • Include a missing_fields or confidence field. The agent should communicate what it doesn't know
  • Test the schema with edge cases. A company with no funding data, a company with no LinkedIn presence, a company that was acquired. Make sure the schema handles all of them
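One way to exercise that last rule, assuming the AccountBrief schema above: construct the "nothing found" case by hand and confirm it validates.

# The "no data found" edge case should validate cleanly.
brief = AccountBrief(
    company_name="Acme Corp",
    founded=None,
    hq=None,
    employee_count=None,
    funding=None,
    industry="unknown",
    signals=[],
    problem_hypothesis="Insufficient data to form a hypothesis.",
    confidence="low",
    missing_fields=["founded", "hq", "employee_count", "funding"],
)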

Model Selection

| Agent type | Recommended model | Why |
|---|---|---|
| Research agent (multi-tool, synthesis) | claude-sonnet-4-6 | Good balance of reasoning, tool use, and cost |
| Email writer (creative, quality-critical) | claude-sonnet-4-6 or claude-opus-4-6 | Quality matters. Opus for highest-stakes output |
| Reply classifier (fast, high-volume) | claude-haiku-4-5 | Speed and cost. Classification doesn't need Opus |
| QA / critic agent | claude-sonnet-4-6 | Rule-checking needs accuracy, not creativity |
| Router agent | claude-haiku-4-5 | Fast classification. Cheap. Runs on every input |
| Data extraction / enrichment | claude-sonnet-4-6 | Accuracy matters. Moderate reasoning needed |

Model selection rules:

  • Default to Sonnet for most agents. Move to Opus only for tasks where the quality difference is measurable and worth the cost (roughly 5x more expensive)
  • Use Haiku for classification, routing, and any high-volume low-complexity task
  • Test the same prompt on two models before committing. Sometimes Sonnet matches Opus on a specific task, and sometimes Haiku matches Sonnet
  • Cost per run matters at scale. A research agent that runs 1,000 times/day at $0.15/run on Opus costs $4,500/month. The same agent on Sonnet at $0.03/run costs $900/month. Measure quality difference before paying 5x
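The arithmetic behind that last rule, as a throwaway helper. The per-run prices are the illustrative figures above, not published rates:

def monthly_cost(runs_per_day: int, cost_per_run: float, days: int = 30) -> float:
    """Monthly spend for an agent at a given volume and per-run price."""
    return runs_per_day * cost_per_run * days

monthly_cost(1_000, 0.15)  # Opus example: 4500.0 per month
monthly_cost(1_000, 0.03)  # Sonnet example: 900.0 per month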

Multi-Agent with Handoffs

The SDK supports handing off between agents. One agent completes its work and passes context to the next.

research_agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="Research agent prompt...",
    tools=[search_company, search_linkedin],
    output_schema=AccountBrief,
)

email_agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="Email writer prompt...",
    tools=[],
    output_schema=EmailSequence,  # another pydantic schema, defined like AccountBrief
)

# Pipeline: research → email writing
company_name = "Acme Corp"
account_brief = research_agent.run(f"Research {company_name}")
email_sequence = email_agent.run(
    f"Write a 3-email cold sequence for this account:\n\n"
    f"{account_brief.model_dump_json()}"
)

Handoff rules:

  • Pass structured data between agents, not free-form text. The output schema of Agent A should match the expected input format of Agent B
  • Don't pass the entire conversation history from Agent A to Agent B. Pass only the final output. Agent B doesn't need Agent A's reasoning process
  • Each agent should be independently runnable. Test Agent B with a manually-crafted input before wiring it to Agent A. This isolates bugs to the specific agent
  • Log the output at every handoff point. When the pipeline produces bad output, you need to identify which agent failed
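A sketch of those last two rules in practice, using stdlib logging and the agents defined above:

import logging

logger = logging.getLogger("gtm.pipeline")

account_brief = research_agent.run(f"Research {company_name}")
logger.info("handoff research->email | %s", account_brief.model_dump_json())

email_sequence = email_agent.run(
    f"Write a 3-email cold sequence for this account:\n\n"
    f"{account_brief.model_dump_json()}"
)
logger.info("pipeline output | %s", email_sequence.model_dump_json())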

Guardrails

Guardrails prevent the agent from doing things it shouldn't: calling tools it doesn't have access to, producing output that violates rules, or running indefinitely.

Turn limits

agent = Agent(
    model="claude-sonnet-4-6",
    system_prompt="...",
    tools=[...],
    max_turns=10,  # Stop after 10 tool-use turns
)

Set limits based on agent type:

  • Research agents: 10-15 turns (multiple tool calls needed)
  • Email writers: 3-5 turns (generate, maybe self-critique, done)
  • Classifiers: 1-2 turns (classify and respond)
  • QA agents: 2-3 turns (check rules, report)

Content filtering

Add a post-processing step to catch rule violations the model missed.

BANNED_PHRASES = [
    "leveraging", "in today's fast-paced world", "best-in-class",
    "holistic", "synergies", "unlock", "streamline",
]

def check_email_rules(email_text: str) -> list[str]:
    """Return list of rule violations found."""
    violations = []
    for phrase in BANNED_PHRASES:
        if phrase.lower() in email_text.lower():
            violations.append(f"Banned phrase: '{phrase}'")
    if "—" in email_text:
        violations.append("Contains em-dash")
    word_count = len(email_text.split())
    if word_count > 80:
        violations.append(f"Over word limit: {word_count} words")
    return violations

Guardrail rules:

  • Programmatic checks are more reliable than prompt-based rules. The model may violate a prompt rule 5% of the time; a string or regex check catches it 100% of the time. Use both
  • Run guardrails on the final output, not intermediate steps. Intermediate tool calls may contain content that would violate output rules (e.g., a search result containing "best-in-class"). Only check the final customer-facing output
  • When a guardrail catches a violation, feed it back to the agent for revision. Don't just flag it. Let the agent fix it (with a retry cap of 2-3)
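A minimal sketch of that feedback loop, reusing check_email_rules and assuming an email_writer agent that returns plain text:

draft = email_writer.run("Write the cold email...")
for _ in range(3):  # retry cap
    violations = check_email_rules(draft)
    if not violations:
        break  # passed every programmatic check
    draft = email_writer.run(
        "Revise this email to fix these violations:\n"
        + "\n".join(violations)
        + f"\n\nEmail:\n{draft}"
    )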

Cost Management

| Lever | How | Impact |
|---|---|---|
| Model selection | Use Haiku for classification, Sonnet for most agents | 3-5x cost reduction vs Opus everywhere |
| Turn limits | Cap max_turns per agent type | Prevents runaway loops |
| Prompt caching | Enable caching for system prompts (automatic with SDK) | 90% reduction on cached prompt tokens |
| Structured output | Use output_schema to prevent verbose responses | 20-40% fewer output tokens |
| Tool result truncation | Limit tool return size (e.g., first 1,000 chars of search results) | Fewer input tokens per turn |
| Batch processing | Use the Batch API for non-real-time tasks | 50% cost reduction on batch-eligible tasks |

Cost tracking rules:

  • Log token usage per agent per run from Day 1 (see the sketch after this list). You can't optimize what you don't measure
  • Calculate cost per unit of work: "cost per account brief", "cost per email sequence", "cost per lead scored." This makes cost legible to non-technical stakeholders
  • Set budget alerts. A runaway agent calling tools in a loop can burn through $100+ in a single session. The max_turns cap prevents this, but budget alerts catch edge cases
  • Compare agent cost to human cost. If an account brief costs $0.15 to generate and saves 45 minutes of SDR time, the economics are clear. If it costs $2.00 and saves 5 minutes, reconsider
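A sketch of per-run usage logging for the first rule. The result.usage shape here is an assumption; check what your SDK's result object actually exposes:

import logging

logger = logging.getLogger("gtm.cost")

def log_usage(agent_name: str, result) -> None:
    """Record token counts per agent per run. Assumes result.usage exists."""
    usage = getattr(result, "usage", None)
    if usage is None:
        logger.warning("%s | no usage data on result", agent_name)
        return
    logger.info(
        "%s | input_tokens=%d output_tokens=%d",
        agent_name, usage.input_tokens, usage.output_tokens,
    )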

Common GTM Agent Recipes

Recipe 1: Account research pipeline

Research Agent (Sonnet, 10 turns, 3 tools)
    → AccountBrief schema
    → QA Agent (Sonnet, 2 turns, no tools)
    → Validated AccountBrief

Recipe 2: Cold email generation with QA loop

Email Writer (Sonnet, 5 turns, no tools)
    → EmailSequence schema
    → Email Critic (Sonnet, 2 turns, no tools)
    → Pass? → Output
    → Fail? → Feedback → Email Writer (retry, max 3)

Recipe 3: Inbound reply router

Reply Classifier (Haiku, 1 turn, no tools)
    → Classification + confidence
    → If positive: Meeting Booker Agent (Sonnet)
    → If objection: Objection Handler Agent (Sonnet)
    → If OOO: OOO Parser Agent (Haiku)
    → If opt-out: CRM Updater (programmatic, no LLM)
    → If low confidence: Human review queue
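In code, Recipe 3 reduces to a dispatch table around the classifier. A sketch with hypothetical handler agents and an assumed confidence threshold:

CONFIDENCE_FLOOR = 0.7  # assumed threshold; tune on labeled replies

HANDLERS = {
    "positive": meeting_booker.run,      # hypothetical agents from the recipe
    "objection": objection_handler.run,
    "ooo": ooo_parser.run,
}

def route_reply(reply_text: str):
    result = reply_classifier.run(reply_text)  # Haiku, 1 turn
    if result.label == "opt_out":
        return crm_mark_opt_out(reply_text)    # programmatic, no LLM
    if result.confidence < CONFIDENCE_FLOOR or result.label not in HANDLERS:
        return human_review_queue.put(reply_text)  # hypothetical review queue
    return HANDLERS[result.label](reply_text)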

Recipe 4: Signal monitor (daily batch)

For each target account:
    Signal Scanner (Haiku, 3 turns, 2 tools: web search + news search)
        → SignalList schema
        → If new signals found: Alert via Slack + update CRM
        → If no signals: Skip

Anti-Pattern Check

  • Building agents without the SDK. Raw API tool-use loops are error-prone and require reimplementing conversation management, tool call threading, and retry logic. Use the SDK unless the agent has no tools
  • No max_turns. A research agent without a turn limit can loop indefinitely, burning tokens and producing increasingly irrelevant output. Always set a cap
  • Using Opus for every agent. Opus is 5x the cost of Sonnet and often produces equivalent output for non-creative tasks. Default to Sonnet, upgrade to Opus only where quality measurably improves
  • No structured output for pipeline agents. If Agent A's output feeds Agent B, free-form text between them causes parsing failures. Use output_schema
  • Tools that return raw HTML. The model will spend tokens parsing markup instead of reasoning about content. Parse in the tool, return structured data
  • No cost tracking. Token costs are invisible until the invoice arrives. Log usage per agent per run from the start
  • No guardrails on customer-facing output. Prompt rules are probabilistic. Programmatic checks are deterministic. Use both. A banned-phrase regex catches what the model misses
  • Skipping prompt caching. System prompts for agents are often 1,000-2,000 tokens. Caching them reduces cost by 90% on those tokens. The SDK enables this automatically, but verify it's working