
This skill should be used when the user asks to "build a GTM agent", "design a sales agent", "create an outbound agent", "build an AI agent for go-to-market", "design an agent for prospecting", "build an agent for lead scoring", "create an AI SDR", "automate GTM with agents", "design a revenue agent", or any variation of designing AI agents that perform go-to-market tasks for B2B SaaS teams.

GTM Agent Design

A GTM agent is an AI system that performs a specific go-to-market task: researching accounts, writing outbound, scoring leads, handling replies, enriching data, or routing prospects. The agent replaces a repetitive human workflow with an LLM-powered process that runs faster, cheaper, and more consistently.

The design principle: start with the human workflow, not the technology. Map what a person does today step by step. Identify which steps are repetitive, rule-based, and don't require judgment. Automate those. Keep the judgment steps human.

The GTM Agent Landscape

| Agent type | What it does | Replaces | Human-in-the-loop? |
| --- | --- | --- | --- |
| Research Agent | Pulls company and contact data from multiple sources, synthesizes into an account brief | Manual account research (45-90 min per account) | Review output before use |
| Email Writer Agent | Generates cold email sequences from an account brief | SDR writing emails (15-30 min per sequence) | Approve before send |
| Personalization Agent | Inserts per-prospect tokens into templated sequences | SDR personalizing templates (5-10 min per email) | Spot-check 10-20% |
| Reply Classifier Agent | Classifies inbound replies (positive, objection, OOO, opt-out) | SDR triaging inbox (ongoing) | Review low-confidence classifications |
| Lead Scorer Agent | Scores inbound leads on ICP fit and intent signals | RevOps manual scoring or static rules | Calibrate model quarterly |
| Enrichment Agent | Fills missing data fields from multiple providers | Ops team running enrichment workflows | Validate match rates |
| Signal Monitor Agent | Watches for buying signals across data sources, alerts when triggered | Manual signal scanning (daily) | Set alert thresholds |
| Routing Agent | Routes leads to the right rep based on territory, segment, and availability | RevOps routing rules in CRM | Audit routing accuracy weekly |
| Meeting Prep Agent | Generates pre-call briefs from CRM data, research, and prior notes | AE/SDR manual prep (15-30 min per meeting) | Read before the call |
| Follow-Up Agent | Generates post-meeting follow-up emails from call notes | AE writing follow-ups (10-20 min per email) | Edit and approve before send |

Agent Design Process

Step 1: Map the human workflow

Before writing any code or prompts, document exactly what a person does today.

Workflow mapping template:

For each step, capture:

| Field | What to document |
| --- | --- |
| Step name | What the person does ("Find company funding history") |
| Input | What they start with ("Company name") |
| Source | Where they get the data ("Crunchbase, press articles") |
| Action | What they do with it ("Read, extract key facts, summarize") |
| Output | What they produce ("Funding summary: round, amount, date, investors") |
| Time | How long it takes ("5-10 minutes") |
| Judgment required? | Does this step require human judgment or is it rule-based? |
| Error rate | How often do humans get this wrong? |

Example: SDR account research workflow

| Step | Input | Source | Action | Output | Time | Judgment? |
| --- | --- | --- | --- | --- | --- | --- |
| 1. Company snapshot | Company name | LinkedIn, website | Read about page, note size/stage/vertical | Snapshot fields | 2 min | No |
| 2. Funding history | Company name | Crunchbase | Search, extract rounds | Funding summary | 3 min | No |
| 3. Recent signals | Company name | LinkedIn, news, job boards | Scan for events in last 90 days | Signal list | 10 min | Low |
| 4. Tech stack | Company name | Job postings, BuiltWith | Extract tool mentions | Stack list | 5 min | Low |
| 5. Committee mapping | Company name | LinkedIn | Search titles, identify roles | Contact list | 10 min | Medium |
| 6. Problem hypothesis | All above | Synthesis | Connect signals to pain, write hypothesis | 1 paragraph | 10 min | High |
| 7. Email drafting | Account brief | Writing | Craft 3-email sequence | 3 emails | 15 min | High |

Steps 1-4 are low-judgment, high-repetition: automate these. Step 5 requires moderate judgment: semi-automate (agent proposes, human validates). Steps 6-7 require high judgment on synthesis, tone, and quality: the agent drafts, a human approves.
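The mapped workflow can be captured as a small data structure so the automate/semi-automate decision is explicit and auditable. A minimal Python sketch; the field names and the judgment-to-mode mapping are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

# One row of the workflow mapping template (hypothetical field names).
@dataclass
class WorkflowStep:
    name: str
    input: str
    source: str
    action: str
    output: str
    minutes: int      # upper-bound time estimate for the human step
    judgment: str     # "none", "low", "medium", or "high"

def automation_mode(step: WorkflowStep) -> str:
    """Map the judgment level of a step to an automation decision."""
    return {
        "none": "automate",
        "low": "automate",
        "medium": "semi-automate: agent proposes, human validates",
        "high": "agent drafts, human approves",
    }[step.judgment]
```

Running every mapped step through `automation_mode` produces the automation plan directly from the workflow audit, rather than from intuition.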

Step 2: Define the agent's scope

One agent, one job. Draw the boundary tight.

Scoping rules:

  • An agent should complete in under 60 seconds for real-time tasks (reply classification, routing) or under 5 minutes for batch tasks (research, email writing)
  • An agent should have a single, testable output. "Account brief" is testable. "Help with sales" is not
  • If the agent needs more than 5 tools, it's probably too broad. Split it
  • If the system prompt is over 2,000 words, it's probably covering multiple jobs. Split it
  • If you're writing "if the input is X, do this; if the input is Y, do that" in the prompt, you need a router and two specialist agents, not one agent with branching logic
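The router-plus-specialists pattern from the last rule can be sketched in a few lines. `classify` and the specialist callables stand in for model calls; none of these names come from a real API:

```python
# Minimal router sketch: a cheap classifier picks the input type, then the
# matching specialist agent handles it. All callables are stand-ins.
def route_task(task: dict, classify, specialists: dict):
    kind = classify(task)              # e.g. "research" or "reply"
    if kind not in specialists:
        raise ValueError(f"no specialist registered for {kind!r}")
    return specialists[kind](task)
```

Each specialist keeps its own tight prompt; the branching logic lives in the router, not in any single agent's instructions.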

Step 3: Design the system prompt

The system prompt is the agent's operating manual. It determines output quality more than any other design choice.

System prompt structure:

1. Role and purpose (2-3 sentences)
2. Input specification (what the agent receives)
3. Output specification (exact format, schema, required fields)
4. Rules and constraints (hard rules the agent must follow)
5. Examples (2-3 input/output pairs showing ideal behavior)
6. Edge cases (what to do when data is missing or ambiguous)

Prompt design rules:

  • Lead with the role. "You are a B2B account research agent that produces structured account briefs from company names" is better than a paragraph of context
  • Specify the output format exactly. If you want JSON, show the schema. If you want markdown, show the template. Ambiguous output specs produce inconsistent results
  • Hard rules are non-negotiable constraints. "Never use em-dashes. Never exceed 80 words. Never fabricate a signal." These go in a dedicated rules section, not buried in paragraphs
  • Examples are the most powerful part of the prompt. Two good examples teach the agent more than 500 words of instructions. Show the input, the ideal output, and annotate why the output is good
  • Address missing data explicitly. "If funding data is not found, output 'Funding: Not found (checked Crunchbase, PitchBook)' instead of guessing" prevents hallucination
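Putting the six-part structure together, a prompt skeleton might look like the following. Every field name, rule, and threshold here is an example, not a canonical template:

```python
# Illustrative system prompt following the six-part structure.
# All specifics (fields, rules, wording) are hypothetical examples.
SYSTEM_PROMPT = """\
You are a B2B account research agent that produces structured account briefs from company names.

## Input
A company name, plus an optional domain and ICP criteria.

## Output
Return JSON with exactly these fields:
snapshot, funding, signals, tech_stack, contacts, hypothesis.

## Rules
- Never fabricate a signal.
- Never exceed 80 words in the hypothesis.
- If a field cannot be verified, output "Not found" plus the sources checked.

## Examples
(2-3 input/output pairs showing ideal behavior, annotated with why each output is good)

## Edge cases
If the company is private or stealth, say so and mark the brief incomplete.
"""
```

Note the order: role first, hard rules in a dedicated section, missing-data behavior spelled out rather than left to the model.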

Step 4: Select tools

Tools are the actions an agent can take: search the web, query an API, read a database, call an MCP server.

Common GTM agent tools:

| Tool | What it does | Used by |
| --- | --- | --- |
| Web search | Searches the internet for company information | Research Agent, Signal Monitor |
| LinkedIn API / scraper | Pulls profile and company data from LinkedIn | Research Agent, Committee Mapper |
| CRM read/write | Reads and updates CRM records | Enrichment Agent, Routing Agent, Scorer |
| Enrichment API (Apollo, Clearbit) | Fills missing contact and company data | Enrichment Agent |
| Email sending API (Lemlist, Outreach) | Loads sequences and sends emails | Email Writer (with human approval gate) |
| Calendar API | Books meetings, checks availability | Meeting Booker Agent |
| Slack API | Sends alerts and notifications | Signal Monitor, Routing Agent |
| File read/write | Reads CSVs, writes reports | Batch processing agents |

Tool design rules:

  • Every tool that writes to an external system (CRM, email, Slack) should have a confirmation step in development and a human approval gate in production
  • Tools should return structured data, not raw HTML or API responses. Parse before returning to the agent
  • Limit tool count per agent. 3-5 tools is ideal. Above 7, the agent spends more time deciding which tool to use than doing the work
  • Include error information in tool responses. "Search returned 0 results for [query]" is better than an empty response. The agent needs to know when data is missing vs when the tool failed

Step 5: Define evaluation criteria

Before building, define how you'll measure whether the agent works.

Evaluation framework:

| Dimension | What to measure | How to measure | Minimum bar |
| --- | --- | --- | --- |
| Accuracy | Are the facts correct? | Human review of 50 outputs against ground truth | 95%+ factual accuracy |
| Completeness | Are all required fields populated? | Automated schema check | 90%+ field completion |
| Rule compliance | Does output follow all hard rules? | Automated rule checker (word count, banned phrases, format) | 100% compliance |
| Quality | Is the output good enough to use? | Human rating (1-5 scale) on 50 outputs | Average ≥ 4.0 |
| Latency | How long does it take? | Timer per run | Under threshold (60s real-time, 5 min batch) |
| Cost | How much does it cost per run? | Token tracking | Under unit economics threshold |

Evaluation rules:

  • Define the minimum bar before building. "We'll know it's good enough when..." should be answerable before writing the first prompt
  • Accuracy and rule compliance are non-negotiable. Quality and latency can be traded off
  • Test on at least 50 inputs before deploying. 5 test cases is a demo, not a test
  • Measure cost per unit of output. "$0.15 per account brief" or "$0.03 per email." If the agent costs more than the human time it replaces, the economics don't work
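The automated rule checker from the compliance row can be a few lines of code. The word cap and banned phrases below are examples, not a canonical list:

```python
# Minimal automated checker for the "rule compliance" dimension.
# The specific rules (80-word cap, phrase list) are illustrative.
BANNED_PHRASES = ["i hope this finds you well", "quick question"]
MAX_WORDS = 80

def check_rules(email_body: str) -> list[str]:
    """Return a list of rule violations; an empty list means compliant."""
    violations = []
    if len(email_body.split()) > MAX_WORDS:
        violations.append(f"over {MAX_WORDS} words")
    lowered = email_body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"banned phrase: {phrase!r}")
    return violations
```

Because hard rules are mechanical, this check can run on every output daily; only the quality dimension needs sampled human review.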

GTM Agent Archetypes

The Research Agent

Purpose: Transform a company name into a structured account brief.

Input: Company name, domain, optional ICP criteria

Output: Structured account brief with: company snapshot, funding history, recent signals, tech stack indicators, 3-5 committee contacts, problem hypothesis

Key design decisions:

  • Use web search + LinkedIn as primary tools. Add Crunchbase API if available
  • Structure output as JSON or structured markdown. Free-form summaries are harder for downstream agents to parse
  • Include a confidence score per field. "Funding: $45M Series B (high confidence, Crunchbase)" vs "ARR: ~$15M (low confidence, estimated from headcount)"
  • Handle private/stealth companies explicitly. "Limited public information available. Brief is incomplete" is better than a hallucinated profile
  • Time-cap research at 60 seconds per account. Beyond that, diminishing returns
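Per-field confidence can be carried as structured data and rendered into the brief. A sketch matching the examples above; the company values are hypothetical:

```python
# Illustrative per-field confidence structure for an account brief.
brief_fields = {
    "funding": {"value": "$45M Series B", "confidence": "high",
                "source": "Crunchbase"},
    "arr": {"value": "~$15M", "confidence": "low",
            "source": "estimated from headcount"},
}

def render_field(name: str, field: dict) -> str:
    """Render one brief field with its confidence and source attached."""
    return (f"{name.title()}: {field['value']} "
            f"({field['confidence']} confidence, {field['source']})")
```

Keeping confidence and source alongside each value lets downstream agents (and humans) weight low-confidence fields instead of treating every line of the brief as fact.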

The Email Writer Agent

Purpose: Generate a cold email sequence from an account brief.

Input: Account brief (from Research Agent or human), target contact name/title, product value prop

Output: 3-email sequence with subject lines, bodies, and send timing

Key design decisions:

  • Embed all cold-outbound-email-writing rules directly in the system prompt. Word limits, banned phrases, signal requirements, subject line rules
  • Include 2-3 examples of ideal output in the prompt. Examples train the model better than rules alone
  • Use a QA loop: Writer → Critic → Rewrite if needed. Cap at 3 iterations
  • Separate "generate" from "personalize." The writer creates the template sequence. A personalization agent inserts per-contact tokens. This separation lets you reuse the same sequence across contacts with different personalization
  • Output should include the raw email text plus metadata: word count per email, signal used, proof point used, subject line pattern used

The Reply Classifier Agent

Purpose: Classify inbound email replies into actionable categories.

Input: Reply email text, original outbound email text, prospect metadata

Output: Classification (positive, objection, question, OOO, opt-out, irrelevant), confidence score, recommended next action

Key design decisions:

  • Use a fast, cheap model (Haiku). Classification doesn't need Opus
  • Define 6-8 categories with clear boundaries and 3+ examples per category
  • Include a confidence threshold. Below 80% confidence, route to human review
  • Output the recommended next action alongside the classification. "Positive reply. Recommended: send meeting booking email with 3 time slots" gives the downstream system or human a clear next step
  • Handle multi-intent replies. "I'm interested but I'm OOO until the 15th" is both positive and OOO. The classifier should detect both and route accordingly
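The confidence threshold, opt-out handling, and multi-intent rule combine into a small routing function. A sketch; the lowercase labels mirror the category list above, and the action names are illustrative:

```python
# Routing sketch for classifier output. The 0.80 threshold mirrors the
# "below 80% confidence, route to human review" rule.
def route_reply(labels: list[str], confidence: float) -> str:
    if confidence < 0.80:
        return "human_review"
    if "opt-out" in labels:
        return "suppress_contact"    # opt-out overrides everything else
    if "positive" in labels:
        # Multi-intent: positive + OOO still books, but on a delay.
        return "delayed_booking" if "ooo" in labels else "send_booking_email"
    return "sdr_queue"
```

Note that opt-out is checked before positive: a reply that is both must be suppressed, since honoring the opt-out is a compliance requirement, not a judgment call.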

The Signal Monitor Agent

Purpose: Continuously watch data sources for buying signals on target accounts.

Input: List of target accounts, signal definitions (what to watch for)

Output: Alert when a signal is detected: account name, signal type, signal details, signal strength, recommended action

Key design decisions:

  • Run on a schedule (daily or weekly), not real-time. Most buying signals don't require instant response
  • Define signal types with explicit detection criteria. "Funding round" = specific press release or Crunchbase entry, not "they seem to be growing"
  • Deduplicate signals. The same funding round shouldn't trigger 5 alerts from 5 sources
  • Include signal strength scoring. A Series B announcement is stronger than a LinkedIn post about growth plans
  • Route alerts to the right person. Signal on a Tier 1 ABM account goes to the ABM marketer. Signal on a Tier 3 account goes to the SDR queue
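Deduplication can key on the account, signal type, and a normalized detail string, so the same funding round surfaced by five sources fires one alert. The key choice is an assumption; real keys depend on your signal definitions:

```python
# Dedup sketch: drop signals whose (account, type, normalized detail)
# key has already been seen. Signal dicts are hypothetical shapes.
def dedupe_signals(signals: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for s in signals:
        key = (s["account"], s["type"], s["detail"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```

Normalizing the detail (whitespace, case) catches near-duplicate phrasings from different sources; fuzzier matching is possible but adds false merges.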

Production Deployment

Progressive rollout

| Phase | Duration | Volume | Human review | Goal |
| --- | --- | --- | --- | --- |
| 1. Prototype | Week 1-2 | 10 test inputs | 100% | Does it work at all? |
| 2. Pilot | Week 3-6 | 50-100 real inputs | 100% | Does output quality meet the bar? |
| 3. Controlled launch | Week 7-12 | Full volume | 50% spot-check | Does quality hold at scale? |
| 4. Production | Week 12+ | Full volume | 10-20% spot-check | Ongoing quality assurance |

Rollout rules:

  • Never skip phases. A prototype that works on 10 test inputs may fail at 100 real inputs
  • Define phase advancement criteria before starting. "Advance from Pilot to Controlled Launch when accuracy ≥ 95% on 50+ human-reviewed outputs"
  • Keep a human fallback throughout. If the agent goes down or quality drops, the team can revert to the manual process immediately
  • Track unit economics from Phase 2 onward. Cost per output, time saved per output, quality vs human baseline

Monitoring in production

| Metric | Check frequency | Alert threshold |
| --- | --- | --- |
| Output quality score (from human spot-checks) | Weekly | Average drops below 3.5/5 |
| Rule compliance rate | Daily (automated) | Any rule violation |
| Latency per run | Per run | Exceeds 2x baseline |
| Cost per run | Daily | Exceeds budget by 20% |
| Error rate | Per run | Above 5% |
| Human override rate | Weekly | Above 30% (agent outputs being rejected) |
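The alert thresholds above translate directly into an automated check that runs alongside the agent. The metric dict keys below are hypothetical names for illustration:

```python
# Monitoring sketch: compare current metrics to the alert thresholds
# from the table. Metric names and units are assumptions.
def monitoring_alerts(metrics: dict) -> list[str]:
    alerts = []
    if metrics["quality_score"] < 3.5:
        alerts.append("quality score below 3.5/5")
    if metrics["rule_violations"] > 0:
        alerts.append("rule violation detected")
    if metrics["latency_s"] > 2 * metrics["baseline_latency_s"]:
        alerts.append("latency exceeds 2x baseline")
    if metrics["cost"] > 1.2 * metrics["cost_budget"]:
        alerts.append("cost exceeds budget by 20%")
    if metrics["error_rate"] > 0.05:
        alerts.append("error rate above 5%")
    if metrics["override_rate"] > 0.30:
        alerts.append("human override rate above 30%")
    return alerts
```

An empty list means the agent is within bounds; any non-empty result should page a human, since several of these thresholds (rule violations, override rate) signal quality drift rather than transient failures.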

Anti-Pattern Check

  • Starting with the technology instead of the workflow. "Let's build an agent with Claude and MCP" before mapping the human process produces agents that don't fit the actual need. Map the workflow first
  • Building one agent to do everything. A "GTM Agent" that researches, writes emails, scores leads, and updates CRM will be mediocre at all four. One agent, one job
  • No human review on customer-facing output. An agent sending emails without human approval will eventually send something embarrassing to a Tier 1 account. The cost of that one bad email exceeds the cost of reviewing 1,000 good ones
  • Optimizing for speed before quality. A fast agent that produces bad output is worse than no agent. Get quality right in Pilot phase. Optimize speed in Production
  • No evaluation framework. "It seems to work" is not evaluation. Define quantitative criteria (accuracy, compliance, cost) before building and measure against them continuously
  • Skipping the prototype phase. Going straight to full-volume deployment because "the prompt looks good" leads to expensive failures. Test on 10 inputs first. Always
  • Using the most expensive model for every agent. Match model to cognitive demand. Classification = Haiku. Extraction = Sonnet. Generation = Opus or Sonnet. Using Opus for routing is burning money
  • No fallback plan. If the agent breaks at 2am, what happens? If there's no answer, you're not ready for production. Maintain the manual process as a fallback until the agent has 30+ days of stable operation
  • Treating agent design as a one-time project. Prompts drift. Data sources change. Quality degrades. Agent design is ongoing. Budget for weekly monitoring and monthly iteration