Prompt Design for Agents
A system prompt is the operating manual for an agent. It determines output quality more than model selection, tool design, or orchestration architecture. A well-prompted Sonnet outperforms a poorly prompted Opus. Invest more time here than anywhere else in agent design.
The principle: write the prompt as if you're onboarding a smart new hire who has zero context on your company, your process, or your quality bar. Be explicit about what good looks like. Show, don't just tell.
Prompt Architecture
Every agent system prompt follows the same 7-section structure. Order matters. The model pays more attention to content earlier in the prompt.
1. Identity — Who the agent is and what it does (2-3 sentences)
2. Input spec — What the agent receives and in what format
3. Output spec — Exact format, schema, required fields
4. Process — Step-by-step instructions for how to get from input to output
5. Rules — Hard constraints the agent must never violate
6. Examples — 2-3 input/output pairs showing ideal behavior
7. Edge cases — What to do when data is missing, ambiguous, or unexpected
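To make the architecture concrete, here is a minimal Python sketch of a prompt assembled from the seven sections in order. The section bodies are placeholders, not real prompt content:

```python
# Minimal sketch: assemble the seven sections in order.
# Section bodies are hypothetical placeholders, not a real prompt.
SECTIONS = [
    ("Identity", "You are a B2B account research agent that ..."),
    ("Input", "You receive the following: ..."),
    ("Output", "Return a JSON object with the following structure: ..."),
    ("Process", "Follow these steps in order: ..."),
    ("Rules", "Follow these rules without exception: ..."),
    ("Examples", "Example 1: ..."),
    ("Edge cases", "If the company is not found: ..."),
]

def build_system_prompt(sections=SECTIONS) -> str:
    """Join the sections under markdown headers, preserving order."""
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections)
```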
Why this order works
- Identity first because it frames everything that follows. The model interprets all subsequent instructions through the lens of "who am I"
- Input/output specs before process because the model needs to know what it's working with and what it's producing before it reads how
- Rules after process because rules are constraints on the process. They make more sense after the model understands what it's doing
- Examples near the end because they serve as calibration. The model has absorbed the instructions and now sees what "good" looks like concretely
- Edge cases last because they're exceptions to the normal flow. The model should understand the normal flow first
Section 1: Identity
Two to three sentences. Who the agent is, what it does, and for whom.
Template:
You are a [role] that [primary action] for [audience].
Your output is used by [downstream consumer] to [downstream purpose].
[One sentence on quality bar or operating philosophy.]
Good examples:
You are a B2B account research agent that produces structured account briefs
from company names. Your output is used by SDRs and ABM marketers to craft
personalized outbound campaigns. Accuracy matters more than completeness. Never
guess. If data is not found, say so.
You are a cold email writer that generates 3-email outbound sequences for B2B
SaaS prospects. Your output is reviewed by a human before sending. Write like
a peer, not a vendor. Every email must earn the next.
Bad examples:
You are a helpful AI assistant.
(Too generic. The model has no frame for what "helpful" means in this context.)
You are an advanced AI-powered go-to-market intelligence platform that leverages
cutting-edge natural language processing to synthesize multi-source data streams
into actionable strategic insights for revenue-generating teams.
(Marketing copy, not an operating manual. The model will mirror this style in its output.)
Identity rules:
- Name the specific job, not a general capability. "Account research agent" not "helpful assistant"
- Name the downstream consumer. The agent writes differently when it knows the output goes to an SDR vs a VP
- State the quality philosophy in one sentence. "Accuracy over completeness" or "Conciseness over comprehensiveness" sets the tone for all decisions the model makes
Section 2: Input Spec
Define exactly what the agent receives. Include the format, required fields, and optional fields.
Template:
## Input
You receive the following:
- **company_name** (required): The target company name
- **domain** (optional): The company's website domain
- **icp_criteria** (optional): ICP fit criteria to evaluate against
- **context** (optional): Additional context from the requesting user
Input format: JSON object or plain text, depending on source.
Input spec rules:
- Label every field as required or optional. The agent needs to know what it can always rely on vs what might be missing
- Specify the format. "JSON object with these keys" or "plain text, one company name per line." Ambiguous input specs cause parsing failures
- Include an example input. Even one example disambiguates more than a paragraph of description
- Note what the input does NOT include. "You do not receive the prospect's email address. Do not attempt to guess or construct email addresses" prevents hallucination on fields the agent doesn't have
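Enforcing the spec can also happen in code, upstream of the agent. A minimal sketch using the field names from the template above (the error wording is an assumption):

```python
# Sketch: check an input payload against the spec before the agent runs.
# Field names match the template above; error wording is an assumption.
REQUIRED = {"company_name"}
OPTIONAL = {"domain", "icp_criteria", "context"}

def validate_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is usable."""
    problems = [f"missing required field: {f}" for f in REQUIRED - payload.keys()]
    problems += [f"unexpected field: {f}" for f in payload.keys() - REQUIRED - OPTIONAL]
    return problems

# validate_input({"domain": "acme.io"}) -> ["missing required field: company_name"]
```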
Section 3: Output Spec
The most important section. Ambiguous output specs are the #1 cause of inconsistent agent behavior.
Template:
## Output
Return a JSON object with the following structure:
{
  "company_name": "string",
  "snapshot": {
    "founded": "year or 'Unknown'",
    "hq": "city, state/country",
    "employee_count": "number or range",
    "funding": "most recent round, amount, date",
    "industry": "string"
  },
  "signals": [
    {
      "signal": "description",
      "type": "funding | hiring | product | leadership | tech_stack",
      "date": "YYYY-MM-DD or approximate",
      "source": "where found",
      "strength": "tier_1 | tier_2 | tier_3"
    }
  ],
  "problem_hypothesis": "One paragraph connecting signals to a specific problem",
  "confidence": "high | medium | low",
  "missing_fields": ["list of fields that could not be populated"]
}
Output spec rules:
- Use a concrete schema, not prose descriptions. Show the exact JSON structure or markdown template. "Return a summary of the company" produces wildly inconsistent outputs. A schema produces consistent ones
- Specify what to do for missing data. Every field should have a fallback value: "Unknown", null, or "Not found (checked [sources])". This prevents hallucination
- Include missing_fields or confidence in the schema. The agent should communicate what it doesn't know, not fill gaps with guesses
- Specify length constraints per field. "problem_hypothesis: 2-4 sentences, under 100 words" prevents both terse and bloated outputs
- Show one complete example output. The model calibrates its output format to the example more reliably than to the schema description alone
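The missing-data fallbacks can also be enforced in post-processing rather than left entirely to the model. A sketch using the field names from the schema above (the helper name and the "Unknown" fallback are assumptions):

```python
# Sketch: enforce missing-data fallbacks after generation instead of
# trusting the model to apply them. Field names match the schema above;
# the "Unknown" fallback and function name are assumptions.
import json

SNAPSHOT_FIELDS = ["founded", "hq", "employee_count", "funding", "industry"]

def apply_fallbacks(raw: str) -> dict:
    """Fill absent snapshot fields with 'Unknown' and record them in
    missing_fields, so gaps are reported rather than papered over."""
    out = json.loads(raw)
    snapshot = out.setdefault("snapshot", {})
    missing = out.setdefault("missing_fields", [])
    for field in SNAPSHOT_FIELDS:
        if not snapshot.get(field):
            snapshot[field] = "Unknown"
            missing.append(field)
    return out
```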
Section 4: Process
Step-by-step instructions for transforming input into output. Think of this as the agent's standard operating procedure.
Template:
## Process
Follow these steps in order:
1. **Search for company information.** Use the web_search tool with the query
"[company_name] funding crunchbase". Extract founding year, HQ, employee
count, and most recent funding round.
2. **Identify recent signals.** Search for "[company_name] news" and
"[company_name] hiring". Look for events in the last 90 days: funding
announcements, leadership changes, product launches, job postings for
roles relevant to [product category].
3. **Assess tech stack.** Search for "[company_name] jobs" and look for
tools mentioned in job requirements. Cross-reference with BuiltWith
if the domain is provided.
4. **Formulate problem hypothesis.** Connect the strongest signal to a
specific problem the company likely faces. Ground the hypothesis in
evidence from steps 1-3. Do not speculate beyond what the data supports.
5. **Compile output.** Assemble all findings into the output schema.
Populate missing_fields with any fields that could not be found.
Set confidence based on data coverage.
Process rules:
- Number every step. The model follows numbered sequences more reliably than prose paragraphs
- Name the specific tool to use in each step. "Use the web_search tool with query X" is better than "search for information about the company"
- Include the search queries. Specifying exact queries ("[company_name] funding crunchbase") produces more consistent results than "search for funding data"
- Tell the agent what to extract, not just where to look. "Extract founding year, HQ, employee count, and most recent funding round" is actionable. "Look at the company profile" is not
- Keep it to 5-8 steps. Fewer than 5 usually means steps are too vague. More than 8 usually means the agent's scope is too broad. Split into multiple agents
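One way to keep queries consistent is to treat them as data rather than prose. A sketch using the query strings from the template above (the helper is hypothetical):

```python
# Sketch: pin the exact queries from the steps above as data so every run
# issues the same searches. The helper function is hypothetical.
QUERY_TEMPLATES = [
    "{company_name} funding crunchbase",  # step 1: firmographics and funding
    "{company_name} news",                # step 2: recent signals
    "{company_name} hiring",              # step 2: hiring signals
    "{company_name} jobs",                # step 3: tech stack via job posts
]

def queries_for(company_name: str) -> list[str]:
    return [q.format(company_name=company_name) for q in QUERY_TEMPLATES]
```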
Section 5: Rules
Hard constraints. Non-negotiable. The agent must follow these regardless of input, context, or how "natural" a violation might feel.
Template:
## Rules
Follow these rules without exception:
### Accuracy rules
- Never fabricate information. If data is not found, report it as missing.
Do not infer, guess, or synthesize from insufficient evidence.
- Never present estimates as facts. Label every estimate: "ARR: ~$15M
(estimated from headcount, low confidence)"
- Cite sources for every claim. "Funding: $45M Series B (Crunchbase, Oct 2025)"
### Format rules
- Every email must be under 80 words (Email 1), 90 words (Email 2),
or 30 words (Email 3)
- Subject lines: ≤ 5 words, lowercase, no emoji
- No em-dashes (—) in any output. Use periods or restructure
### Content rules
- Never use these phrases: "leveraging", "in today's fast-paced world",
"best-in-class", "holistic", "synergies", "unlock", "streamline"
- Never start an email with "I". Start with the signal or the prospect
- Never use "demo" — use "teardown", "walkthrough", or "quick look"
### Behavioral rules
- If a required tool returns an error, note the error and continue
with available data. Do not retry more than once
- If confidence is "low" on any critical field, flag it in the output.
Do not bury low-confidence data in otherwise confident-looking output
Rules formatting principles:
- Group rules by category. Accuracy rules, format rules, content rules, behavioral rules. Grouping makes them scannable and reduces missed rules
- Use "never" and "always" for absolute constraints. "Never fabricate" is clearer than "try to avoid fabricating"
- Pair each rule with the specific behavior. "No em-dashes" is a rule. "No em-dashes (—) in any output. Use periods or restructure" is a rule the model can follow
- Keep rules to 10-15 max. Beyond that, the model starts dropping rules. If you have 25 rules, some of them are redundant or should be in a reference file
- Put the most important rules first. The model is more likely to follow rules that appear earlier in the list
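Because format and content rules are binary, they can double as an automated post-check. A sketch using the limits and banned phrases from the template above (the function name and return format are assumptions):

```python
# Sketch: binary rules double as an automated post-check. Word limits and
# banned phrases are taken from the template above; the function name and
# return format are assumptions.
BANNED_PHRASES = [
    "leveraging", "in today's fast-paced world", "best-in-class",
    "holistic", "synergies", "unlock", "streamline",
]

def check_rules(body: str, subject: str, max_words: int) -> list[str]:
    """Return rule violations; an empty list means the draft passes."""
    violations = []
    if len(body.split()) > max_words:
        violations.append(f"body over {max_words} words")
    if "—" in body or "—" in subject:
        violations.append("em-dash in output")
    if len(subject.split()) > 5 or subject != subject.lower():
        violations.append("subject must be 5 words or fewer, lowercase")
    lowered = body.lower()
    violations += [f"banned phrase: {p}" for p in BANNED_PHRASES if p in lowered]
    return violations
```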
Section 6: Examples
Two to three input/output pairs showing exactly what ideal behavior looks like. Examples are the most powerful calibration tool available.
Example design rules:
- Show complete input and complete output. Partial examples create ambiguity
- Choose examples that demonstrate different scenarios. One straightforward case, one case with missing data, one edge case
- Annotate what makes the example good. After the output, add a brief note: "Note: this output correctly handles the missing funding data by reporting 'Not found' instead of guessing"
- Use realistic data. Fake data that's obviously fake ("Acme Corp, founded 2020, 50 employees") trains the model differently than realistic data. Use anonymized real examples when possible
- Match the output exactly to the output spec. If the spec says JSON, the example should be JSON. If the spec says markdown with headers, the example should be markdown with headers
Example count:
- 2 examples minimum. One example shows format. Two examples show range
- 3 examples ideal for complex agents. Straightforward case, partial data case, edge case
- More than 4 is diminishing returns and consumes context. If you need 5+ examples, the output spec is probably underspecified
Section 7: Edge Cases
Explicit instructions for scenarios that fall outside the normal process. The model handles edge cases well when told what to do. It handles them poorly when left to improvise.
Common edge cases for GTM agents:
| Edge case | What to do |
|---|---|
| Company not found (no search results) | Return output with all fields set to "Not found." Set confidence to "low." Do not construct a profile from partial data |
| Company is pre-revenue / stealth | Note "Pre-revenue / stealth mode" in snapshot. Signals and tech stack will be sparse. Set confidence accordingly |
| Multiple companies with the same name | Use the domain to disambiguate. If no domain provided, note the ambiguity and pick the most likely match based on ICP criteria. Flag in output |
| Company was recently acquired | Note the acquisition. Research the parent company if the original company no longer operates independently |
| Contact has left the company | Note "No longer at [company] as of [date]." Do not include in the committee map |
| Signal is ambiguous (could be positive or negative) | Report the signal with the ambiguity noted. Do not force-classify as positive or negative. Let the human reviewer interpret |
| Input is not a company name | Return an error: "Input does not appear to be a company name. Received: [input]." Do not attempt to process |
Edge case rules:
- List 5-8 edge cases. Cover the scenarios that would cause the agent to produce bad output if not explicitly handled
- For each edge case, give a specific instruction. "Handle gracefully" is not an instruction. "Return output with all fields set to 'Not found'" is
- Include the "I don't know" case. Every agent should have explicit permission and instructions to say "I couldn't find this" rather than fabricating
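The last edge case in the table can be caught before the agent runs at all. A rough pre-check sketch (the heuristic is deliberately crude and purely illustrative):

```python
# Sketch: catch the "input is not a company name" case before processing.
# The heuristic is deliberately crude and purely illustrative.
import re

def precheck(company_name: str) -> str | None:
    """Return an error message for obviously invalid input, else None."""
    stripped = company_name.strip()
    if not stripped:
        return "Input does not appear to be a company name. Received: (empty)"
    if re.fullmatch(r"[\d\W_]+", stripped):
        return f"Input does not appear to be a company name. Received: {stripped}"
    return None
```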
Prompt Anti-Patterns
1. The essay prompt
A 3,000-word prose prompt with no structure, no headers, no numbered steps. The model loses track of instructions buried in paragraphs.
Fix: Use the 7-section architecture. Headers, numbered lists, tables. Structure makes prompts scannable for the model just like it does for humans.
2. The vague output spec
"Return a helpful summary of the company." What format? How long? What fields? What's "helpful"?
Fix: Exact schema with field types, length constraints, and a complete example output.
3. Rules buried in process steps
"In step 3, make sure you don't use em-dashes and also keep it under 80 words and don't mention the competitor by name." Rules mixed into process instructions get missed.
Fix: Dedicated Rules section. All constraints in one scannable place.
4. No examples
Instructions without examples leave the model to interpret quality on its own. Its interpretation rarely matches yours.
Fix: Two to three examples minimum. Complete input-output pairs with annotations.
5. Contradictory instructions
"Be concise" in the identity section and "provide comprehensive detail" in the output spec. The model tries to satisfy both and fails at both.
Fix: Read the prompt end-to-end and check for contradictions. When two instructions conflict, delete one.
6. Persona bloat
"You are a world-class expert in B2B SaaS go-to-market strategy with deep expertise in..." This doesn't improve output. It wastes tokens and primes the model for verbose, self-important responses.
Fix: Two-sentence identity. Role + purpose + quality bar. No superlatives.
7. Over-constraining with soft rules
"Try to keep the output concise." "Consider mentioning the competitor if relevant." "You might want to include a proof point." Soft rules are effectively suggestions. The model follows them inconsistently.
Fix: Make every rule binary. Either it's a hard constraint ("under 80 words") or remove it. Soft rules create inconsistent output.
Prompt Iteration Process
Diagnose before changing
When agent output is bad, diagnose which section failed before editing.
| Symptom | Likely cause | Section to fix |
|---|---|---|
| Output format is wrong | Output spec is ambiguous | Section 3: Output spec |
| Agent skips steps | Process is unclear or too long | Section 4: Process |
| Agent violates a rule | Rule is buried or contradicted | Section 5: Rules |
| Output tone is off | Identity or examples set wrong tone | Section 1: Identity, Section 6: Examples |
| Agent hallucinates data | No explicit "don't guess" rule + no edge case handling | Section 5: Rules, Section 7: Edge cases |
| Output is inconsistent across runs | No examples or examples are too similar | Section 6: Examples |
Change one thing at a time
- Make one edit per iteration. If output is too long and factually inaccurate, fix length first, re-test, then fix accuracy
- Track every prompt version. Save each version with a timestamp and note what changed and why
- Re-run the same test inputs after each change. Compare outputs side-by-side to measure improvement
- After 3 iterations on the same section without improvement, the problem may be in a different section. Re-diagnose
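A minimal harness for the version-tracking and re-test discipline above (the file layout and the run_agent callable are assumptions):

```python
# Sketch: version every prompt and re-run the same test inputs after each
# change. The file layout and the run_agent callable are assumptions.
import json
import pathlib
import time

def save_version(prompt: str, note: str, directory: str = "prompt_versions") -> None:
    """Save a timestamped copy of the prompt plus a note on what changed and why."""
    path = pathlib.Path(directory)
    path.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    (path / f"{stamp}.md").write_text(prompt)
    (path / f"{stamp}.note.txt").write_text(note)

def regression_run(run_agent, test_inputs: list, out_file: str) -> None:
    """Run the unchanged test set and dump outputs for side-by-side comparison."""
    results = [{"input": t, "output": run_agent(t)} for t in test_inputs]
    pathlib.Path(out_file).write_text(json.dumps(results, indent=2))
```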
Prompt review checklist
Before deploying any agent prompt:
- [ ] Identity is 2-3 sentences. Names the role, action, audience, and quality bar
- [ ] Input spec lists every field with required/optional labels and format
- [ ] Output spec includes exact schema with field types and length constraints
- [ ] Output spec includes instructions for missing data (no guessing)
- [ ] Process has 5-8 numbered steps with specific tools and queries named
- [ ] Rules are grouped by category, use "never"/"always", and total ≤ 15
- [ ] 2-3 complete input/output examples are included
- [ ] Edge cases cover missing data, ambiguous input, and the "I don't know" case
- [ ] No contradictions between sections
- [ ] No soft rules ("try to", "consider", "you might want to")
- [ ] No persona bloat or marketing language in the identity
- [ ] Total prompt length is under 2,500 words (move reference material to separate files)
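Several checklist items are mechanically checkable before deploy. A sketch covering the length and structure checks (the header names assume the templates above):

```python
# Sketch: the length and structure items on the checklist are mechanically
# checkable. Header names assume the templates above; the 2,500-word
# threshold is from the checklist.
def lint_prompt(prompt: str) -> list[str]:
    issues = []
    if len(prompt.split()) > 2500:
        issues.append("prompt over 2,500 words; move reference material to files")
    for header in ("Input", "Output", "Process", "Rules"):
        if f"## {header}" not in prompt:
            issues.append(f"missing section: ## {header}")
    return issues
```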
Anti-Pattern Check
- Prompt is over 3,000 words with no reference files. Move detailed reference material (banned phrase lists, scoring rubrics, example databases) into separate files the agent loads as needed. The system prompt should be the operating manual, not the encyclopedia
- No examples in the prompt. Examples are the strongest calibration tool. Two examples outperform 500 words of instructions. Always include them
- Output spec says "return a summary." Summaries are subjective and inconsistent. Define the exact schema, fields, and constraints
- Rules say "try to avoid." Make it binary. "Never" or remove the rule. Soft constraints produce inconsistent output
- Process section has 12 steps. The agent's scope is too broad. Split into 2-3 agents with 4-6 steps each
- Changed 5 things in the prompt at once. Now output is different but you don't know which change helped or hurt. One edit per iteration
- Same prompt used across models. Different models respond differently to the same prompt. When switching from Sonnet to Opus or Haiku, re-test and adjust, especially the rules section, which smaller models follow less reliably