# Tool Design for Agents
A tool is a function an AI agent can call to interact with the outside world. Search a CRM, enrich a contact, send an email, query a database. The agent reads the tool's name, description, and parameters, then decides when to call it and what arguments to pass. Good tool design means the agent calls the right tool with the right arguments on the first try. Bad tool design means the agent calls the wrong tool, passes garbage arguments, or ignores the tool entirely.
The principle: design tools from the agent's perspective. The agent sees a name, a one-sentence description, and a parameter list. If those three things aren't crystal clear, the agent will guess. Agents that guess produce inconsistent results.
## Tool Anatomy
### The three elements an agent sees
| Element | What the agent uses it for | Design goal |
|---|---|---|
| Name | Deciding whether this tool might be relevant | Verb-noun. Unambiguous. Instantly clear what it does |
| Description | Deciding whether to call this tool vs another | One sentence. States what the tool does AND what it returns |
| Parameters | Filling in the arguments | Clear types, clear descriptions, minimal required fields |
### Name rules
- Verb-noun format. `search_contacts`, `enrich_company`, `create_deal`, `send_email`. The verb is the action. The noun is the object
- No ambiguous verbs. `handle_contact` means nothing. `process_data` means nothing. Use specific verbs: search, create, update, delete, enrich, validate, send, get, list
- No overlapping names. If the agent sees `search_contacts`, `find_contacts`, and `lookup_contacts`, it can't distinguish them. Pick one name per action. Delete the synonyms
- Snake_case. `search_contacts`, not `searchContacts` or `SearchContacts`. Consistency across all tools
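The naming rules above can be enforced mechanically. Below is a minimal lint sketch; the verb whitelist and the `lint_tool_name` helper are illustrative choices, not a standard API:

```python
import re

# Approved action verbs, mirroring the list suggested in this section.
APPROVED_VERBS = {"search", "create", "update", "delete", "enrich",
                  "validate", "send", "get", "list", "draft", "log",
                  "schedule", "verify", "find", "format", "count"}

def lint_tool_name(name: str) -> list[str]:
    """Return a list of problems with a tool name (empty list = OK)."""
    problems = []
    # snake_case with at least one underscore, i.e. verb_noun
    if not re.fullmatch(r"[a-z]+(_[a-z]+)+", name):
        problems.append("not snake_case verb_noun (e.g. search_contacts)")
    else:
        verb = name.split("_")[0]
        if verb not in APPROVED_VERBS:
            problems.append(f"verb '{verb}' is not on the approved list")
    return problems

print(lint_tool_name("search_contacts"))  # []
print(lint_tool_name("contact_tool"))     # flags the non-verb 'contact'
print(lint_tool_name("linkedin"))         # flags the missing verb_noun shape
```

Running this in CI over the tool registry catches synonym drift and vague names before an agent ever sees them.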
Good names vs bad names:
| Good | Bad | Why bad is bad |
|---|---|---|
| `search_hubspot_contacts` | `contact_tool` | No verb. Could be read, write, delete, anything |
| `enrich_company` | `company_enrichment` | Noun phrase, not action. Agent may not recognize it as callable |
| `create_hubspot_deal` | `deal_handler` | "Handler" is vague. Create, update, delete? |
| `validate_email_address` | `email_check` | "Check" could mean validate, look up, or send a test |
| `get_linkedin_profile` | `linkedin` | No verb, no noun. Complete mystery |
### Description rules
- One sentence. State what the tool does and what it returns. "Searches HubSpot contacts by company name or job title. Returns up to 10 matching contact records with name, title, email, and company."
- Include the return value. The agent needs to know what it gets back to plan its next step. "Creates a deal in HubSpot" is incomplete. "Creates a deal in HubSpot. Returns the deal ID and creation timestamp" tells the agent it can use the deal ID downstream
- State limitations. "Returns up to 10 results" or "Only works for US companies" or "Requires a valid email domain." Constraints prevent the agent from expecting behavior the tool doesn't support
- No marketing language. "Powerful contact enrichment engine that leverages AI to provide deep insights." The agent doesn't care. "Enriches a contact record with company data, title, and LinkedIn URL from Apollo." That's useful
### Parameter rules
| Rule | Why |
|---|---|
| Required parameters are truly required | If the tool works without a parameter, make it optional. Agents hallucinate values to satisfy unnecessary required fields |
| Each parameter has a description | `query: string` tells the agent nothing. `query: string. The company name or domain to search for` tells the agent exactly what to pass |
| Use specific types | `string` is loose. `string, enum: ["positive", "negative", "neutral"]` is tight. The tighter the type, the fewer bad calls |
| Default values for optional parameters | `limit: integer, default 10`. The agent doesn't need to specify common defaults |
| Maximum 5-7 parameters | More than 7 parameters and the agent struggles. If a tool needs 12 parameters, it's doing too many things. Split it |
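Put together, a definition that follows these naming, description, and parameter rules might look like the sketch below. It uses the JSON-Schema parameter style common to most LLM tool APIs; the exact envelope field names vary by provider, and the tool itself is hypothetical:

```python
# Hypothetical tool definition illustrating the rules above:
# verb-noun name, one-sentence description that states the return
# value and the limit, described parameters, one required field.
search_hubspot_contacts = {
    "name": "search_hubspot_contacts",
    "description": (
        "Searches HubSpot contacts by company name or job title. "
        "Returns up to 10 matching contact records with name, title, "
        "email, and company."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The company name or domain to search for",
            },
            "title_filter": {
                "type": "string",
                "description": "Optional job-title filter, e.g. 'VP Sales'",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return",
                "default": 10,
            },
        },
        # Only what the tool truly can't function without is required.
        "required": ["query"],
    },
}
```

Note the asymmetry: one required parameter, two optional ones with a description or default. That is the shape that keeps hallucinated arguments out.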
## Tool Categories for GTM
### Standard GTM tool set
| Category | Tools | What they do |
|---|---|---|
| CRM read | `search_contacts`, `get_contact`, `get_company`, `get_deal`, `list_activities` | Read data from CRM without modification |
| CRM write | `create_contact`, `update_contact`, `create_deal`, `log_activity` | Modify CRM data. Always requires a human approval gate |
| Enrichment | `enrich_company`, `enrich_contact`, `find_email`, `verify_email` | Pull data from enrichment providers |
| Research | `search_web`, `get_linkedin_profile`, `get_company_news`, `get_job_postings` | Gather external data for research |
| Email | `draft_email`, `send_email`, `schedule_email`, `get_email_status` | Email composition and sending |
| Internal | `format_output`, `count_words`, `validate_rules`, `log_result` | Agent-internal helpers |
### Tool category rules
- CRM write tools always gate on human approval. The agent proposes a CRM update. A human approves it. The tool executes it. Never auto-execute CRM writes. One bad batch update cascades through workflows, scoring, and reporting
- Enrichment tools return structured data. `enrich_company` returns `{ name, domain, employee_count, industry, funding_stage, funding_amount }`. Not a paragraph of text. Structured data is easier for the agent to use correctly
- Research tools set limits. `search_web` returns top 5 results with title, URL, and snippet. Not the full page content of 50 results. The agent's context window is finite
- Internal tools don't call external services. `count_words` and `format_output` are pure functions. No API calls, no side effects. These run instantly and never fail
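The last rule is worth making concrete. A pure internal helper takes its input, returns its output, and touches nothing else; the `count_words` and `format_output` implementations below are minimal illustrative sketches:

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words. No I/O, no state, never fails."""
    return len(text.split())

def format_output(fields: dict) -> str:
    """Render a flat dict as 'key: value' lines, sorted for determinism."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(fields.items()))

print(count_words("Acme Corp raised a Series B"))  # 6
```

Because these are deterministic and side-effect-free, they need no approval gates, no retries, and no error envelope beyond the standard one.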
## Designing Tool Responses
### Response structure
Every tool response should follow this pattern:
```json
{
  "success": true,
  "data": { ... },
  "metadata": {
    "source": "hubspot",
    "timestamp": "2025-01-15T10:30:00Z",
    "result_count": 3,
    "truncated": false
  }
}
```
For errors:
```json
{
  "success": false,
  "error": {
    "code": "NOT_FOUND",
    "message": "No contact found with email domain 'example.com'",
    "suggestion": "Try searching by company name instead"
  }
}
```
### Response rules
- Always return structured data. JSON with named fields. Never raw text, HTML, or unprocessed API responses. The agent needs to extract specific values from the response. Named fields make this deterministic
- Include a success/error flag. The agent needs to know whether the tool call worked before planning its next step. A raw response that might be data or might be an error message forces the agent to guess
- Limit response size. If a tool can return 500 contacts, default to 10 and let the agent request more. Large responses burn context window and degrade agent performance
- Include metadata. Source, timestamp, result count, whether results were truncated. The agent can use this to decide whether to make another call
- Error suggestions help the agent recover. "No results found" is useless. "No results found. Try searching by company name instead of domain" gives the agent a recovery path
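One way to enforce a uniform envelope is to route every tool result through a pair of builder helpers. The `ok` and `err` functions below are an illustrative sketch, not a library API:

```python
from datetime import datetime, timezone

def ok(data, source: str, result_count: int, truncated: bool = False) -> dict:
    """Wrap a successful tool result in the standard envelope."""
    return {
        "success": True,
        "data": data,
        "metadata": {
            "source": source,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "result_count": result_count,
            "truncated": truncated,
        },
    }

def err(code: str, message: str, suggestion: str) -> dict:
    """Wrap a failure. The suggestion gives the agent a recovery path."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "suggestion": suggestion},
    }

resp = err(
    "NOT_FOUND",
    "No contact found with email domain 'example.com'",
    "Try searching by company name instead",
)
```

If every tool funnels through these two builders, the agent can branch on `success` without ever parsing free text to decide whether a call worked.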
## Tool Composition
### How agents chain tool calls
Agent task: "Research Acme Corp and find the VP of Sales"

```
Step 1: search_hubspot_contacts(company_name="Acme Corp")
        → Returns 8 contacts
Step 2: Agent examines results. No VP of Sales found.
Step 3: enrich_company(domain="acme.com")
        → Returns company data including employee count
Step 4: get_linkedin_profile(company="Acme Corp", title="VP Sales")
        → Returns LinkedIn profile with name, title, current role
Step 5: create_contact(name="Jane Smith", title="VP Sales",
                       company="Acme Corp", source="linkedin")
        → Returns new contact ID
```
### Composition rules
- Each tool does one thing. `search_and_enrich_contact` is two tools jammed together. What if the search succeeds but enrichment fails? Split them. The agent chains them when it needs both
- Tool outputs are tool inputs. The contact ID from `create_contact` feeds into `log_activity(contact_id=...)`. Design return values to be usable as inputs to other tools
- No side effects in read tools. `search_contacts` should never create a log entry, trigger a webhook, or update a record. Read tools read. Write tools write. Mixing side effects makes behavior unpredictable
- Idempotent where possible. Calling `update_contact(id, title="VP Sales")` twice should produce the same result. Idempotent tools are safe to retry on failure
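The output-as-input rule can be sketched with two stubbed tools. Both implementations below are stand-ins (a real `create_contact` would call the CRM API behind an approval gate, and the ID format here is invented); the point is the shape of the return values:

```python
def create_contact(name: str, title: str, company: str) -> dict:
    """Stub: returns a deterministic fake contact ID in the envelope."""
    contact_id = f"contact_{abs(hash((name, company))) % 10_000}"
    return {"success": True, "data": {"contact_id": contact_id}}

def log_activity(contact_id: str, note: str) -> dict:
    """Stub: accepts the ID produced by create_contact."""
    return {"success": True, "data": {"contact_id": contact_id, "note": note}}

created = create_contact("Jane Smith", "VP Sales", "Acme Corp")
# The return value is designed to be usable directly as the next input:
logged = log_activity(
    contact_id=created["data"]["contact_id"],
    note="Sourced from LinkedIn research",
)
```

Because `create_contact` returns a named `contact_id` field rather than prose, the agent can thread it into `log_activity` without any parsing or guesswork.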
## Tool Access Control
### Which agents can call which tools
| Agent type | Read tools | Write tools | Enrichment tools | Send tools |
|---|---|---|---|---|
| Research agent | All CRM read | None | All enrichment | None |
| Scoring agent | Contact + company read | None | ICP fit tools | None |
| Email writer agent | Contact read (for context) | None | None | Draft only (no send) |
| Orchestrator agent | All read | CRM write (with approval) | All | Send (with approval) |
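A least-privilege table like this maps naturally onto an explicit allowlist checked at dispatch time. The registry and `dispatch` function below are an illustrative sketch (the agent IDs and tool names echo the table; the real dispatcher would also execute and log the call):

```python
# Per-agent tool allowlists: anything not listed is refused.
TOOL_ACCESS = {
    "research_agent": {"search_contacts", "get_company", "enrich_company",
                       "search_web", "get_linkedin_profile"},
    "email_writer_agent": {"get_contact", "draft_email"},
}

def dispatch(agent_id: str, tool_name: str, args: dict) -> dict:
    """Refuse any tool call outside the agent's allowlist."""
    allowed = TOOL_ACCESS.get(agent_id, set())
    if tool_name not in allowed:
        return {
            "success": False,
            "error": {
                "code": "FORBIDDEN",
                "message": f"{agent_id} may not call {tool_name}",
                "suggestion": "Route this action to an agent with "
                              "write/send privileges",
            },
        }
    # ...look up the real tool, execute it, and log the call here...
    return {"success": True, "data": {"tool": tool_name, "args": args}}

print(dispatch("email_writer_agent", "send_email", {})["success"])  # False
```

Defaulting unknown agents to an empty set means a misconfigured agent gets no tools rather than all tools.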
### Access rules
- Principle of least privilege. Give each agent only the tools it needs. A research agent has no business sending emails. An email writer has no business updating CRM records
- Write tools require approval gates. Any tool that modifies external state (CRM, email, database) must have a human approval step or an explicit automation rule
- Separate draft from send. The email agent calls `draft_email`. A separate approval step calls `send_email`. The agent never sends directly
- Log every tool call. Agent ID, tool name, parameters, response, timestamp. This is your audit trail when something goes wrong
## Testing Tools
### What to test
| Test type | What it validates | How |
|---|---|---|
| Schema compliance | Does the tool accept valid parameters and reject invalid ones? | Pass valid and invalid parameter combinations |
| Response format | Does the tool return the documented response structure? | Check success responses and error responses |
| Agent usability | Does the agent call the tool correctly based on name + description alone? | Give the agent a task that requires the tool. Does it pick the right tool and pass correct args? |
| Error handling | Does the tool return useful errors? Does the agent recover? | Force errors (invalid IDs, network failures). Check agent behavior |
| Edge cases | Does the tool handle empty results, special characters, rate limits? | Pass empty queries, unicode, rapid sequential calls |
### Testing rules
- Test with the agent, not just in isolation. A tool that works perfectly when called directly but confuses the agent is a bad tool. The ultimate test is: does the agent use it correctly?
- Test tool selection. Give the agent 5 tools and a task. Does it pick the right one? If it picks the wrong tool, the name or description needs improvement
- Test parameter filling. Does the agent pass the right arguments? If it passes `company_name` where `domain` was expected, the parameter descriptions are unclear
- Test error recovery. Tool returns an error. Does the agent try again? Try a different approach? Give up gracefully? Or hallucinate a result?
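A schema-compliance test can be very small. `validate_args` below is a minimal stand-in for whatever JSON-Schema validator the real stack uses; it checks only required and unknown parameters, which already catches the two most common bad calls:

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of schema violations (empty list = valid call)."""
    errors = [f"missing required parameter '{p}'"
              for p in schema.get("required", []) if p not in args]
    props = schema["properties"]
    errors += [f"unknown parameter '{k}'" for k in args if k not in props]
    return errors

# Hypothetical parameter schema for a search tool.
schema = {
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "default": 10},
    },
    "required": ["query"],
}

assert validate_args(schema, {"query": "acme.com"}) == []          # valid call
assert validate_args(schema, {"limit": 5}) != []                   # missing query
assert validate_args(schema, {"query": "x", "domain": "y"}) != []  # unknown arg
```

Run the same assertions against arguments the agent actually produced in recorded transcripts: if the agent's calls fail validation, the fix is usually in the parameter descriptions, not the agent.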
## Pre-Build Checklist
Before building a tool for an agent:
- [ ] Tool has a verb-noun name (e.g., `search_contacts`, not `contact_tool`)
- [ ] Description is one sentence stating what it does AND what it returns
- [ ] Parameters have clear descriptions and types
- [ ] Required parameters are truly required (tool fails without them)
- [ ] Optional parameters have defaults
- [ ] No more than 7 parameters
- [ ] Response is structured JSON with success/error flag
- [ ] Error responses include actionable suggestions
- [ ] Response size is bounded (pagination or limits)
- [ ] Read tools have no side effects
- [ ] Write tools have approval gates
- [ ] Tool tested with the actual agent (not just in isolation)
- [ ] Tool name doesn't overlap with other tool names
## Anti-Pattern Check
- Tool named `handle_data` or `process_input`. The agent has no idea what this tool does. Use specific verb-noun names: `search_contacts`, `enrich_company`, `validate_email`. Every tool name should be unambiguous
- All parameters marked as required. The agent hallucinates a "company_size" value because the tool requires it but the agent doesn't have it. Only require parameters the tool truly can't function without
- Tool returns raw API response. The agent gets a 200-line JSON blob from HubSpot's API. 190 lines are irrelevant. Process the response in the tool. Return only the fields the agent needs
- No error handling. Tool throws an exception. Agent receives a stack trace. It tries to extract useful information from the error message and fails. Return structured error objects with codes and suggestions
- One tool does five things. `manage_contacts(action="search|create|update|delete")`. The agent struggles to use multi-action tools correctly. One tool, one action. Split it
- Tool sends emails without approval. The email agent calls `send_email` directly. No human review. One hallucinated claim reaches the prospect. Separate `draft_email` from `send_email`. Require approval between them
- 20 tools on one agent. The agent has too many options. It calls the wrong tool 30% of the time. Keep it to 5-8 tools per agent. If more are needed, split into multiple agents with focused tool sets
- No tool call logging. Something went wrong but you can't tell which tool call caused it. Log every call with agent ID, tool name, parameters, response, and timestamp