# Tool Design for Agents
A tool is a function an AI agent can call to interact with the outside world. Search a CRM, enrich a contact, send an email, query a database. The agent reads the tool's name, description, and parameters, then decides when to call it and what arguments to pass. Good tool design means the agent calls the right tool with the right arguments on the first try. Bad tool design means the agent calls the wrong tool, passes garbage arguments, or ignores the tool entirely.
The principle: design tools from the agent's perspective. The agent sees a name, a one-sentence description, and a parameter list. If those three things aren't crystal clear, the agent will guess. Agents that guess produce inconsistent results.
## Tool Anatomy
### The three elements an agent sees
| Element | What the agent uses it for | Design goal |
|---|---|---|
| Name | Deciding whether this tool might be relevant | Verb-noun. Unambiguous. Instantly clear what it does |
| Description | Deciding whether to call this tool vs another | One sentence. States what the tool does AND what it returns |
| Parameters | Filling in the arguments | Clear types, clear descriptions, minimal required fields |
### Name rules
- Verb-noun format. `search_contacts`, `enrich_company`, `create_deal`, `send_email`. The verb is the action. The noun is the object
- No ambiguous verbs. `handle_contact` means nothing. `process_data` means nothing. Use specific verbs: search, create, update, delete, enrich, validate, send, get, list
- No overlapping names. If the agent sees `search_contacts`, `find_contacts`, and `lookup_contacts`, it can't distinguish them. Pick one name per action. Delete the synonyms
- Snake_case. `search_contacts`, not `searchContacts` or `SearchContacts`. Consistency across all tools
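The naming rules above can be enforced mechanically. Below is a minimal lint sketch; the verb whitelist and the `lint_tool_name` helper are illustrative choices, not a standard API:

```python
import re

# Approved action verbs, mirroring the list suggested in this section.
APPROVED_VERBS = {"search", "create", "update", "delete", "enrich",
                  "validate", "send", "get", "list", "draft", "log",
                  "schedule", "verify", "find", "format", "count"}

def lint_tool_name(name: str) -> list[str]:
    """Return a list of problems with a tool name (empty list = OK)."""
    problems = []
    # snake_case with at least one underscore, i.e. verb_noun
    if not re.fullmatch(r"[a-z]+(_[a-z]+)+", name):
        problems.append("not snake_case verb_noun (e.g. search_contacts)")
    else:
        verb = name.split("_")[0]
        if verb not in APPROVED_VERBS:
            problems.append(f"verb '{verb}' is not on the approved list")
    return problems

print(lint_tool_name("search_contacts"))  # []
print(lint_tool_name("contact_tool"))     # flags the non-verb 'contact'
print(lint_tool_name("linkedin"))         # flags the missing verb_noun shape
```

Running this in CI over the tool registry catches synonym drift and vague names before an agent ever sees them.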
Good names vs bad names:
| Good | Bad | Why bad is bad |
|---|---|---|
| `search_hubspot_contacts` | `contact_tool` | No verb. Could be read, write, delete, anything |
| `enrich_company` | `company_enrichment` | Noun phrase, not action. Agent may not recognize it as callable |
| `create_hubspot_deal` | `deal_handler` | "Handler" is vague. Create, update, delete? |
| `validate_email_address` | `email_check` | "Check" could mean validate, look up, or send a test |
| `get_linkedin_profile` | `linkedin` | No verb, no noun. Complete mystery |
### Description rules
- One sentence. State what the tool does and what it returns. "Searches HubSpot contacts by company name or job title. Returns up to 10 matching contact records with name, title, email, and company."
- Include the return value. The agent needs to know what it gets back to plan its next step. "Creates a deal in HubSpot" is incomplete. "Creates a deal in HubSpot. Returns the deal ID and creation timestamp" tells the agent it can use the deal ID downstream
- State limitations. "Returns up to 10 results" or "Only works for US companies" or "Requires a valid email domain." Constraints prevent the agent from expecting behavior the tool doesn't support
- No marketing language. "Powerful contact enrichment engine that leverages AI to provide deep insights." The agent doesn't care. "Enriches a contact record with company data, title, and LinkedIn URL from Apollo." That's useful
### Parameter rules
| Rule | Why |
|---|---|
| Required parameters are truly required | If the tool works without a parameter, make it optional. Agents hallucinate values to satisfy unnecessary required fields |
| Each parameter has a description | `query: string` tells the agent nothing. `query: string. The company name or domain to search for` tells the agent exactly what to pass |
| Use specific types | `string` is loose. `string, enum: ["positive", "negative", "neutral"]` is tight. The tighter the type, the fewer bad calls |
| Default values for optional parameters | `limit: integer, default 10`. The agent doesn't need to specify common defaults |
| Maximum 5-7 parameters | More than 7 parameters and the agent struggles. If a tool needs 12 parameters, it's doing too many things. Split it |
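Put together, a definition that follows these naming, description, and parameter rules might look like the sketch below. It uses the JSON-Schema parameter style common to most LLM tool APIs; the exact envelope field names vary by provider, and the tool itself is hypothetical:

```python
# Hypothetical tool definition illustrating the rules above:
# verb-noun name, one-sentence description that states the return
# value and the limit, described parameters, one required field.
search_hubspot_contacts = {
    "name": "search_hubspot_contacts",
    "description": (
        "Searches HubSpot contacts by company name or job title. "
        "Returns up to 10 matching contact records with name, title, "
        "email, and company."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The company name or domain to search for",
            },
            "title_filter": {
                "type": "string",
                "description": "Optional job-title filter, e.g. 'VP Sales'",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return",
                "default": 10,
            },
        },
        # Only what the tool truly can't function without is required.
        "required": ["query"],
    },
}
```

Note the asymmetry: one required parameter, two optional ones with a description or default. That is the shape that keeps hallucinated arguments out.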
## Tool Categories for GTM
### Standard GTM tool set
| Category | Tools | What they do |
|---|---|---|
| CRM read | `search_contacts`, `get_contact`, `get_company`, `get_deal`, `list_activities` | Read data from CRM without modification |
| CRM write | `create_contact`, `update_contact`, `create_deal`, `log_activity` | Modify CRM data. Always requires a human approval gate |
| Enrichment | `enrich_company`, `enrich_contact`, `find_email`, `verify_email` | Pull data from enrichment providers |
| Research | `search_web`, `get_linkedin_profile`, `get_company_news`, `get_job_postings` | Gather external data for research |
| Email | `draft_email`, `send_email`, `schedule_email`, `get_email_status` | Email composition and sending |
| Internal | `format_output`, `count_words`, `validate_rules`, `log_result` | Agent-internal helpers |
### Tool category rules
- CRM write tools always gate on human approval. The agent proposes a CRM update. A human approves it. The tool executes it. Never auto-execute CRM writes. One bad batch update cascades through workflows, scoring, and reporting
- Enrichment tools return structured data. `enrich_company` returns `{ name, domain, employee_count, industry, funding_stage, funding_amount }`. Not a paragraph of text. Structured data is easier for the agent to use correctly
- Research tools set limits. `search_web` returns top 5 results with title, URL, and snippet. Not the full page content of 50 results. The agent's context window is finite
- Internal tools don't call external services. `count_words` and `format_output` are pure functions. No API calls, no side effects. These run instantly and never fail
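The last rule is worth making concrete. A pure internal helper takes its input, returns its output, and touches nothing else; the `count_words` and `format_output` implementations below are minimal illustrative sketches:

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words. No I/O, no state, never fails."""
    return len(text.split())

def format_output(fields: dict) -> str:
    """Render a flat dict as 'key: value' lines, sorted for determinism."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(fields.items()))

print(count_words("Acme Corp raised a Series B"))  # 6
```

Because these are deterministic and side-effect-free, they need no approval gates, no retries, and no error envelope beyond the standard one.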
## Designing Tool Responses
### Response structure
Every tool response should follow this pattern:
```json
{
  "success": true,
  "data": { ... },
  "metadata": {
    "source": "hubspot",
    "timestamp": "2025-01-15T10:30:00Z",
    "result_count": 3,
    "truncated": false
  }
}
```
For errors:
```json
{
  "success": false,
  "error": {
    "code": "NOT_FOUND",
    "message": "No contact found with email domain 'example.com'",
    "suggestion": "Try searching by company name instead"
  }
}
```
### Response rules
- Always return structured data. JSON with named fields. Never raw text, HTML, or unprocessed API responses. The agent needs to extract specific values from the response. Named fields make this deterministic
- Include a success/error flag. The agent needs to know whether the tool call worked before planning its next step. A raw response that might be data or might be an error message forces the agent to guess
- Limit response size. If a tool can return 500 contacts, default to 10 and let the agent request more. Large responses burn context window and degrade agent performance
- Include metadata. Source, timestamp, result count, whether results were truncated. The agent can use this to decide whether to make another call
- Error suggestions help the agent recover. "No results found" is useless. "No results found. Try searching by company name instead of domain" gives the agent a recovery path
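One way to enforce a uniform envelope is to route every tool result through a pair of builder helpers. The `ok` and `err` functions below are an illustrative sketch, not a library API:

```python
from datetime import datetime, timezone

def ok(data, source: str, result_count: int, truncated: bool = False) -> dict:
    """Wrap a successful tool result in the standard envelope."""
    return {
        "success": True,
        "data": data,
        "metadata": {
            "source": source,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "result_count": result_count,
            "truncated": truncated,
        },
    }

def err(code: str, message: str, suggestion: str) -> dict:
    """Wrap a failure. The suggestion gives the agent a recovery path."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "suggestion": suggestion},
    }

resp = err(
    "NOT_FOUND",
    "No contact found with email domain 'example.com'",
    "Try searching by company name instead",
)
```

If every tool funnels through these two builders, the agent can branch on `success` without ever parsing free text to decide whether a call worked.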
## Tool Composition
### How agents chain tool calls
Agent task: "Research Acme Corp and find the VP of Sales"

```
Step 1: search_hubspot_contacts(company_name="Acme Corp")
        → Returns 8 contacts
Step 2: Agent examines results. No VP of Sales found.
Step 3: enrich_company(domain="acme.com")
        → Returns company data including employee count
Step 4: get_linkedin_profile(company="Acme Corp", title="VP Sales")
        → Returns LinkedIn profile with name, title, current role
Step 5: create_contact(name="Jane Smith", title="VP Sales",
                       company="Acme Corp", source="linkedin")
        → Returns new contact ID
```
### Composition rules
- Each tool does one thing. `search_and_enrich_contact` is two tools jammed together. What if the search succeeds but enrichment fails? Split them. The agent chains them when it needs both
- Tool outputs are tool inputs. The contact ID from `create_contact` feeds into `log_activity(contact_id=...)`. Design return values to be usable as inputs to other tools
- No side effects in read tools. `search_contacts` should never create a log entry, trigger a webhook, or update a record. Read tools read. Write tools write. Mixing side effects makes behavior unpredictable
- Idempotent where possible. Calling `update_contact(id, title="VP Sales")` twice should produce the same result. Idempotent tools are safe to retry on failure
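The output-as-input rule can be sketched with two stubbed tools. Both implementations below are stand-ins (a real `create_contact` would call the CRM API behind an approval gate, and the ID format here is invented); the point is the shape of the return values:

```python
def create_contact(name: str, title: str, company: str) -> dict:
    """Stub: returns a deterministic fake contact ID in the envelope."""
    contact_id = f"contact_{abs(hash((name, company))) % 10_000}"
    return {"success": True, "data": {"contact_id": contact_id}}

def log_activity(contact_id: str, note: str) -> dict:
    """Stub: accepts the ID produced by create_contact."""
    return {"success": True, "data": {"contact_id": contact_id, "note": note}}

created = create_contact("Jane Smith", "VP Sales", "Acme Corp")
# The return value is designed to be usable directly as the next input:
logged = log_activity(
    contact_id=created["data"]["contact_id"],
    note="Sourced from LinkedIn research",
)
```

Because `create_contact` returns a named `contact_id` field rather than prose, the agent can thread it into `log_activity` without any parsing or guesswork.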
## Tool Access Control
### Which agents can call which tools
| Agent type | Read tools | Write tools | Enrichment tools | Send tools |
|---|---|---|---|---|
| Research agent | All CRM read | None | All enrichment | None |
| Scoring agent | Contact + company read | None | ICP fit tools | None |
| Email writer agent | Contact read (for context) | None | None | Draft only (no send) |
| Orchestrator agent | All read | CRM write (with approval) | All | Send (with approval) |
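A least-privilege table like this maps naturally onto an explicit allowlist checked at dispatch time. The registry and `dispatch` function below are an illustrative sketch (the agent IDs and tool names echo the table; the real dispatcher would also execute and log the call):

```python
# Per-agent tool allowlists: anything not listed is refused.
TOOL_ACCESS = {
    "research_agent": {"search_contacts", "get_company", "enrich_company",
                       "search_web", "get_linkedin_profile"},
    "email_writer_agent": {"get_contact", "draft_email"},
}

def dispatch(agent_id: str, tool_name: str, args: dict) -> dict:
    """Refuse any tool call outside the agent's allowlist."""
    allowed = TOOL_ACCESS.get(agent_id, set())
    if tool_name not in allowed:
        return {
            "success": False,
            "error": {
                "code": "FORBIDDEN",
                "message": f"{agent_id} may not call {tool_name}",
                "suggestion": "Route this action to an agent with "
                              "write/send privileges",
            },
        }
    # ...look up the real tool, execute it, and log the call here...
    return {"success": True, "data": {"tool": tool_name, "args": args}}

print(dispatch("email_writer_agent", "send_email", {})["success"])  # False
```

Defaulting unknown agents to an empty set means a misconfigured agent gets no tools rather than all tools.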
### Access rules
- Principle of least privilege. Give each agent only the tools it needs. A research agent has no business sending emails. An email writer has no business updating CRM records
- Write tools require approval gates. Any tool that modifies external state (CRM, email, database) must have a human approval step or an explicit automation rule
- Separate draft from send. The email agent calls `draft_email`. A separate approval step calls `send_email`. The agent never sends directly
- Log every tool call. Agent ID, tool name, parameters, response, timestamp. This is your audit trail when something goes wrong
## Testing Tools
### What to test
| Test type | What it validates | How |
|---|---|---|
| Schema compliance | Does the tool accept valid parameters and reject invalid ones? | Pass valid and invalid parameter combinations |
| Response format | Does the tool return the documented response structure? | Check success responses and error responses |
| Agent usability | Does the agent call the tool correctly based on name + description alone? | Give the agent a task that requires the tool. Does it pick the right tool and pass correct args? |
| Error handling | Does the tool return useful errors? Does the agent recover? | Force errors (invalid IDs, network failures). Check agent behavior |
| Edge cases | Does the tool handle empty results, special characters, rate limits? | Pass empty queries, unicode, rapid sequential calls |
### Testing rules
- Test with the agent, not just in isolation. A tool that works perfectly when called directly but confuses the agent is a bad tool. The ultimate test is: does the agent use it correctly?
- Test tool selection. Give the agent 5 tools and a task. Does it pick the right one? If it picks the wrong tool, the name or description needs improvement
- Test parameter filling. Does the agent pass the right arguments? If it passes `company_name` where `domain` was expected, the parameter descriptions are unclear
- Test error recovery. Tool returns an error. Does the agent try again? Try a different approach? Give up gracefully? Or hallucinate a result?
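A schema-compliance test can be very small. `validate_args` below is a minimal stand-in for whatever JSON-Schema validator the real stack uses; it checks only required and unknown parameters, which already catches the two most common bad calls:

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of schema violations (empty list = valid call)."""
    errors = [f"missing required parameter '{p}'"
              for p in schema.get("required", []) if p not in args]
    props = schema["properties"]
    errors += [f"unknown parameter '{k}'" for k in args if k not in props]
    return errors

# Hypothetical parameter schema for a search tool.
schema = {
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "default": 10},
    },
    "required": ["query"],
}

assert validate_args(schema, {"query": "acme.com"}) == []          # valid call
assert validate_args(schema, {"limit": 5}) != []                   # missing query
assert validate_args(schema, {"query": "x", "domain": "y"}) != []  # unknown arg
```

Run the same assertions against arguments the agent actually produced in recorded transcripts: if the agent's calls fail validation, the fix is usually in the parameter descriptions, not the agent.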
## Pre-Build Checklist
Before building a tool for an agent:
- [ ] Tool has a verb-noun name (e.g., `search_contacts`, not `contact_tool`)
- [ ] Description is one sentence stating what it does AND what it returns
- [ ] Parameters have clear descriptions and types
- [ ] Required parameters are truly required (tool fails without them)
- [ ] Optional parameters have defaults
- [ ] No more than 7 parameters
- [ ] Response is structured JSON with success/error flag
- [ ] Error responses include actionable suggestions
- [ ] Response size is bounded (pagination or limits)
- [ ] Read tools have no side effects
- [ ] Write tools have approval gates
- [ ] Tool tested with the actual agent (not just in isolation)
- [ ] Tool name doesn't overlap with other tool names
## Anti-Pattern Check
- Tool named `handle_data` or `process_input`. The agent has no idea what this tool does. Use specific verb-noun names: `search_contacts`, `enrich_company`, `validate_email`. Every tool name should be unambiguous
- All parameters marked as required. The agent hallucinates a "company_size" value because the tool requires it but the agent doesn't have it. Only require parameters the tool truly can't function without
- Tool returns raw API response. The agent gets a 200-line JSON blob from HubSpot's API. 190 lines are irrelevant. Process the response in the tool. Return only the fields the agent needs
- No error handling. Tool throws an exception. Agent receives a stack trace. It tries to extract useful information from the error message and fails. Return structured error objects with codes and suggestions
- One tool does five things. `manage_contacts(action="search|create|update|delete")`. The agent struggles to use multi-action tools correctly. One tool, one action. Split it
- Tool sends emails without approval. The email agent calls `send_email` directly. No human review. One hallucinated claim reaches the prospect. Separate `draft_email` from `send_email`. Require approval between them
- 20 tools on one agent. The agent has too many options. It calls the wrong tool 30% of the time. Keep it to 5-8 tools per agent. If more are needed, split into multiple agents with focused tool sets
- No tool call logging. Something went wrong but you can't tell which tool call caused it. Log every call with agent ID, tool name, parameters, response, and timestamp