Skill Design - Anthropic Format
A skill is a markdown file (SKILL.md) that teaches Claude how to do a specific job. It transforms Claude from a general-purpose model into a specialist with procedural knowledge, hard rules, and domain-specific judgment. Skills are the most efficient way to embed repeatable expertise into an AI workflow.
The design principle: write the skill as if you're onboarding a sharp new hire who knows nothing about your domain. Be explicit about what good looks like, what's banned, and how to make judgment calls. Show concrete examples. State rules as absolutes, not suggestions.
Skill Anatomy
Every skill is a directory containing a required SKILL.md file and optional supporting files.
skill-name/
├── SKILL.md          (required - the operating manual)
├── references/       (optional - detailed docs loaded on demand)
│   ├── patterns.md
│   └── examples.md
├── scripts/          (optional - executable utilities)
│   └── validate.sh
└── assets/           (optional - files used in output)
    └── template.html
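The layout above can be checked mechanically. A minimal sketch, assuming a hypothetical helper named `check_skill_layout` (not part of any official tooling) that only verifies what this section states: SKILL.md is required, and the three supporting entries are optional directories.

```python
from pathlib import Path

# Optional supporting directories named in the layout above.
OPTIONAL_DIRS = ("references", "scripts", "assets")


def check_skill_layout(skill_dir: Path) -> list[str]:
    """Return layout problems; an empty list means the layout looks fine."""
    if not skill_dir.is_dir():
        return [f"{skill_dir} is not a directory"]
    problems = []
    if not (skill_dir / "SKILL.md").is_file():
        problems.append("missing required SKILL.md")
    for name in OPTIONAL_DIRS:
        path = skill_dir / name
        # Optional entries may be absent, but if present they must be directories.
        if path.exists() and not path.is_dir():
            problems.append(f"{name} exists but is not a directory")
    return problems
```

Usage: `check_skill_layout(Path("cold-email-subject-lines"))` returns a list of strings to print or fail a build on.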
SKILL.md Structure
Every SKILL.md has two parts: YAML frontmatter (metadata) and markdown body (instructions).
---
name: skill-slug-name
description: This skill should be used when the user asks to "trigger phrase 1",
  "trigger phrase 2", "trigger phrase 3", "trigger phrase 4". Provides [what it
  provides] for [audience].
version: 0.1.0
---
# Skill Title
[Body: instructions, rules, examples, tables, checklists]
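The two-part structure can be separated with a few lines of code. A rough sketch, assuming a hypothetical `split_skill_md` function and a deliberately simplified frontmatter parser: it handles only flat `key: value` pairs with wrapped continuation lines, not full YAML. A real project would more likely use a YAML library.

```python
import re

# A frontmatter key at the start of a line, e.g. "name: skill-slug-name".
KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")


def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a '---' line")
    closing = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            closing = index
            break
    if closing is None:
        raise ValueError("frontmatter has no closing '---' line")

    fields: dict[str, str] = {}
    current = None
    for line in lines[1:closing]:
        # Indented lines are treated as continuations of the previous value.
        match = None if line[:1].isspace() else KEY_LINE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            fields[current] = (fields[current] + " " + line.strip()).strip()
    # Only the first closing '---' ends the frontmatter; later rules stay in the body.
    body = "\n".join(lines[closing + 1:]).strip()
    return fields, body
```

Because only the first closing `---` is treated as the frontmatter boundary, horizontal rules inside the body are left untouched.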
Frontmatter Design
The frontmatter determines when Claude loads the skill. A weak frontmatter means the skill never triggers. A vague frontmatter means it triggers on the wrong queries.
Required fields
| Field | Purpose | Format |
|---|---|---|
| name | Unique identifier, matches the directory name | Lowercase slug with hyphens: cold-email-subject-lines |
| description | Tells Claude when to use this skill. This is the trigger mechanism | Third-person, starts with "This skill should be used when the user asks to..." |
| version | Tracks iterations | Semantic versioning: 0.1.0 for drafts, 1.0.0 for production |
Description rules
The description is the most important line in the entire skill. Claude reads every skill's description in every conversation to decide which skills to load. A bad description = a skill that never fires.
Formula:
This skill should be used when the user asks to "[exact phrase 1]",
"[exact phrase 2]", "[exact phrase 3]", "[exact phrase 4]",
"[exact phrase 5]", "[exact phrase 6]", or any variation of
[general category description].
Rules:
- Start with "This skill should be used when the user asks to..." (third person, always)
- Include 6-10 specific trigger phrases in quotes. These are the exact things a user would type
- End with a catch-all: "or any variation of [category]"
- Trigger phrases should cover different phrasings of the same intent. "Write cold outbound", "draft a cold email", "create a sales sequence" all mean the same thing but users phrase them differently
- Include tool-specific triggers if relevant. "Build a Lemlist sequence", "set up an Outreach cadence" trigger differently than generic "write a sequence"
Good description:
description: This skill should be used when the user asks to "write cold outbound",
  "draft a cold email sequence", "write a sales sequence", "write outbound copy",
  "build a Lemlist sequence", "build an Outreach sequence", "write B2B cold emails",
  "draft a 3-email sequence", or any variation of writing outbound email copy for
  a B2B SaaS audience.
Bad descriptions:
description: Helps with cold email.
# Too vague. No trigger phrases. Won't fire reliably.
description: Use this skill when working with outbound.
# Wrong person (imperative, not third person). Too broad.
description: Provides guidance for email writing and sales sequences.
# No trigger phrases. "Guidance" is generic. Doesn't say when to fire.
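The description rules above are mechanical enough to lint. A minimal sketch, assuming a hypothetical `lint_description` helper. It checks the opener, counts quoted trigger phrases, and looks for the catch-all wording. It cannot judge whether the phrases are ones users would actually type.

```python
import re

# Required opener, taken from the description rules.
OPENER = "This skill should be used when the user asks to"


def lint_description(description: str) -> list[str]:
    """Return rule violations for a frontmatter description string."""
    text = " ".join(description.split())  # collapse wrapped lines
    problems = []
    if not text.startswith(OPENER):
        problems.append("does not start with the third-person opener")
    phrases = re.findall(r'"([^"]+)"', text)
    if not 6 <= len(phrases) <= 10:
        problems.append(f"{len(phrases)} quoted trigger phrases; expected 6-10")
    if "or any variation of" not in text:
        problems.append("missing the 'or any variation of' catch-all")
    return problems
```

Run against the examples in this section, the good description returns an empty list and each bad description returns three violations.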
Body Design
The body is the operating manual Claude loads when the skill triggers. It should be dense, opinionated, and actionable. Not a textbook. Not a blog post. A set of instructions a smart person can follow without asking clarifying questions.
Writing style
- Imperative form. "Write the subject line in lowercase" not "You should write the subject line in lowercase." Verb-first instructions
- Terse. One idea per sentence. Short sentences. Cut filler words
- Opinionated. "Never use em-dashes" not "Consider avoiding em-dashes." Skills encode judgment. Hedging defeats the purpose
- Concrete over abstract. "Subject lines ≤ 5 words, lowercase, no emoji" beats "Keep subject lines short and professional"
- Tables over paragraphs. When presenting options, comparisons, or criteria, use tables. They're denser and more scannable than prose
Body structure template
# Skill Title
[1-3 sentence overview. What this skill does and the quality bar.]
## [Core framework / methodology]
[The main model or process. Tables, steps, rules.]
---
## [Section 2: Detailed rules or component 1]
[Hard rules, good/bad examples, specific constraints.]
---
## [Section 3: Detailed rules or component 2]
[More rules, patterns, templates.]
---
## [Banned patterns / phrases / anti-patterns]
[Explicit list of what to never do.]
---
## Pre-Send / Pre-Use Checklist
[Checkbox list of quality gates.]
---
## Anti-Pattern Check
[Common mistakes with specific fixes.]
Length targets
| Skill complexity | SKILL.md target | References needed? |
|---|---|---|
| Simple (one concept, few rules) | 800-1,200 words | No |
| Standard (framework + rules + examples) | 1,500-2,500 words | Optional |
| Complex (multi-step process, many rules, multiple scenarios) | 2,000-3,000 words | Yes, move detail to references/ |
| Deep domain (comprehensive playbook) | 2,500-3,500 words in SKILL.md + references | Yes, essential |
Hard cap: 5,000 words in SKILL.md. Beyond that, the skill consumes too much context when loaded. Move detailed reference material, extended examples, and edge cases to references/ files.
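The length targets and hard cap can be turned into a quick check. A sketch under stated assumptions: `check_length` and the complexity keys are hypothetical names, and words are counted by splitting on whitespace, which only approximates how much context the body consumes.

```python
# Word ranges from the length targets table; the keys are shorthand labels.
TARGETS = {
    "simple": (800, 1200),
    "standard": (1500, 2500),
    "complex": (2000, 3000),
    "deep": (2500, 3500),
}
HARD_CAP = 5000


def check_length(body: str, complexity: str) -> tuple[int, str]:
    """Return (word count, status) for a SKILL.md body."""
    words = len(body.split())
    low, high = TARGETS[complexity]
    if words > HARD_CAP:
        return words, "over_cap"       # move detail to references/
    if words < low:
        return words, "below_target"
    if words > high:
        return words, "above_target"
    return words, "ok"
```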
Rules Design
Rules are the highest-value content in a skill. They encode the judgment that separates good output from bad output. Without rules, the skill is just a topic overview Claude could generate on its own.
Rule formatting
- Group by category. Structure rules, content rules, format rules, behavioral rules. Grouping makes them scannable
- Use "never" and "always" for absolutes. "Never start an email with 'I'" is clear. "Try to avoid starting with 'I'" is a suggestion the model will sometimes ignore
- Pair the rule with the reason. "No em-dashes in body copy. Reads as AI-generated." The reason helps Claude apply the rule in edge cases
- Pair the rule with the fix. "Never use 'leveraging.' Use: using, running, building." Don't just ban. Provide the replacement
- Keep to 10-20 rules per skill. Below 10, the skill probably isn't opinionated enough. Above 20, the model starts dropping rules. If you have 30 rules, split into categories and move the less critical ones to a reference file
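Two of the formatting rules above can be checked automatically: soft wording and rule count. A minimal sketch with hypothetical helper names. The soft-phrase pattern covers only the two phrases this guide names ("try to", "consider"), so extend it to match the hedges in your own drafts.

```python
import re

# Soft wording called out in this guide; matched case-insensitively on word boundaries.
SOFT_WORDING = re.compile(r"\b(try to|consider)\b", re.IGNORECASE)


def find_soft_rules(rules: list[str]) -> list[str]:
    """Return the rules that hedge instead of stating an absolute."""
    return [rule for rule in rules if SOFT_WORDING.search(rule)]


def rule_count_ok(rules: list[str]) -> bool:
    """True when the skill has between 10 and 20 rules."""
    return 10 <= len(rules) <= 20
```

Whether a rule carries a reason and a fix still needs a human read.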
Rule types
| Type | Example | Format |
|---|---|---|
| Hard constraint | "Every email must be under 80 words" | Binary. Checkable. Non-negotiable |
| Banned content | "Never use 'leveraging', 'synergies', 'unlock'" | Explicit list with alternatives |
| Required element | "Email 1 must contain a specific, verifiable signal" | Positive requirement |
| Format rule | "Subject lines ≤ 5 words, lowercase, no emoji" | Measurable spec |
| Judgment rule | "If you can't name the trigger, don't send" | Conditional with clear threshold |
| Quality bar | "Could this email apply to 100 other people? Rewrite" | Self-check question |
Examples and Templates
Examples are the most powerful teaching tool in a skill. Two good examples teach Claude more than 500 words of rules.
Good/bad example pairs
Always pair good examples with bad examples. The contrast teaches both what to do and what to avoid.
**Good signals:**
- "Saw you posted for a RevOps lead last week"
- "Congrats on the Series B"
**Bad signals (never use):**
- "I was doing some research on your company..."
- "I came across your LinkedIn profile..."
Template patterns
When the skill produces structured output, include templates.
**Templates (lightly customize, don't use verbatim):**
- "Sounds like timing's off. Should I close the loop?"
- "No worries if this isn't a priority right now."
Template rules:
- Label templates as starting points, not scripts. "Lightly customize, don't use verbatim" prevents robotic output
- Include 2-3 template variants. One template creates uniformity. Three create range
- Show the template in the same format as the expected output. If the output is plain text email, show the template as plain text. Not markdown. Not code blocks
Tables
Tables are the most efficient format for encoding rules, comparisons, criteria, and frameworks into skills. Use tables over prose whenever the content has a repeating structure.
When to use tables
| Content type | Use table? | Why |
|---|---|---|
| Comparing options (A vs B vs C) | Yes | Dense, scannable |
| Step-by-step process | Yes (if ≤ 5 columns) or numbered list | Tables for parallel info, lists for sequential |
| Rules with categories | Yes | Group by type, show rule + reason + fix |
| Scoring criteria | Yes | Dimensions × levels |
| Timelines / cadences | Yes | Day × channel × action |
| Metrics and targets | Yes | Metric × target × action-if-below |
Table design rules
- ≤ 6 columns. More than 6 becomes unreadable in markdown
- Short cell values. 1-10 words per cell. Long prose in table cells defeats the purpose
- Header row always. Use descriptive headers, not abbreviations
- Align content type. Don't mix numbers and paragraphs in the same column
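The column and cell limits above can be linted on a pipe table. A rough sketch, assuming a hypothetical `lint_table` function. It splits rows on `|` and does not handle escaped pipes inside cells.

```python
MAX_COLUMNS = 6
MAX_WORDS_PER_CELL = 10


def split_row(row: str) -> list[str]:
    """Split one pipe-table row into trimmed cell values."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    return [cell.strip() for cell in row.split("|")]


def is_separator(cells: list[str]) -> bool:
    """True for the |---|---| line that follows a header row."""
    return all(cell and "-" in cell and set(cell) <= set("-:") for cell in cells)


def lint_table(table: str) -> list[str]:
    """Return table design violations for a markdown pipe table."""
    rows = [split_row(line) for line in table.strip().splitlines() if line.strip()]
    if len(rows) < 2 or not is_separator(rows[1]):
        return ["missing header row or separator line"]
    problems = []
    if len(rows[0]) > MAX_COLUMNS:
        problems.append(f"{len(rows[0])} columns; limit is {MAX_COLUMNS}")
    for number, cells in enumerate(rows[2:], start=1):
        for cell in cells:
            if len(cell.split()) > MAX_WORDS_PER_CELL:
                problems.append(
                    f"body row {number} has a cell over {MAX_WORDS_PER_CELL} words"
                )
    return problems
```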
Checklists
End every skill with a pre-use checklist and an anti-pattern check. These are the quality gates.
Pre-use checklist
A yes/no list the model runs before finalizing output. Use markdown checkbox syntax.
## Pre-Send Checklist
- [ ] Email 1 has a specific, real, verifiable signal
- [ ] Email 1 has no calendar link
- [ ] No banned phrases in any email
- [ ] Each email is under its word limit
Checklist rules:
- Each item must be binary (yes/no, present/absent). "Is the email good?" is not checkable. "Is the email under 80 words?" is
- 8-15 items. Fewer than 8 miss important checks. More than 15 create checklist fatigue
- Order by importance. Most critical checks first
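The item-count rule is easy to verify in code. A minimal sketch, assuming hypothetical `checklist_items` and `lint_checklist` helpers that read markdown checkbox syntax. Whether each item is truly binary, and whether the order reflects importance, still takes judgment.

```python
import re

# Matches "- [ ] item" and "- [x] item" lines.
CHECKBOX = re.compile(r"^\s*- \[[ xX]\] (.+)$")


def checklist_items(markdown: str) -> list[str]:
    """Extract the text of every checkbox item."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append(match.group(1).strip())
    return items


def lint_checklist(markdown: str) -> list[str]:
    """Return violations of the 8-15 item rule."""
    count = len(checklist_items(markdown))
    if not 8 <= count <= 15:
        return [f"{count} checklist items; expected 8-15"]
    return []
```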
Anti-pattern check
A list of common mistakes with specific fixes. Framed as if-then.
## Anti-Pattern Check
- Could this email apply to 100 other people unchanged? → Rewrite the signal
- Did you ask for more than 15 minutes? → Reduce
- Is the proof point vague? → Add a number or a company name
Anti-pattern rules:
- Frame as a question the model asks itself, followed by the fix
- 5-8 anti-patterns. Cover the mistakes that actually happen, not theoretical risks
- The fix should be actionable in one sentence
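The question-then-fix framing can also be checked. A sketch under assumptions: `lint_anti_patterns` is a hypothetical helper, and it expects the `→` separator used in the example above.

```python
def lint_anti_patterns(markdown: str) -> list[str]:
    """Return violations of the anti-pattern rules for a bulleted list."""
    items = [
        line.strip()[2:].strip()
        for line in markdown.splitlines()
        if line.strip().startswith("- ")
    ]
    problems = []
    if not 5 <= len(items) <= 8:
        problems.append(f"{len(items)} anti-patterns; expected 5-8")
    for item in items:
        question, arrow, fix = item.partition("→")
        if not question.strip().endswith("?"):
            problems.append(f"not framed as a question: {item!r}")
        elif not arrow or not fix.strip():
            problems.append(f"no fix after the question: {item!r}")
    return problems
```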
Progressive Disclosure
Skills use a three-level loading system to manage Claude's context window efficiently.
| Level | What loads | When | Size target |
|---|---|---|---|
| 1. Metadata | name + description from frontmatter | Always (every conversation) | ~50-100 words |
| 2. SKILL.md body | Full markdown body | When skill triggers | 1,500-3,000 words |
| 3. References | Files in references/ | When Claude decides it needs more detail | Unlimited per file |
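Level 1 is the only cost paid in every conversation, so it is worth measuring across the whole skill set. A rough sketch, assuming a hypothetical `always_loaded_cost` helper that takes a mapping of skill name to description. It uses whitespace word counts as a stand-in for context usage and treats 100 words as the upper bound from the table.

```python
METADATA_WORD_LIMIT = 100  # upper end of the ~50-100 word size target


def always_loaded_cost(skills: dict[str, str]) -> tuple[int, list[str]]:
    """Return (total metadata words, names of skills over the limit)."""
    total = 0
    oversized = []
    for name, description in skills.items():
        words = len(name.split()) + len(description.split())
        total += words
        if words > METADATA_WORD_LIMIT:
            oversized.append(name)
    return total, oversized
```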
What goes where
| Content type | SKILL.md body | references/ |
|---|---|---|
| Core framework / methodology | Yes | No |
| Essential rules (top 15-20) | Yes | No |
| Good/bad examples (2-3 pairs) | Yes | No |
| Pre-use checklist | Yes | No |
| Anti-pattern check | Yes | No |
| Extended example library (10+ examples) | No | Yes |
| Detailed scoring rubrics | No | Yes |
| Full banned-phrase lists (30+ items) | No | Yes |
| Tool-specific configuration guides | No | Yes |
| Historical benchmarks or data tables | No | Yes |
| Edge case catalog | No | Yes |
Referencing supporting files
Always reference files in SKILL.md so Claude knows they exist.
## Additional Resources
### Reference Files
- **`references/examples.md`** - Extended library of 20+ email examples by industry
- **`references/banned-phrases.md`** - Complete banned phrase list with alternatives
### Scripts
- **`scripts/word-count-check.sh`** - Validates word counts across a sequence
Reference rules:
- Include a one-line description of each file. Claude uses this to decide whether to load it
- Only create reference files when SKILL.md exceeds 3,000 words or when the reference material is genuinely detailed (scoring rubrics, extended example libraries, API documentation)
- Don't split for the sake of splitting. A 1,800-word skill doesn't need reference files. Everything fits in SKILL.md
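Both directions of the referencing rule can be checked: files that SKILL.md mentions but that do not exist, and supporting files that SKILL.md never mentions. A minimal sketch, assuming hypothetical helper names and assuming references are written as backticked relative paths, as in the example above.

```python
import re
from pathlib import Path

SUPPORT_DIRS = ("references", "scripts", "assets")
# Backticked relative paths such as `references/examples.md`.
REFERENCE = re.compile(r"`((?:references|scripts|assets)/[^`\s]+)`")


def mentioned_paths(skill_dir: Path) -> set[str]:
    """Collect supporting-file paths mentioned in SKILL.md."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return set(REFERENCE.findall(text))


def missing_references(skill_dir: Path) -> list[str]:
    """Paths mentioned in SKILL.md that do not exist on disk."""
    return sorted(p for p in mentioned_paths(skill_dir) if not (skill_dir / p).is_file())


def unreferenced_files(skill_dir: Path) -> list[str]:
    """Supporting files on disk that SKILL.md never mentions."""
    found = set()
    for folder in SUPPORT_DIRS:
        base = skill_dir / folder
        if base.is_dir():
            for path in base.rglob("*"):
                if path.is_file():
                    found.add(path.relative_to(skill_dir).as_posix())
    return sorted(found - mentioned_paths(skill_dir))
```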
Skill Validation Checklist
Before finalizing any skill:
Frontmatter:
- [ ] `name` matches the directory name (lowercase, hyphenated)
- [ ] `description` starts with "This skill should be used when the user asks to..."
- [ ] `description` includes 6-10 specific trigger phrases in quotes
- [ ] `description` ends with a catch-all ("or any variation of...")
- [ ] `version` is set (0.1.0 for drafts)
Body:
- [ ] Written in imperative form (verb-first, not "you should")
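- [ ] SKILL.md body meets the length target for its complexity (800-1,200 words for simple skills, 1,500-3,000 for most others) and stays under the 5,000-word hard cap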
- [ ] Includes at least one table
- [ ] Includes good/bad example pairs
- [ ] Rules use "never"/"always", not "try to"/"consider"
- [ ] Rules are grouped by category
- [ ] Ends with a pre-use checklist (8-15 binary items)
- [ ] Ends with an anti-pattern check (5-8 items with fixes)
Structure:
- [ ] Detailed content moved to references/ if SKILL.md > 3,000 words
- [ ] All referenced files actually exist
- [ ] No duplicate information between SKILL.md and reference files
Quality:
- [ ] Skill is opinionated (encodes judgment, not just information)
- [ ] Skill contains information Claude wouldn't know without it
- [ ] A person reading the skill could execute the task without asking questions
Anti-Pattern Check
- Description has no trigger phrases. Without quoted phrases like "write cold outbound" the skill won't fire when users ask for it. Include 6-10 specific phrases
- Description uses second person. "Use this skill when you need..." is wrong. "This skill should be used when the user asks to..." is correct. Third person, always
- Body reads like a blog post. "Cold email is an important part of any outbound strategy..." is filler. Cut the preamble. Start with the framework or the first rule
- Rules say "try to" or "consider." Soft rules produce inconsistent output. Make every rule binary: state it with "never" or "always", or remove it
- No examples. Rules without examples are abstract. Two good/bad pairs teach more than 10 rules alone
- SKILL.md is 6,000 words. Too long. The skill consumes excessive context when loaded. Move detail to references/. Keep SKILL.md under 3,000 words ideally, 5,000 absolute max
- No checklist or anti-pattern section. These are the quality gates. Without them, the skill describes what to do but doesn't help Claude verify it did it correctly
- Skill contains only information Claude already knows. A skill about "what is cold email" adds nothing. A skill about "the 12 banned phrases that mark an email as AI-generated" adds real value. Skills should encode non-obvious, opinionated, procedural knowledge