crm-data-hygiene

This skill should be used when the user asks to "clean up our CRM", "fix CRM data quality", "audit CRM data", "run a CRM hygiene process", "deduplicate our CRM", "clean our HubSpot data", "clean our Salesforce data", "design a data hygiene process", "standardize CRM data", or any variation of cleaning, maintaining, and improving data quality in a B2B SaaS CRM system.

CRM Data Hygiene

Dirty CRM data corrupts everything downstream: routing, scoring, attribution, forecasting, outbound, and reporting. Every workflow that touches CRM data inherits its quality problems. A 10% duplicate rate means 10% of your outbound is wasted. A 20% invalid-email rate means 20% of your sequences bounce. Data hygiene isn't a cleanup project. It's an ongoing operating discipline.

The principle: prevent bad data from entering the CRM, detect and fix the bad data that already exists, and maintain quality continuously, in that order. As a rule of thumb, prevention is about 10x cheaper than remediation.

The 5 Data Quality Dimensions

Every CRM hygiene program addresses these five dimensions.

| Dimension | Definition | Example of failure | Impact |
| --- | --- | --- | --- |
| Completeness | Required fields are populated | Contact has no email, no title, no company | Can't route, can't sequence, can't score |
| Accuracy | Field values are correct and current | Job title says "SDR" but the person is now VP Sales | Wrong messaging, wrong routing, wrong scoring |
| Consistency | Same data formatted the same way | "United States", "US", "USA", "U.S.", "united states" in country field | Broken filters, broken reports, broken routing |
| Uniqueness | No duplicates | Same person appears 3 times with slight name variations | Multiple reps emailing the same person, inflated pipeline |
| Timeliness | Data reflects current reality | Contact left the company 6 months ago, still in CRM as active | Wasted outreach, embarrassing emails to wrong people |

Prevention: Stop Bad Data at the Gate

Required fields on creation

Define minimum required fields for every object type. Block creation without them.

| Object | Required fields | Why |
| --- | --- | --- |
| Contact | Email, first name, last name, company, lead source | Can't route or sequence without email. Can't report without source |
| Company | Name, domain, industry, employee count range | Can't score ICP fit without firmographics |
| Deal | Contact, company, amount, stage, close date, owner | Can't forecast without these |

Prevention rules:

  • Enforce required fields in CRM settings, not in training. If a rep can create a contact without an email, they will. Make the system enforce it
  • Don't require too many fields. 5-7 required fields per object is the limit. Beyond that, reps start entering garbage to bypass the form ("asdf" in title, "1" in phone)
  • Use dropdown menus instead of free text for categorical fields (industry, country, lead source). Free text creates inconsistency. Dropdowns enforce consistency
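
As a rough sketch of what system-level enforcement checks for, the function below lists the required fields that are empty or hold obvious filler. The field names, the `REQUIRED_FIELDS` mapping, and the `PLACEHOLDER_VALUES` set are illustrative assumptions, not any CRM's actual schema or API.

```python
# Hypothetical field names; map these to your CRM's actual properties.
REQUIRED_FIELDS = {
    "contact": ["email", "first_name", "last_name", "company", "lead_source"],
    "company": ["name", "domain", "industry", "employee_count_range"],
    "deal": ["contact", "company", "amount", "stage", "close_date", "owner"],
}

# Filler values reps type to get past a form (assumed list, extend as needed).
PLACEHOLDER_VALUES = {"asdf", "n/a", "na", "none", "test", "-"}


def missing_required_fields(object_type: str, record: dict) -> list:
    """Return the required fields that are empty or hold filler values."""
    problems = []
    for field in REQUIRED_FIELDS[object_type]:
        value = record.get(field)
        text = "" if value is None else str(value).strip()
        if not text or text.lower() in PLACEHOLDER_VALUES:
            problems.append(field)
    return problems
```

An empty result means the record may be created. Anything else can be shown to the rep as the list of fields still to fill in.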

Standardization rules

Define how data should be formatted and enforce it with automation.

| Field | Standard | Enforcement |
| --- | --- | --- |
| Country | ISO 3166-1 alpha-2 code (US, GB, DE) | Dropdown + automation to normalize on import |
| Phone | E.164 format (+15551234567: plus sign, country code, digits only) | Validation rule or automation |
| Company name | Official name without legal suffixes ("Acme", not "Acme, Inc." or "Acme Corp") | Manual cleanup + import rules |
| Email | Lowercase, validated format | Automation on creation |
| Job title | Standardized to company conventions (VP vs Vice President) | Mapping table + automation |
| Industry | Predefined picklist matching your ICP segments | Dropdown, no free text |
| Revenue range | Predefined bands ($1-5M, $5-20M, $20-100M, $100M+) | Dropdown |
| Employee count | Predefined bands (1-50, 51-200, 201-1000, 1000+) | Dropdown |
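
A minimal sketch of the country, email, and phone normalizers such an automation might run. The `COUNTRY_CODES` table is deliberately partial, and the phone logic assumes that any number without a leading "+" is a US/Canada number. A production pipeline would more likely rely on a dedicated phone-parsing library.

```python
import re
from typing import Optional

# Partial mapping for illustration; a real table covers every variant you see.
COUNTRY_CODES = {
    "united states": "US", "us": "US", "usa": "US", "u.s.": "US",
    "united kingdom": "GB", "uk": "GB", "gb": "GB",
    "germany": "DE", "de": "DE",
}


def normalize_country(raw: str) -> Optional[str]:
    """Map a free-text country to its two-letter code, or None if unknown."""
    return COUNTRY_CODES.get(raw.strip().lower())


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()


def normalize_phone(raw: str) -> Optional[str]:
    """Return an E.164 string, or None if the input can't be normalized.

    Simplifying assumption: numbers without a leading "+" are US/Canada numbers.
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits if 8 <= len(digits) <= 15 else None
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None
```

Returning `None` instead of guessing lets the automation route unrecognized values to a review queue rather than writing a wrong value into the record.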

Import hygiene

Most data quality problems enter through bulk imports: list purchases, event attendee uploads, enrichment dumps.

Import rules:

  • Every import must go through a staging process. Never import directly into production CRM
  • Deduplicate against existing records before import. Match on email (primary), domain + name (secondary)
  • Validate emails before import. Run them through a verification tool (such as NeverBounce or ZeroBounce). Reject unverifiable emails
  • Map import fields to CRM fields explicitly. Never auto-map. Column names like "Name" could be first name, last name, or full name
  • Log every import: date, source, record count, who imported. This is the audit trail for tracing data quality issues back to their source
  • Set a "data source" or "import batch" field on every imported record. When an import turns out to be garbage, you can find and fix all affected records

Detection: Find Bad Data That Exists

Automated hygiene reports

Run these reports weekly or set them as automated dashboards.

| Report | What it finds | Query logic | Priority |
| --- | --- | --- | --- |
| Contacts missing email | Can't sequence or email | Email IS NULL | P0 |
| Contacts with invalid email format | Will bounce | Email NOT LIKE '%@%.%' | P0 |
| Contacts missing title | Can't score or route by persona | Job Title IS NULL | P1 |
| Contacts missing company | Can't route by account | Company IS NULL | P1 |
| Contacts with no activity in 12+ months | Likely stale or left company | Last Activity Date < TODAY - 12 months | P1 |
| Companies missing industry | Can't segment or report by vertical | Industry IS NULL | P1 |
| Companies missing employee count | Can't score ICP fit | Employee Count IS NULL | P2 |
| Deals with close date in the past and stage not closed | Zombie deals | Close Date < TODAY AND Stage NOT IN (Closed Won, Closed Lost) | P0 |
| Deals with no activity in 30+ days | Stale pipeline | Last Activity Date < TODAY - 30 days AND Stage NOT IN (Closed Won, Closed Lost) | P1 |
| Duplicate contacts (same email) | Multiple records for same person | GROUP BY email HAVING COUNT(*) > 1 | P0 |
| Duplicate companies (same domain) | Multiple records for same account | GROUP BY domain HAVING COUNT(*) > 1 | P0 |
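
To make the query logic concrete, here is a small sketch that runs three of these reports against an in-memory SQLite database. The table and column names (`contacts`, `deals`, `close_date`, and so on) are assumptions for illustration. In practice the same conditions would be expressed in your CRM's report builder or against a warehouse export.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE deals (id INTEGER PRIMARY KEY, stage TEXT, close_date TEXT);
    INSERT INTO contacts (email) VALUES
        ('ann@acme.com'), ('Ann@Acme.com'), ('bob@beta.io'), (NULL), ('not-an-email');
    INSERT INTO deals (stage, close_date) VALUES
        ('Negotiation', '2000-01-31'), ('Closed Won', '2000-02-15'), ('Discovery', '2999-12-31');
""")

# Duplicate contacts: group on the lowercased email so case variants collide.
duplicate_emails = conn.execute("""
    SELECT LOWER(email) AS email_key, COUNT(*) AS n
    FROM contacts
    WHERE email IS NOT NULL
    GROUP BY email_key
    HAVING COUNT(*) > 1
""").fetchall()

# Contacts with a missing or malformed email (NULL needs its own check,
# because NOT LIKE never matches NULL).
bad_email_ids = [row[0] for row in conn.execute("""
    SELECT id FROM contacts
    WHERE email IS NULL OR email NOT LIKE '%@%.%'
    ORDER BY id
""")]

# Zombie deals: close date already passed but the deal is still open.
zombie_deal_ids = [row[0] for row in conn.execute("""
    SELECT id FROM deals
    WHERE close_date < DATE('now')
      AND stage NOT IN ('Closed Won', 'Closed Lost')
    ORDER BY id
""")]
```

With the sample rows, the duplicate report groups the two Ann records under one key, the email report returns the NULL and malformed rows, and the zombie report returns only the open deal whose close date has passed.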

Duplicate detection

Duplicates are the most damaging data quality issue. They cause:

  • Multiple reps emailing the same prospect (embarrassing and unprofessional)
  • Inflated pipeline (same deal counted twice)
  • Broken attribution (touchpoints split across records)
  • Wrong scoring (engagement signals diluted across duplicates)

Duplicate matching rules:

| Match type | Fields to compare | Confidence and action |
| --- | --- | --- |
| Exact email match | email = email | Definite duplicate. Auto-merge |
| Domain + name match | company domain + (first name + last name) | High confidence. Review before merge |
| Phone match | phone number (normalized) | High confidence. Review before merge |
| Fuzzy name + company | Similar name + same company | Medium confidence. Manual review required |
| Same name, different company | first name + last name match, company differs | Not a duplicate by default. Either a different person or the same person after a job change. Verify, then update the record if it is a job change |
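
The matching tiers can be sketched as a single classifier. This is one possible reading of the table, not a reference implementation: the record keys (`email`, `domain`, `phone`, `first_name`, `last_name`) and the `fuzzy_threshold` of 0.85 are assumptions, and name similarity uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher your tooling provides.

```python
from difflib import SequenceMatcher


def _norm(record: dict, field: str) -> str:
    """Lowercased, trimmed field value; empty string when missing."""
    return (record.get(field) or "").strip().lower()


def _full_name(record: dict) -> str:
    return f"{_norm(record, 'first_name')} {_norm(record, 'last_name')}".strip()


def classify_match(a: dict, b: dict, fuzzy_threshold: float = 0.85) -> str:
    """Return 'auto_merge', 'review', 'manual_review', or 'no_match'."""
    if _norm(a, "email") and _norm(a, "email") == _norm(b, "email"):
        return "auto_merge"      # exact email match
    same_domain = bool(_norm(a, "domain")) and _norm(a, "domain") == _norm(b, "domain")
    if same_domain and _full_name(a) == _full_name(b):
        return "review"          # domain + name match
    if _norm(a, "phone") and _norm(a, "phone") == _norm(b, "phone"):
        return "review"          # phone match (assumes phones are already normalized)
    similarity = SequenceMatcher(None, _full_name(a), _full_name(b)).ratio()
    if same_domain and similarity >= fuzzy_threshold:
        return "manual_review"   # similar name at the same company
    return "no_match"            # includes same name at a different company
```

Only the first branch ever leads to an automatic merge. Every other positive result is a queue item for a human.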

Merge rules:

  • Keep the record with the most activity history. Losing activity data is worse than losing a field value
  • Keep the most recent field values when merging. If Record A has title "Manager" from 2024 and Record B has "Director" from 2026, keep "Director"
  • Never auto-merge fuzzy matches. Auto-merge on exact email only. Everything else needs human review
  • Log every merge: which records merged, which survived, who approved. Keep the ability to undo a merge that turns out to be wrong
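
A minimal sketch of the merge rules, assuming a hypothetical record shape in which every field carries the date it was last updated. Real CRMs expose field history differently, so treat the structure below as illustrative only.

```python
from datetime import date


def merge_records(a: dict, b: dict, approved_by: str) -> dict:
    """Merge two duplicate records.

    Assumed record shape (hypothetical):
      {"id": str, "activity_count": int,
       "fields": {field_name: (value, last_updated_date)}}
    """
    # The survivor is the record with the most activity history.
    survivor, loser = (a, b) if a["activity_count"] >= b["activity_count"] else (b, a)
    merged_fields = dict(survivor["fields"])
    for name, (value, updated) in loser["fields"].items():
        current = merged_fields.get(name)
        # Take the loser's value when the survivor has none or an older one.
        if value and (current is None or not current[0] or updated > current[1]):
            merged_fields[name] = (value, updated)
    return {
        "id": survivor["id"],
        "activity_count": a["activity_count"] + b["activity_count"],
        "fields": merged_fields,
        # Log entry so the merge can be audited and, if needed, reversed.
        "merge_log": {"survivor": survivor["id"], "merged": loser["id"],
                      "approved_by": approved_by},
    }
```

Reversing a merge also requires a snapshot of both original records. The log entry here only records what happened and who signed off.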

Remediation: Fix Bad Data

The hygiene sprint

A one-time cleanup to bring existing data to baseline quality. Run this before implementing ongoing processes.

| Phase | Duration | Focus | Actions |
| --- | --- | --- | --- |
| 1. Audit | Week 1 | Assess current state | Run all detection reports. Quantify the problem per dimension |
| 2. Deduplicate | Weeks 2-3 | Remove duplicates | Merge exact-email duplicates. Review and merge high-confidence matches (domain + name, phone) |
| 3. Enrich | Weeks 3-4 | Fill missing fields | Run contacts through enrichment (Apollo, Clearbit). Fill title, company, industry, size |
| 4. Validate | Weeks 4-5 | Verify accuracy | Validate emails. Verify phone numbers. Check for contacts who left their company |
| 5. Standardize | Weeks 5-6 | Normalize formats | Standardize country, industry, title formats. Apply picklist values to free-text fields |

Sprint rules:

  • Don't try to fix everything at once. Prioritize by impact: duplicates first (they break routing and outbound), then missing emails (they break sequencing), then missing titles (they break scoring)
  • Set a quality baseline before the sprint. "14% of contacts have no email, 8% are duplicates, 22% have no title." Measure again after the sprint to prove progress
  • Don't delete records during the sprint unless they're clearly invalid (test records, spam entries, competitors). Archive, don't delete. Deleted data can't be recovered

Ongoing hygiene cadence

After the sprint, maintain quality continuously.

| Activity | Frequency | Who | What |
| --- | --- | --- | --- |
| Hygiene dashboard review | Weekly | RevOps | Check automated reports for new issues |
| Duplicate scan | Weekly | RevOps (automated) | Flag new duplicates for review |
| Email verification | Monthly | RevOps | Re-verify emails for contacts in active sequences |
| Stale contact detection | Monthly | RevOps | Flag contacts with no activity in 6+ months |
| Job change detection | Monthly | RevOps (automated) | Check LinkedIn or enrichment tools for role changes |
| Data enrichment refresh | Quarterly | RevOps | Re-enrich all contacts to fill gaps and update fields |
| Full audit | Quarterly | RevOps | Comprehensive hygiene report across all dimensions |
| Import audit | Per import | RevOps | Review every bulk import before it hits production |

Field-Level Hygiene Rules

Email hygiene

| Issue | Detection | Fix |
| --- | --- | --- |
| Invalid format | Regex validation | Fix or remove |
| Catch-all domain | Email verification tool flags | Keep but mark as unverified. Don't include in deliverability-sensitive campaigns |
| Role-based email (info@, sales@, support@) | Pattern match | Keep for company record. Don't use for personal outreach |
| Personal email (gmail, yahoo) for B2B contact | Domain check | Enrich to find work email. Keep personal as secondary |
| Bounced email | Bounce tracking from sequencing tool | Mark as invalid. Attempt re-enrichment |
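
Three of these checks (format, role-based, personal domain) need nothing more than pattern matching, as in the sketch below. The regex is intentionally loose, and the `ROLE_LOCAL_PARTS` and `PERSONAL_DOMAINS` sets are short illustrative lists. Catch-all and bounce detection are not covered because they depend on an external verification or sequencing tool.

```python
import re

# Loose shape check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ROLE_LOCAL_PARTS = {"info", "sales", "support", "admin", "contact", "hello"}
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def classify_email(email: str) -> str:
    """Return 'invalid', 'role_based', 'personal', or 'work'."""
    email = email.strip().lower()
    if not EMAIL_PATTERN.match(email):
        return "invalid"
    local, domain = email.rsplit("@", 1)
    if local in ROLE_LOCAL_PARTS:
        return "role_based"
    if domain in PERSONAL_DOMAINS:
        return "personal"
    return "work"
```

Each label maps to a fix in the table: `invalid` is fixed or removed, `role_based` stays on the company record, and `personal` triggers enrichment for a work address.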

Title / role hygiene

| Issue | Detection | Fix |
| --- | --- | --- |
| Missing title | IS NULL check | Enrich from LinkedIn or enrichment tool |
| Outdated title | Job change detection or enrichment refresh | Update to current title |
| Non-standard format | Pattern match ("VP" vs "Vice President", "Sr." vs "Senior") | Standardize with mapping table |
| Title doesn't indicate seniority | Can't score or route | Add a "seniority level" field: IC, Manager, Director, VP, C-level |
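
One way to sketch the mapping table and the derived seniority field is with a few ordered regex rules. The replacement list and keyword rules below are assumptions chosen for illustration. A real mapping table grows from the titles that actually appear in your database.

```python
import re

# Abbreviation map (illustrative): variants collapse to one convention.
TITLE_REPLACEMENTS = [
    (r"\bvice president\b", "VP"),
    (r"\bsr\.?(?=\s|$)", "Senior"),
    (r"\bjr\.?(?=\s|$)", "Junior"),
]

# Checked in order: the first matching rule wins.
SENIORITY_RULES = [
    ("C-level", r"\b(chief|ceo|cfo|cto|coo|cmo|cro)\b"),
    ("VP", r"\b(vp|vice president)\b"),
    ("Director", r"\b(director|head of)\b"),
    ("Manager", r"\bmanager\b"),
]


def standardize_title(raw: str) -> str:
    """Collapse whitespace and apply the abbreviation map."""
    title = " ".join(raw.split())
    for pattern, replacement in TITLE_REPLACEMENTS:
        title = re.sub(pattern, replacement, title, flags=re.IGNORECASE)
    return title


def seniority_level(title: str) -> str:
    """Derive IC / Manager / Director / VP / C-level from title keywords."""
    for level, pattern in SENIORITY_RULES:
        if re.search(pattern, title, flags=re.IGNORECASE):
            return level
    return "IC"
```

Keyword rules misfire on titles such as "Account Manager", which is usually an individual contributor, so exceptions belong in the mapping table rather than in more regex.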

Company hygiene

| Issue | Detection | Fix |
| --- | --- | --- |
| Missing company | IS NULL check | Enrich from email domain or LinkedIn |
| Company name variations | "Acme", "Acme Inc", "Acme, Inc.", "ACME" | Standardize to official name. Match on domain |
| Missing domain | IS NULL check | Derive from email address or enrich |
| Acquired company | Name no longer exists | Update to acquiring company name. Note acquisition |
| Missing firmographics | Industry, size, revenue IS NULL | Enrich from Clearbit, Apollo, or LinkedIn |
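
Two of these fixes lend themselves to a short sketch: collapsing name variations into a comparison key, and deriving a missing domain from a contact's email. The suffix list and the personal-domain set are illustrative assumptions. The key is meant for matching only, not as the display name.

```python
import re
from typing import Optional

LEGAL_SUFFIXES = r"(inc|incorporated|corp|corporation|llc|ltd|limited|gmbh|co)"
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def company_match_key(name: str) -> str:
    """Collapse name variants into one comparison key (not a display name)."""
    key = re.sub(r"[.,]", " ", name.lower()).strip()
    key = re.sub(rf"\b{LEGAL_SUFFIXES}\b\s*$", "", key)
    return " ".join(key.split())


def domain_from_email(email: str) -> Optional[str]:
    """Derive a company domain from a work email; None for personal mailboxes."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain or domain in PERSONAL_DOMAINS:
        return None
    return domain
```

Matching on the derived domain is usually more reliable than matching on the name key, since two unrelated companies can share a name but rarely share a domain.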

Automation Recipes

Recipe 1: Contact standardization on creation

Trigger: New contact created
Actions:

  1. Lowercase email
  2. Standardize country (mapping table)
  3. Standardize phone to E.164
  4. Set "needs enrichment" flag if title, industry, or company size is empty
  5. Run dedup check against existing contacts on email

Recipe 2: Stale deal cleanup

Trigger: Weekly scheduled automation
Conditions: Deal close date < today AND stage not closed
Actions:

  1. Notify deal owner via Slack/email: "Deal [name] has a close date in the past. Update the close date or close the deal"
  2. If no action in 7 days, auto-move to "Stale" stage
  3. If no action in 14 days, auto-close as "Closed Lost - Stale"
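
The escalation steps can be sketched as a pure decision function that the weekly job calls per deal. Two assumptions are baked in: both the 7-day and 14-day windows are counted from the first notification (`notified_on`), and an owner who acts does so by moving the close date forward or closing the deal. The stage names are hypothetical.

```python
from datetime import date
from typing import Optional

CLOSED_STAGES = {"Closed Won", "Closed Lost", "Closed Lost - Stale"}


def stale_deal_action(stage: str, close_date: date,
                      notified_on: Optional[date], today: date) -> str:
    """Decide the next step for one deal on a scheduled run."""
    if stage in CLOSED_STAGES or close_date >= today:
        return "none"                 # deal is closed or not yet overdue
    if notified_on is None:
        return "notify_owner"         # step 1: first time we see it overdue
    days_waiting = (today - notified_on).days
    if days_waiting >= 14:
        return "close_lost_stale"     # step 3
    if days_waiting >= 7 and stage != "Stale":
        return "move_to_stale"        # step 2
    return "wait"
```

Passing `today` in explicitly, rather than calling `date.today()` inside, keeps the rule easy to test and to replay against historical data.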

Recipe 3: Contact left company detection

Trigger: Monthly enrichment refresh
Conditions: Enrichment returns a different company for the same email
Actions:

  1. Flag contact as "Job Change Detected"
  2. Update company if new company is in ICP
  3. Notify account owner: "[Contact] appears to have moved to [New Company]"
  4. If old company has no other contacts, flag account for review
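
As a sketch, the recipe reduces to a function that turns one enrichment result into an ordered list of actions. The action strings, the `icp_companies` set, and the `contacts_at_old_company` count are hypothetical stand-ins for whatever your automation platform and ICP definition provide.

```python
def job_change_actions(contact: dict, enriched_company: str,
                       icp_companies: set, contacts_at_old_company: int) -> list:
    """Return the ordered actions for one contact after an enrichment refresh.

    contacts_at_old_company counts the other contacts still linked to the
    old account (this contact excluded).
    """
    old_company = contact["company"]
    if enriched_company.strip().lower() == old_company.strip().lower():
        return []  # no change detected
    actions = ["flag_job_change_detected"]
    if enriched_company in icp_companies:
        actions.append(f"update_company:{enriched_company}")
    actions.append(
        f"notify_owner:{contact['name']} appears to have moved to {enriched_company}"
    )
    if contacts_at_old_company == 0:
        actions.append(f"flag_account_for_review:{old_company}")
    return actions
```

Comparing company names as plain strings is the weakest part of this sketch. Comparing domains, where the enrichment tool returns them, avoids false alarms from name variations.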

Recipe 4: Duplicate prevention on import

Trigger: Bulk import initiated
Actions:

  1. Match import records against existing contacts on email
  2. For matches: update existing record with new data (don't create duplicate)
  3. For non-matches: create new contact with import batch tag
  4. Log: records matched, records created, records rejected (invalid email)
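
A minimal sketch of this recipe, with the existing contacts modeled as a dictionary keyed by lowercased email. The function name `stage_import` and the `import_batch` field are assumptions. Only the email format is checked here, whereas a real import would run a verification service first.

```python
import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def stage_import(rows: list, existing: dict, batch_id: str) -> dict:
    """Apply import rows to `existing` (a dict keyed by lowercased email).

    Returns the counts to log for this batch.
    """
    log = {"batch_id": batch_id, "matched": 0, "created": 0, "rejected": 0}
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not EMAIL_PATTERN.match(email):
            log["rejected"] += 1
            continue
        if email in existing:
            # Update with non-empty incoming values; never create a second record.
            existing[email].update({k: v for k, v in row.items() if v and k != "email"})
            log["matched"] += 1
        else:
            existing[email] = {**row, "email": email, "import_batch": batch_id}
            log["created"] += 1
    return log
```

Because new records are written into `existing` as they are created, a second row with the same email later in the file is treated as a match, so the import cannot duplicate against itself either.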

Measuring Data Quality

Hygiene scorecard

Track monthly. Report to sales and marketing leadership.

| Metric | Target | Red flag |
| --- | --- | --- |
| % contacts with valid email | > 95% | < 90% |
| % contacts with title | > 90% | < 80% |
| % contacts with company | > 95% | < 90% |
| % companies with industry | > 85% | < 70% |
| % companies with employee count | > 80% | < 65% |
| Duplicate rate (contacts) | < 3% | > 5% |
| Duplicate rate (companies) | < 5% | > 8% |
| Stale contacts (no activity 12+ months) | < 20% of database | > 35% |
| Zombie deals (past close date, not closed) | < 5% of open pipeline | > 10% |
| Bounce rate on outbound email | < 3% | > 5% |
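
The scorecard can be evaluated mechanically once the percentages are measured. The sketch below encodes a few rows of the table. The metric keys are hypothetical names, and the middle band between target and red flag is labeled "watch", a term the table itself does not use.

```python
# (metric, target, red_flag, higher_is_better); values are percentages.
SCORECARD = [
    ("contacts_with_valid_email", 95, 90, True),
    ("contacts_with_title", 90, 80, True),
    ("duplicate_rate_contacts", 3, 5, False),
    ("zombie_deals", 5, 10, False),
]


def score_metric(value: float, target: float, red_flag: float,
                 higher_is_better: bool) -> str:
    """Return 'on_target', 'watch', or 'red_flag' for one metric."""
    if not higher_is_better:
        # Flip signs so the same comparisons work for "lower is better" metrics.
        value, target, red_flag = -value, -target, -red_flag
    if value > target:
        return "on_target"
    if value < red_flag:
        return "red_flag"
    return "watch"


def scorecard_status(measured: dict) -> dict:
    """Score every metric that has a measured value."""
    return {
        name: score_metric(measured[name], target, red, higher)
        for name, target, red, higher in SCORECARD
        if name in measured
    }
```

Storing the monthly output of `scorecard_status` alongside the raw percentages gives leadership both the status and the trend.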

Quality trend tracking

Plot these monthly to see whether quality is improving or degrading:

  • Total records vs clean records (completeness trend)
  • New duplicates created per month (prevention effectiveness)
  • Records enriched per month (enrichment coverage)
  • Bounce rate trend (email accuracy over time)
  • Import rejection rate (import quality over time)

Anti-Pattern Check

  • No required fields on contact creation. If reps can create contacts with just a name and nothing else, they will. Enforce email, company, and lead source as minimums
  • Free-text fields for categorical data. Country, industry, lead source, and seniority should be dropdown menus. Free text creates infinite variations of the same value
  • Cleaning data once and calling it done. Data decays at 2-3% per month (people change jobs, companies rebrand, emails become invalid). Hygiene is a continuous process, not a project
  • Auto-merging fuzzy matches. Fuzzy name matching produces false positives. "John Smith at Acme" and "John Smith at Beta Corp" may be different people. Only auto-merge on exact email match
  • Deleting records instead of archiving. Deleted records lose activity history, attribution data, and relationship context. Archive to a "disqualified" or "inactive" lifecycle stage instead
  • No import process. Bulk importing a purchased list directly into production CRM without dedup, verification, or field mapping introduces thousands of quality issues in one action
  • No hygiene owner. If nobody is responsible for data quality, nobody maintains it. Assign one person (usually RevOps) as the hygiene owner with a weekly review cadence
  • Measuring hygiene by record count instead of quality. "We have 50,000 contacts" means nothing if 15,000 are duplicates and 10,000 have no email. Measure clean records, not total records