---
name: crm-data-hygiene
slug: crm-data-hygiene
description: This skill should be used when the user asks to "clean up our CRM", "fix CRM data quality", "audit CRM data", "run a CRM hygiene process", "deduplicate our CRM", "clean our HubSpot data", "clean our Salesforce data", "design a data hygiene process", "standardize CRM data", or any variation of cleaning, maintaining, and improving data quality in a B2B SaaS CRM system.
category: general
---

# CRM Data Hygiene

Dirty CRM data corrupts everything downstream: routing, scoring, attribution, forecasting, outbound, and reporting. Every workflow that touches CRM data inherits its quality problems. A 10% duplicate rate means 10% of your outbound is wasted. A 20% invalid-email rate means 20% of your sequences bounce. Data hygiene isn't a cleanup project. It's an ongoing operating discipline.

The principle: prevent bad data from entering the CRM, detect and fix bad data that exists, and maintain quality continuously. In that order. Prevention is 10x cheaper than remediation.

## The 5 Data Quality Dimensions

Every CRM hygiene program addresses these five dimensions.

| Dimension | Definition | Example of failure | Impact |
|-----------|-----------|-------------------|--------|
| Completeness | Required fields are populated | Contact has no email, no title, no company | Can't route, can't sequence, can't score |
| Accuracy | Field values are correct and current | Job title says "SDR" but the person is now VP Sales | Wrong messaging, wrong routing, wrong scoring |
| Consistency | Same data formatted the same way | "United States", "US", "USA", "U.S.", "united states" in country field | Broken filters, broken reports, broken routing |
| Uniqueness | No duplicates | Same person appears 3 times with slight name variations | Multiple reps emailing the same person, inflated pipeline |
| Timeliness | Data reflects current reality | Contact left the company 6 months ago, still in CRM as active | Wasted outreach, embarrassing emails to wrong people |

---

## Prevention: Stop Bad Data at the Gate

### Required fields on creation

Define minimum required fields for every object type. Block creation without them.

| Object | Required fields | Why |
|--------|----------------|-----|
| Contact | Email, first name, last name, company, lead source | Can't route or sequence without email. Can't report without source |
| Company | Name, domain, industry, employee count range | Can't score ICP fit without firmographics |
| Deal | Contact, company, amount, stage, close date, owner | Can't forecast without these |

**Prevention rules:**
- Enforce required fields in CRM settings, not in training. If a rep can create a contact without an email, they will. Make the system enforce it
- Don't require too many fields. 5-7 required fields per object is the limit. Beyond that, reps start entering garbage to bypass the form ("asdf" in title, "1" in phone)
- Use dropdown menus instead of free text for categorical fields (industry, country, lead source). Free text creates inconsistency. Dropdowns enforce consistency

### Standardization rules

Define how data should be formatted and enforce it with automation.

| Field | Standard | Enforcement |
|-------|----------|-------------|
| Country | ISO 3166-1 alpha-2 code (US, GB, DE) | Dropdown + automation to normalize on import |
| Phone | E.164 format (+15551234567: plus sign, country code, digits only) | Validation rule or automation |
| Company name | Common name without legal suffixes ("Acme", not "Acme, Inc." or "Acme Corp") | Manual cleanup + import rules |
| Email | Lowercase, validated format | Automation on creation |
| Job title | Standardized to company conventions (VP vs Vice President) | Mapping table + automation |
| Industry | Predefined picklist matching your ICP segments | Dropdown, no free text |
| Revenue range | Predefined bands ($1-5M, $5-20M, $20-100M, $100M+) | Dropdown |
| Employee count | Predefined bands (1-50, 51-200, 201-1000, 1000+) | Dropdown |
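
The email, country, and phone rows above can be sketched as small normalizers. This is a minimal sketch, not a full implementation: `COUNTRY_ALIASES` is a hypothetical mapping table you would extend from your own data, and the phone helper assumes US/Canada numbers only. A production setup would more likely use a complete ISO 3166-1 table and a dedicated phone-parsing library.

```python
import re

# Hypothetical alias table; extend with the variants found in your own data.
COUNTRY_ALIASES = {
    "united states": "US", "usa": "US", "us": "US", "u.s.": "US",
    "united kingdom": "GB", "uk": "GB", "gb": "GB",
    "germany": "DE", "de": "DE",
}

def normalize_country(raw):
    """Return an ISO 3166-1 alpha-2 code, or None if the value is unknown."""
    return COUNTRY_ALIASES.get(raw.strip().lower())

def normalize_email(raw):
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()

def normalize_phone_nanp(raw):
    """Convert a US/Canada number to E.164, or return None if it does not fit.

    Assumption: only North American numbers are handled here.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None
```

Returning `None` for unknown values, rather than guessing, lets the automation route those records to manual review instead of writing a wrong value.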

### Import hygiene

Most data quality problems enter through bulk imports: list purchases, event attendee uploads, enrichment dumps.

**Import rules:**
- Every import must go through a staging process. Never import directly into production CRM
- Deduplicate against existing records before import. Match on email (primary), domain + name (secondary)
- Validate emails before import. Run them through a verification tool (NeverBounce, ZeroBounce). Reject unverifiable emails
- Map import fields to CRM fields explicitly. Never auto-map. Column names like "Name" could be first name, last name, or full name
- Log every import: date, source, record count, who imported. This is the audit trail for tracing data quality issues back to their source
- Set a "data source" or "import batch" field on every imported record. When an import turns out to be garbage, you can find and fix all affected records

---

## Detection: Find Bad Data That Exists

### Automated hygiene reports

Run these reports weekly or set them as automated dashboards.

| Report | What it finds | Query logic | Priority |
|--------|-------------|-------------|----------|
| Contacts missing email | Can't sequence or email | Email IS NULL | P0 |
| Contacts with invalid email format | Will bounce | Email NOT LIKE '%@%.%' | P0 |
| Contacts missing title | Can't score or route by persona | Job Title IS NULL | P1 |
| Contacts missing company | Can't route by account | Company IS NULL | P1 |
| Contacts with no activity in 12+ months | Likely stale or left company | Last Activity Date < 12 months ago | P1 |
| Companies missing industry | Can't segment or report by vertical | Industry IS NULL | P1 |
| Companies missing employee count | Can't score ICP fit | Employee Count IS NULL | P2 |
| Deals with close date in the past and stage not closed | Zombie deals | Close Date < TODAY AND Stage NOT IN (Closed Won, Closed Lost) | P0 |
| Deals with no activity in 30+ days | Stale pipeline | Last Activity Date < 30 days ago AND Stage NOT IN (Closed Won, Closed Lost) | P1 |
| Duplicate contacts (same email) | Multiple records for same person | GROUP BY email HAVING COUNT(*) > 1 | P0 |
| Duplicate companies (same domain) | Multiple records for same account | GROUP BY domain HAVING COUNT(*) > 1 | P0 |
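
If the CRM data is exported to a SQL-queryable store, a few of the report rows above translate fairly directly. The sketch below uses an in-memory SQLite database with made-up sample rows; the table and column names (`contacts`, `deals`, `close_date`, `stage`) are assumptions for illustration, not any CRM's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE deals (id INTEGER PRIMARY KEY, stage TEXT, close_date TEXT);
    INSERT INTO contacts (email) VALUES
        ('a@acme.com'), ('a@acme.com'), ('b@beta.io'), (NULL), ('not-an-email');
    INSERT INTO deals (stage, close_date) VALUES
        ('Negotiation', '2000-01-01'), ('Closed Won', '2000-01-01'),
        ('Discovery', '2999-01-01');
""")

# Contacts missing email
missing_email = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE email IS NULL"
).fetchone()[0]

# Contacts with invalid email format (rough check; NULLs are not counted here)
invalid_format = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE email NOT LIKE '%@%.%'"
).fetchone()[0]

# Duplicate contacts (same email)
duplicate_emails = conn.execute(
    "SELECT email, COUNT(*) FROM contacts "
    "WHERE email IS NOT NULL GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Zombie deals: close date in the past and stage not closed
zombie_deals = conn.execute(
    "SELECT id FROM deals WHERE close_date < date('now') "
    "AND stage NOT IN ('Closed Won', 'Closed Lost')"
).fetchall()
```

Dates are stored as ISO `YYYY-MM-DD` strings here so that plain string comparison against `date('now')` orders them correctly.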

### Duplicate detection

Duplicates are the most damaging data quality issue. They cause:
- Multiple reps emailing the same prospect (embarrassing and unprofessional)
- Inflated pipeline (same deal counted twice)
- Broken attribution (touchpoints split across records)
- Wrong scoring (engagement signals diluted across duplicates)

**Duplicate matching rules:**

| Match type | Fields to compare | Confidence |
|-----------|------------------|------------|
| Exact email match | email = email | Definite duplicate. Auto-merge |
| Domain + name match | company domain + (first name + last name) | High confidence. Review before merge |
| Phone match | phone number (normalized) | High confidence. Review before merge |
| Fuzzy name + company | Similar name + same company | Medium confidence. Manual review required |
| Same name, different company | first name + last name match, company differs | Not automatically a duplicate. Could be a different person, or the same person after a job change. Verify before updating or merging |
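
The matching rules above can be expressed as a single classifier over a pair of records. This is a sketch under stated assumptions: the dict keys (`email`, `first_name`, `last_name`, `domain`, `phone`) are hypothetical, values are assumed to be already normalized, and "similar name" is approximated with the standard library's `difflib.SequenceMatcher` at an arbitrary 0.85 threshold that you would tune on your own data.

```python
from difflib import SequenceMatcher

def classify_match(a, b, fuzzy_threshold=0.85):
    """Classify a pair of contact dicts by duplicate confidence."""
    full_a = f"{a['first_name']} {a['last_name']}"
    full_b = f"{b['first_name']} {b['last_name']}"
    same_company = bool(a["domain"]) and a["domain"] == b["domain"]

    # Exact email match: the only case that is safe to auto-merge.
    if a["email"] and a["email"] == b["email"]:
        return "auto_merge"
    # Domain + exact name match: high confidence, still reviewed.
    if same_company and full_a == full_b:
        return "review_high"
    # Normalized phone match: high confidence, still reviewed.
    if a["phone"] and a["phone"] == b["phone"]:
        return "review_high"
    # Similar name at the same company: medium confidence, manual review.
    if same_company and SequenceMatcher(None, full_a, full_b).ratio() >= fuzzy_threshold:
        return "review_medium"
    # Includes same name at a different company.
    return "not_duplicate"
```

Only `auto_merge` should trigger an automatic action; the two `review_*` outcomes are meant to feed a human review queue.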

**Merge rules:**
- Keep the record with the most activity history. Losing activity data is worse than losing a field value
- Keep the most recent field values when merging. If Record A has title "Manager" from 2024 and Record B has "Director" from 2026, keep "Director"
- Never auto-merge fuzzy matches. Auto-merge on exact email only. Everything else needs human review
- Log every merge: which records merged, which survived, who approved. Keep the ability to undo a merge that turns out to be wrong
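
A minimal sketch of the merge rules above. The record shape is hypothetical: each record carries an `activity_count` and a `fields` dict mapping field name to a `(value, updated_at)` pair, with `updated_at` as an ISO date string. Real CRMs expose merge through their own tools or APIs, so treat this as an illustration of the logic only.

```python
def merge_contacts(a, b):
    """Merge two duplicate contacts; return (merged_record, audit_entry)."""
    # Rule 1: the record with the most activity history survives.
    survivor, loser = (a, b) if a["activity_count"] >= b["activity_count"] else (b, a)

    # Rule 2: per field, keep the most recently updated non-empty value.
    merged_fields = dict(survivor["fields"])
    for name, (value, updated_at) in loser["fields"].items():
        current = merged_fields.get(name)
        if value is not None and (
            current is None or current[0] is None or updated_at > current[1]
        ):
            merged_fields[name] = (value, updated_at)

    # Rule 4: log the merge and keep a snapshot so it can be undone.
    audit = {
        "survivor_id": survivor["id"],
        "merged_id": loser["id"],
        "loser_snapshot": loser,
    }
    return {**survivor, "fields": merged_fields}, audit
```

With Record A holding title "Manager" from 2024 and Record B holding "Director" from 2026, the merged record keeps "Director" regardless of which record survives.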

---

## Remediation: Fix Bad Data

### The hygiene sprint

A one-time cleanup to bring existing data to baseline quality. Run this before implementing ongoing processes.

| Phase | Duration | Focus | Actions |
|-------|----------|-------|---------|
| 1. Audit | Week 1 | Assess current state | Run all detection reports. Quantify the problem per dimension |
| 2. Deduplicate | Week 2-3 | Remove duplicates | Merge exact-email duplicates. Review and merge high-confidence matches (domain + name, phone) |
| 3. Enrich | Week 3-4 | Fill missing fields | Run contacts through enrichment (Apollo, Clearbit). Fill title, company, industry, size |
| 4. Validate | Week 4-5 | Verify accuracy | Validate emails. Verify phone numbers. Check for contacts who left their company |
| 5. Standardize | Week 5-6 | Normalize formats | Standardize country, industry, title formats. Apply picklist values to free-text fields |

**Sprint rules:**
- Don't try to fix everything at once. Prioritize by impact: duplicates first (they break routing and outbound), then missing emails (they break sequencing), then missing titles (they break scoring)
- Set a quality baseline before the sprint. "14% of contacts have no email, 8% are duplicates, 22% have no title." Measure again after the sprint to prove progress
- Don't delete records during the sprint unless they're clearly invalid (test records, spam entries, competitors). Archive, don't delete. Deleted data can't be recovered

### Ongoing hygiene cadence

After the sprint, maintain quality continuously.

| Activity | Frequency | Who | What |
|----------|-----------|-----|------|
| Hygiene dashboard review | Weekly | RevOps | Check automated reports for new issues |
| Duplicate scan | Weekly | RevOps (automated) | Flag new duplicates for review |
| Email verification | Monthly | RevOps | Re-verify emails for contacts in active sequences |
| Stale contact detection | Monthly | RevOps | Flag contacts with no activity in 6+ months |
| Job change detection | Monthly | RevOps (automated) | Check LinkedIn or enrichment tools for role changes |
| Data enrichment refresh | Quarterly | RevOps | Re-enrich all contacts to fill gaps and update fields |
| Full audit | Quarterly | RevOps | Comprehensive hygiene report across all dimensions |
| Import audit | Per import | RevOps | Review every bulk import before it hits production |

---

## Field-Level Hygiene Rules

### Email hygiene

| Issue | Detection | Fix |
|-------|-----------|-----|
| Invalid format | Regex validation | Fix or remove |
| Catch-all domain | Email verification tool flags | Keep but mark as unverified. Don't include in deliverability-sensitive campaigns |
| Role-based email (info@, sales@, support@) | Pattern match | Keep for company record. Don't use for personal outreach |
| Personal email (gmail, yahoo) for B2B contact | Domain check | Enrich to find work email. Keep personal as secondary |
| Bounced email | Bounce tracking from sequencing tool | Mark as invalid. Attempt re-enrichment |
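
The pattern-based rows above (invalid format, role-based, personal domain) can be checked locally. Catch-all domains and bounces cannot be detected this way and still need a verification tool or bounce data. In this sketch the regex is a deliberately loose format check, and `ROLE_PREFIXES` and `PERSONAL_DOMAINS` are short illustrative lists, not exhaustive ones.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ROLE_PREFIXES = {"info", "sales", "support", "admin", "contact"}  # illustrative
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # illustrative

def classify_email(email):
    """Return 'invalid_format', 'role_based', 'personal', or 'ok'."""
    email = email.strip().lower()
    if not EMAIL_RE.match(email):
        return "invalid_format"
    local, domain = email.rsplit("@", 1)
    if local in ROLE_PREFIXES:
        return "role_based"
    if domain in PERSONAL_DOMAINS:
        return "personal"
    return "ok"
```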

### Title / role hygiene

| Issue | Detection | Fix |
|-------|-----------|-----|
| Missing title | IS NULL check | Enrich from LinkedIn or enrichment tool |
| Outdated title | Job change detection or enrichment refresh | Update to current title |
| Non-standard format | Pattern match ("VP" vs "Vice President", "Sr." vs "Senior") | Standardize with mapping table |
| Title doesn't indicate seniority | Can't score or route | Add a "seniority level" field: IC, Manager, Director, VP, C-level |
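
One way to populate the "seniority level" field is keyword matching on the title, checked from most to least senior. The keyword patterns below are assumptions and will misclassify some titles, so a sample of results should be spot-checked before the field is used for scoring or routing.

```python
import re

# Ordered from most to least senior; the first matching pattern wins.
SENIORITY_PATTERNS = [
    ("C-level", re.compile(r"\b(chief|ceo|cfo|cto|coo|cmo|cro|founder)\b")),
    ("VP", re.compile(r"\b(vp|svp|evp|vice president)\b")),
    ("Director", re.compile(r"\b(director|head of)\b")),
    ("Manager", re.compile(r"\bmanager\b")),
]

def seniority_level(title):
    """Map a free-text job title to IC, Manager, Director, VP, or C-level."""
    if not title or not title.strip():
        return None  # missing title: send to enrichment instead of guessing
    lowered = title.lower()
    for level, pattern in SENIORITY_PATTERNS:
        if pattern.search(lowered):
            return level
    return "IC"
```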

### Company hygiene

| Issue | Detection | Fix |
|-------|-----------|-----|
| Missing company | IS NULL check | Enrich from email domain or LinkedIn |
| Company name variations | "Acme", "Acme Inc", "Acme, Inc.", "ACME" | Standardize to official name. Match on domain |
| Missing domain | IS NULL check | Derive from email address or enrich |
| Acquired company | Name no longer exists | Update to acquiring company name. Note acquisition |
| Missing firmographics | Industry, size, revenue IS NULL | Enrich from Clearbit, Apollo, or LinkedIn |
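
Two of the fixes above lend themselves to small helpers: grouping company name variations under one key, and deriving a missing domain from a contact's email. The suffix list and the free-mail list here are illustrative assumptions. Domain remains the more reliable match key, so the name key is best used only to surface candidates for review.

```python
import re

LEGAL_SUFFIX_RE = re.compile(
    r"[,\s]+(inc|corp|corporation|llc|ltd|gmbh|co)\.?$", re.IGNORECASE
)
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # illustrative

def company_name_key(name):
    """Lowercased name without a trailing legal suffix, for grouping variants."""
    return LEGAL_SUFFIX_RE.sub("", name.strip()).lower()

def domain_from_email(email):
    """Return the company domain implied by a work email, else None."""
    email = email.strip().lower()
    if "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1]
    if not domain or domain in FREE_MAIL_DOMAINS:
        return None  # a personal address says nothing about the employer
    return domain
```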

---

## Automation Recipes

### Recipe 1: Contact standardization on creation

**Trigger:** New contact created
**Actions:**
1. Lowercase email
2. Standardize country (mapping table)
3. Standardize phone to E.164
4. Set "needs enrichment" flag if title, industry, or company size is empty
5. Run dedup check against existing contacts on email

### Recipe 2: Stale deal cleanup

**Trigger:** Weekly scheduled automation
**Conditions:** Deal close date < today AND stage not closed
**Actions:**
1. Notify deal owner via Slack/email: "Deal [name] has a close date in the past. Update the close date or close the deal"
2. If no action in 7 days, auto-move to "Stale" stage
3. If no action in 14 days, auto-close as "Closed Lost - Stale"
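
The escalation in Recipe 2 reduces to a small decision function that the weekly job can call per deal. This sketch assumes both the 7-day and 14-day windows are counted from the first owner notification, which the recipe does not state explicitly, and that stage names match the ones used here.

```python
from datetime import date

CLOSED_STAGES = {"Closed Won", "Closed Lost", "Closed Lost - Stale"}

def stale_deal_action(close_date, stage, first_notified, today):
    """Return the next action for a deal under the Recipe 2 escalation.

    first_notified is the date the owner was first notified, or None.
    """
    if stage in CLOSED_STAGES or close_date >= today:
        return "none"
    if first_notified is None:
        return "notify_owner"
    days_waiting = (today - first_notified).days
    if days_waiting >= 14:
        return "close_lost_stale"
    if days_waiting >= 7:
        return "move_to_stale"
    return "wait"
```

If the owner updates the close date or closes the deal, the first condition returns `"none"` and the escalation stops on its own.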

### Recipe 3: Contact left company detection

**Trigger:** Monthly enrichment refresh
**Conditions:** Enrichment returns different company for same email
**Actions:**
1. Flag contact as "Job Change Detected"
2. Update company if new company is in ICP
3. Notify account owner: "[Contact] appears to have moved to [New Company]"
4. If old company has no other contacts, flag account for review

### Recipe 4: Duplicate prevention on import

**Trigger:** Bulk import initiated
**Actions:**
1. Match import records against existing contacts on email
2. For matches: update existing record with new data (don't create duplicate)
3. For non-matches: create new contact with import batch tag
4. Log: records matched, records created, records rejected (invalid email)
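
Recipe 4 can be sketched as a planning step that runs in staging before anything touches production. The row shape and the `import_batch` field name are hypothetical, and the regex is only a format check standing in for a real email verification tool.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def plan_import(rows, existing_emails, batch_tag):
    """Split import rows into updates, creates, and rejects, plus a log entry.

    rows: list of dicts with an "email" key.
    existing_emails: set of lowercased emails already in the CRM.
    """
    updates, creates, rejects = [], [], []
    seen = set(existing_emails)
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not EMAIL_RE.match(email):
            rejects.append(row)
        elif email in seen:
            # Matches an existing record, or an earlier row in this same file.
            updates.append({**row, "email": email})
        else:
            creates.append({**row, "email": email, "import_batch": batch_tag})
            seen.add(email)
    log = {
        "batch": batch_tag,
        "matched": len(updates),
        "created": len(creates),
        "rejected": len(rejects),
    }
    return updates, creates, rejects, log
```

Adding each new email to `seen` also catches duplicates inside the import file itself, which purchased lists often contain.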

---

## Measuring Data Quality

### Hygiene scorecard

Track monthly. Report to sales and marketing leadership.

| Metric | Target | Red flag |
|--------|--------|----------|
| % contacts with valid email | > 95% | < 90% |
| % contacts with title | > 90% | < 80% |
| % contacts with company | > 95% | < 90% |
| % companies with industry | > 85% | < 70% |
| % companies with employee count | > 80% | < 65% |
| Duplicate rate (contacts) | < 3% | > 5% |
| Duplicate rate (companies) | < 5% | > 8% |
| Stale contacts (no activity 12+ months) | < 20% of database | > 35% |
| Zombie deals (past close date, not closed) | < 5% of open pipeline | > 10% |
| Bounce rate on outbound email | < 3% | > 5% |
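
The scorecard thresholds can be encoded once and applied to each month's numbers. The sketch below covers four of the rows above. The metric keys and the three status labels are assumptions, with "watch" used for values that sit between the target and the red flag.

```python
# metric: (target, red_flag, higher_is_better), thresholds in percent
SCORECARD = {
    "valid_email_pct": (95, 90, True),
    "title_pct": (90, 80, True),
    "contact_duplicate_pct": (3, 5, False),
    "bounce_pct": (3, 5, False),
}

def rate(metric, value):
    """Return 'on_target', 'watch', or 'red_flag' for one metric value."""
    target, red_flag, higher_is_better = SCORECARD[metric]
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better metrics.
        value, target, red_flag = -value, -target, -red_flag
    if value > target:
        return "on_target"
    if value < red_flag:
        return "red_flag"
    return "watch"
```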

### Quality trend tracking

Plot these monthly to see whether quality is improving or degrading:

- Total records vs clean records (completeness trend)
- New duplicates created per month (prevention effectiveness)
- Records enriched per month (enrichment coverage)
- Bounce rate trend (email accuracy over time)
- Import rejection rate (import quality over time)

---

## Anti-Pattern Check

- No required fields on contact creation. If reps can create contacts with just a name and nothing else, they will. Enforce email, company, and lead source as minimums
- Free-text fields for categorical data. Country, industry, lead source, and seniority should be dropdown menus. Free text creates infinite variations of the same value
- Cleaning data once and calling it done. Data decays at 2-3% per month (people change jobs, companies rebrand, emails become invalid). Hygiene is a continuous process, not a project
- Auto-merging fuzzy matches. Fuzzy name matching produces false positives. "John Smith at Acme" and "John Smith at Beta Corp" may be different people. Only auto-merge on exact email match
- Deleting records instead of archiving. Deleted records lose activity history, attribution data, and relationship context. Archive to a "disqualified" or "inactive" lifecycle stage instead
- No import process. Bulk importing a purchased list directly into production CRM without dedup, verification, or field mapping introduces thousands of quality issues in one action
- No hygiene owner. If nobody is responsible for data quality, nobody maintains it. Assign one person (usually RevOps) as the hygiene owner with a weekly review cadence
- Measuring hygiene by record count instead of quality. "We have 50,000 contacts" means nothing if 15,000 are duplicates and 10,000 have no email. Measure clean records, not total records