---
name: crm-dedup-strategy
slug: crm-dedup-strategy
description: This skill should be used when the user asks to "deduplicate CRM records", "remove duplicate contacts", "CRM dedup strategy", "merge duplicate records", "clean up duplicate accounts", "deduplication best practices", "find and merge CRM duplicates", "prevent duplicate records in CRM", "CRM data dedup process", or any variation of deduplicating contacts, companies, or deals in a CRM for B2B SaaS.
category: general
---

# CRM Dedup Strategy

Deduplication is the process of finding and merging duplicate records in your CRM. Duplicates inflate reporting, split engagement history across records, cause double-outreach, and make pipeline data unreliable. If 10% of your records are duplicates, up to roughly one in ten of the people in your CRM has history split across records and gets a fragmented experience.

The principle: deduplication is a three-part problem. Find duplicates (detection), merge them correctly (resolution), and stop new ones from entering (prevention). Most teams only do detection and ignore prevention. That's cleaning the floor while the faucet is running.

## The Dedup Framework

### Three phases

| Phase | What it does | When to run |
|-------|-------------|-------------|
| Detection | Find existing duplicate records | Monthly + on-demand |
| Resolution | Merge or purge duplicates correctly | After detection |
| Prevention | Stop new duplicates from entering the CRM | Always-on automation |

---

## Phase 1: Detection

### Matching strategies

| Match type | How it works | Accuracy | Speed |
|-----------|-------------|----------|-------|
| Exact email match | Two records with identical email addresses | Very high | Fast |
| Domain + name | Same email domain + similar first/last name | High | Medium |
| Company name fuzzy | "Acme Inc" matches "Acme, Inc." matches "ACME" | Medium | Slow |
| Phone number | Exact phone match (normalized) | High | Fast |
| Name + company | Same name at same company | Medium | Medium |

### Detection rules

- **Start with exact email match.** This catches the most common duplicates with zero false positives. Two contacts with john@acme.com are definitely duplicates
- **Normalize before matching.** Lowercase emails, strip phone formatting, standardize company names. "john@ACME.com" and "john@acme.com" are the same email
- **Fuzzy company matching requires human review.** "Acme Inc" and "Acme Corporation" are probably the same company. "Acme Software" and "Acme Insurance" are not. Fuzzy matches need human confirmation
- **Match across objects.** A Lead with john@acme.com and a Contact with john@acme.com are duplicates that span the lead-contact boundary. Check both objects
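
The normalization and exact-email rules above can be sketched in a few lines of Python. This is a minimal sketch, not any CRM's built-in matching logic. It assumes Leads and Contacts have already been exported into plain dicts with hypothetical `id`, `object`, `email`, and `phone` keys, and that stored phone numbers include a country code.

```python
import re
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase, so "john@ACME.com" == "john@acme.com"."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only. Assumes numbers carry a country code:
    "415-555-0100" and "+1 415 555 0100" will NOT match without one."""
    return re.sub(r"\D", "", phone)


def exact_email_groups(records: list[dict]) -> dict[str, list[dict]]:
    """Group Leads and Contacts together by normalized email.

    Returns only groups with two or more records (the duplicate sets),
    including sets that span the lead-contact boundary.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        email = rec.get("email") or ""
        if email.strip():  # records without an email can't match here
            groups[normalize_email(email)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    records = [
        {"id": "L1", "object": "Lead", "email": "john@ACME.com"},
        {"id": "C7", "object": "Contact", "email": " john@acme.com"},
        {"id": "C8", "object": "Contact", "email": "jane@acme.com"},
    ]
    for email, dupes in exact_email_groups(records).items():
        print(email, [(r["object"], r["id"]) for r in dupes])
```

Because Leads and Contacts go into the same grouping pass, a Lead and a Contact sharing an address surface as one duplicate set rather than being checked object by object.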

### Detection tools

| Tool | CRM | How it works |
|------|-----|-------------|
| HubSpot built-in dedupe | HubSpot | Suggests duplicates based on email, name, phone |
| Salesforce Duplicate Rules | Salesforce | Configurable matching rules with alerts or blocks |
| Dedupely | Both | Third-party tool for bulk dedup |
| Insycle | HubSpot | Data management platform with dedup features |
| Custom script | Both | API-based matching for complex logic |

---

## Phase 2: Resolution

### Merge vs delete

| Action | When to use | Risk |
|--------|------------|------|
| Merge | Always preferred. Combines data from both records into one | Must choose the right surviving record |
| Delete | Only when one record has zero value (test data, spam) | Data loss. Recovery is limited to the CRM's recycle bin or restore window |

### Merge rules

- **Merge by default; delete only zero-value records.** Deleting a duplicate loses its activity history, form submissions, and deal associations. Merging preserves everything on the surviving record. Reserve deletion for test data and spam
- **The surviving record has the most data.** The record with more activities, more populated fields, and more recent engagement should be the master. The other record's data fills gaps
- **Review conflicts before merging.** If record A has title "VP Sales" and record B has title "Director of Sales," decide which is correct before merging. Don't blindly overwrite
- **Merge in bulk carefully.** Bulk merging 500 records without review will merge some incorrectly. Review every fuzzy match individually. Auto-merge only exact email matches and, after a spot-check, exact domain + full-name matches
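
The "surviving record has the most data" rule can be expressed as a sort key. This is a hedged sketch: the field names `activity_count` and `last_engaged` are hypothetical, and ranking activities first, populated fields second, and recency third is an assumed tie-break order, not a CRM default.

```python
from datetime import datetime


def populated_count(record: dict, fields: list[str]) -> int:
    """Number of the given fields that hold a non-empty value."""
    return sum(1 for f in fields if record.get(f) not in (None, ""))


def pick_survivor(records: list[dict], fields: list[str]) -> dict:
    """Choose the master record: most activities first, then most populated
    fields, then most recent engagement (assumed tie-break order)."""
    return max(
        records,
        key=lambda r: (
            r.get("activity_count", 0),
            populated_count(r, fields),
            r.get("last_engaged") or datetime.min,  # never-engaged sorts last
        ),
    )


if __name__ == "__main__":
    a = {"id": "A", "activity_count": 12, "title": "VP Sales", "phone": "",
         "last_engaged": datetime(2024, 3, 1)}
    b = {"id": "B", "activity_count": 12, "title": "Director of Sales",
         "phone": "4155550100", "last_engaged": datetime(2023, 1, 1)}
    print("survivor:", pick_survivor([a, b], fields=["title", "phone"])["id"])
```

Here B survives because it has more populated fields, even though A was engaged more recently and may hold the newer title. That is exactly the kind of conflict to review before merging rather than letting the survivor's values win blindly.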

### Merge priority matrix

| Field | Keep from | Why |
|-------|----------|-----|
| Email | Most recently verified | Verified email is more likely to be current |
| Title | Most recently updated | Titles change. The latest is most accurate |
| Phone | Most recently updated | Phone numbers change less often but still degrade |
| Company | Record with deal history | The company association linked to pipeline matters more |
| Lead Source | First created record | Original source attribution should be preserved |
| Activity history | Both (merged automatically) | All activities consolidate on the surviving record |
| Deal associations | Both (merged automatically) | All deals consolidate on the surviving record |
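
The matrix can be applied field by field once a survivor is chosen. A sketch under stated assumptions: the per-field timestamps (`email_verified_at`, `title_updated_at`, `phone_updated_at`) and the `deal_count` and `created_at` keys are hypothetical; in practice such values might come from HubSpot property history or Salesforce field history tracking. Activities and deal associations are left to the CRM's own merge, as the matrix says.

```python
from datetime import datetime


def newest_value(records: list[dict], field: str, ts_field: str):
    """Non-empty `field` from the record whose `ts_field` is latest.
    Falls back to the first non-empty value if no record has a timestamp."""
    dated = [r for r in records if r.get(field) and r.get(ts_field)]
    if dated:
        return max(dated, key=lambda r: r[ts_field])[field]
    return next((r[field] for r in records if r.get(field)), None)


def merge_fields(records: list[dict]) -> dict:
    """Apply the merge priority matrix to the scalar fields only."""
    with_deals = [r for r in records if r.get("deal_count", 0) > 0]
    first_created = min(records, key=lambda r: r["created_at"])
    return {
        "email": newest_value(records, "email", "email_verified_at"),
        "title": newest_value(records, "title", "title_updated_at"),
        "phone": newest_value(records, "phone", "phone_updated_at"),
        # Prefer the company on a record with deal history, else any company.
        "company": next(
            (r["company"] for r in with_deals + records if r.get("company")), None
        ),
        # Original attribution comes from the first-created record.
        "lead_source": first_created.get("lead_source"),
    }


if __name__ == "__main__":
    old = {"email": "j@acme.com", "email_verified_at": datetime(2024, 1, 1),
           "title": "Director of Sales", "title_updated_at": datetime(2022, 5, 1),
           "company": "Acme", "deal_count": 0, "lead_source": "Webinar",
           "created_at": datetime(2021, 1, 1)}
    new = {"email": "john@acme.com", "email_verified_at": datetime(2024, 6, 1),
           "title": "VP Sales", "title_updated_at": datetime(2024, 2, 1),
           "company": "Acme Inc", "deal_count": 2, "lead_source": "Import",
           "created_at": datetime(2023, 1, 1)}
    print(merge_fields([old, new]))
```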

---

## Phase 3: Prevention

### Preventing new duplicates

| Prevention method | How it works | CRM support |
|------------------|-------------|-------------|
| Duplicate blocking rules | Block creation of a record that matches an existing one | Salesforce (native), HubSpot (limited) |
| Form dedup | When a form is submitted, match against existing records instead of creating new | HubSpot (native on email match), Salesforce (Web-to-Lead matching) |
| Import dedup | Deduplicate import files against existing records | Both (must be configured) |
| Integration dedup | Ensure API integrations check for existing records before creating new | Custom development |
| Lead-to-contact matching | Convert leads that match existing contacts instead of creating duplicates | Salesforce (lead conversion), HubSpot (auto-association) |
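
Import dedup amounts to sorting every row before anything is loaded. A minimal sketch, assuming the import file has an `email` column and that `existing_emails` is a hypothetical set of already-normalized emails exported from the CRM; it is not a feature of any specific import tool.

```python
import csv
import io


def partition_import(rows, existing_emails: set[str]) -> dict[str, list[dict]]:
    """Sort import rows into buckets before loading.

    updates       -> email already in the CRM: update, don't create
    creates       -> email not seen anywhere yet
    in_file_dupes -> repeats an email from an earlier row in the same file
    no_email      -> can't be matched on email; route to manual review
    """
    result = {"updates": [], "creates": [], "in_file_dupes": [], "no_email": []}
    seen: set[str] = set()
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email:
            result["no_email"].append(row)
        elif email in seen:
            result["in_file_dupes"].append(row)
        elif email in existing_emails:
            result["updates"].append(row)
        else:
            result["creates"].append(row)
        if email:
            seen.add(email)
    return result


if __name__ == "__main__":
    csv_text = ("email,first_name\njohn@ACME.com,John\n"
                "jane@acme.com,Jane\njane@acme.com,Jane\n,Anon\n")
    buckets = partition_import(csv.DictReader(io.StringIO(csv_text)), {"john@acme.com"})
    print({name: len(rows) for name, rows in buckets.items()})
```

Only the `creates` bucket should become new records; `updates` go through the CRM's update path, and the other two buckets need a human decision.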

### Prevention rules

- **Match on email before creating.** Every new record entry point (form, import, API, manual) should check if the email already exists. If it does, update the existing record. Don't create a new one
- **Blocking rules for manual creation.** In Salesforce, set up Duplicate Rules that warn or block when a user creates a record matching an existing one. In HubSpot, enable duplicate detection in settings
- **Import dedup is mandatory.** Never import a CSV without deduplicating against the existing database first. Every import is a potential source of 5-15% new duplicates
- **API integrations must upsert, not insert.** When your enrichment tool or sequencing tool syncs data, it should update existing records (upsert), not create new ones (insert). Configure the integration correctly
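
Upsert-by-email, the behavior every integration should have, looks roughly like this. The `InMemoryCRM` class is a hypothetical stand-in for a real CRM API, used only so the sketch runs on its own; check your CRM's API documentation for whether it offers a native upsert and what key it matches on.

```python
import itertools
from typing import Optional


class InMemoryCRM:
    """Hypothetical stand-in for a CRM contacts API, for illustration only."""

    def __init__(self) -> None:
        self.records: dict[int, dict] = {}
        self._next_id = itertools.count(1)

    def find_by_email(self, email: str) -> Optional[int]:
        """Return the id of the record with this normalized email, or None."""
        for record_id, fields in self.records.items():
            if fields.get("email") == email:
                return record_id
        return None

    def create(self, fields: dict) -> int:
        record_id = next(self._next_id)
        self.records[record_id] = dict(fields)
        return record_id

    def update(self, record_id: int, fields: dict) -> None:
        self.records[record_id].update(fields)


def upsert_contact(crm: InMemoryCRM, payload: dict) -> tuple[int, str]:
    """Match on normalized email first; update if found, create only if not.
    Empty values are dropped so a sync can't blank out existing data."""
    email = payload["email"].strip().lower()
    fields = {k: v for k, v in payload.items() if v not in (None, "")}
    fields["email"] = email
    existing_id = crm.find_by_email(email)
    if existing_id is not None:
        crm.update(existing_id, fields)
        return existing_id, "updated"
    return crm.create(fields), "created"


if __name__ == "__main__":
    crm = InMemoryCRM()
    print(upsert_contact(crm, {"email": "John@Acme.com", "title": "VP Sales"}))
    print(upsert_contact(crm, {"email": "john@acme.com", "phone": "4155550100"}))
    print(len(crm.records), "record(s)")
```

Dropping empty values is a design choice: an enrichment sync that lacks a title should leave the existing title alone rather than overwrite it with a blank.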

---

## Dedup by Object

### Contact dedup

```
Priority 1: Exact email match
  → Auto-merge (safe, no false positives)

Priority 2: Same email domain + same first name + same last name
  → High confidence. Auto-merge after spot-check

Priority 3: Same email domain + similar name (Levenshtein ≤ 2)
  → Medium confidence. Human review required

Priority 4: Same company + same name (no email match)
  → Lower confidence. May be different people. Human review
```
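
The four contact tiers can be turned into a pair classifier. This is a sketch, not a vendor algorithm: the `email`, `first_name`, `last_name`, and `company` keys are hypothetical, Levenshtein distance is applied to the full lowercase name, and the small `FREE_MAIL` list is an added assumption so that two unrelated people on gmail.com aren't treated as sharing a company domain.

```python
from typing import Optional

# Free-mail domains say nothing about the employer, so domain tiers skip them.
# Illustrative list only, not exhaustive.
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def norm_email(r: dict) -> str:
    return (r.get("email") or "").strip().lower()


def full_name(r: dict) -> str:
    parts = [r.get("first_name") or "", r.get("last_name") or ""]
    return " ".join(" ".join(parts).lower().split())


def classify_pair(a: dict, b: dict) -> tuple[Optional[int], str]:
    """Return (priority tier, action) for a candidate contact pair."""
    ea, eb = norm_email(a), norm_email(b)
    if ea and ea == eb:
        return 1, "auto-merge"
    na, nb = full_name(a), full_name(b)
    da, db = ea.rpartition("@")[2], eb.rpartition("@")[2]
    same_domain = bool(ea and eb and da == db and da not in FREE_MAIL)
    if same_domain and na and na == nb:
        return 2, "auto-merge after spot-check"
    if same_domain and na and nb and levenshtein(na, nb) <= 2:
        return 3, "human review"
    ca = (a.get("company") or "").strip().lower()
    cb = (b.get("company") or "").strip().lower()
    if ca and ca == cb and na and na == nb:
        return 4, "human review"
    return None, "no match"
```

For example, "john@acme.com / John Smith" against "jsmith@acme.com / Jon Smith" lands in tier 3 (edit distance 1), so it goes to a human rather than being merged automatically.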

### Company/Account dedup

```
Priority 1: Exact domain match
  → Two accounts both on acme.com (after normalizing case, protocol, and "www.") are definitely the same company

Priority 2: Domain variations
  → acme.com and acme.io may be the same company. Verify

Priority 3: Fuzzy company name
  → "Acme Inc" and "Acme Corporation": probably the same. Verify
  → "Acme Software" and "Acme Insurance": different companies

Priority 4: Subsidiaries
  → "Acme EMEA" and "Acme Corp": related, but may need
     separate records. Business decision, not a dedup decision
```
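
The first three account tiers can be sketched with the standard library's `urllib.parse.urlparse` and `difflib.SequenceMatcher`. The legal-suffix list, the 0.85 similarity threshold, and comparing only the first domain label for "domain variations" are all illustrative assumptions to tune against your own data; `SequenceMatcher` is a stand-in for whatever fuzzy matcher your tool uses. Subsidiaries are deliberately not detected, since that tier is a business decision.

```python
import re
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlparse

# Legal suffixes stripped before comparing names (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "llc", "ltd", "limited", "gmbh"}


def normalize_domain(value: str) -> str:
    """'https://www.Acme.com/pricing' -> 'acme.com'."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat it as a network location
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def normalize_company(name: str) -> str:
    """'Acme, Inc.' and 'Acme Corporation' both become 'acme'."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def classify_accounts(a: dict, b: dict,
                      threshold: float = 0.85) -> tuple[Optional[int], str]:
    da = normalize_domain(a.get("domain") or "")
    db = normalize_domain(b.get("domain") or "")
    if da and da == db:
        return 1, "same company"
    # Crude variation check: acme.com vs acme.io share the first label.
    if da and db and da.split(".")[0] == db.split(".")[0]:
        return 2, "verify: domain variation"
    na = normalize_company(a.get("name") or "")
    nb = normalize_company(b.get("name") or "")
    if na and nb and SequenceMatcher(None, na, nb).ratio() >= threshold:
        return 3, "verify: fuzzy name"
    return None, "no match"
```

The first-label check is intentionally crude: it will miss pairs such as "eu.acme.com" and "acme.com", so treat it as a candidate generator for review, not a decision.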

---

## Ongoing Cadence

| Task | Frequency | Method | Owner |
|------|-----------|--------|-------|
| Exact email dedup scan | Monthly | Automated (CRM tool or script) | RevOps |
| Fuzzy match review | Monthly | Tool generates candidates, human reviews | RevOps |
| Import dedup check | Every import | Pre-import matching | Whoever imports |
| Integration audit | Quarterly | Verify integrations are upserting, not inserting | RevOps |
| Blocking rule review | Quarterly | Confirm blocking rules are active and catching duplicates | RevOps |
| Dedup metric tracking | Monthly | Report on duplicate rate and trend | RevOps |

---

## Measurement

| Metric | Definition | Target | Frequency |
|--------|-----------|--------|-----------|
| Duplicate rate | % of records that are redundant copies of another record | < 3% | Monthly |
| New duplicates created | Count of duplicates created since last scan | Decreasing | Monthly |
| Duplicates merged | Count of duplicates resolved | Matches or exceeds new duplicates | Monthly |
| Import dedup rate | % of import rows that matched existing records | Track trend | Per import |
| Blocking rule triggers | Count of duplicate creation attempts blocked | Track (high = prevention working) | Monthly |
| Time to resolve | Average days between duplicate detection and merge | < 7 days | Monthly |
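
Two of these metrics need a precise formula to be comparable month to month. A sketch, assuming duplicate rate counts only redundant copies (a group of n matching records contributes n - 1, since one record survives), which is one reasonable reading of the definition rather than a standard.

```python
from datetime import date


def duplicate_rate(total_records: int, group_sizes: list[int]) -> float:
    """Share of records that are redundant copies.
    A duplicate group of size n contributes n - 1 (one survivor per group)."""
    if total_records == 0:
        return 0.0
    redundant = sum(size - 1 for size in group_sizes if size > 1)
    return redundant / total_records


def avg_days_to_resolve(pairs: list[tuple[date, date]]) -> float:
    """Mean days between detection and merge across resolved duplicates."""
    if not pairs:
        return 0.0
    return sum((merged - detected).days for detected, merged in pairs) / len(pairs)


if __name__ == "__main__":
    # 20 pairs and 5 triples among 1,000 records: 20 + 10 = 30 redundant.
    rate = duplicate_rate(1_000, [2] * 20 + [3] * 5)
    print(f"duplicate rate: {rate:.1%} (target < 3%)")
    days = avg_days_to_resolve([(date(2024, 5, 1), date(2024, 5, 4)),
                                (date(2024, 5, 2), date(2024, 5, 9))])
    print(f"time to resolve: {days:.1f} days (target < 7)")
```

In this example the database sits exactly at 3.0%, so it does not yet meet the under-3% target.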

---

## Pre-Dedup Checklist

- [ ] Detection method selected (exact email, domain+name, fuzzy company)
- [ ] Detection tool configured (built-in CRM tool or third-party)
- [ ] Merge rules defined (which record survives, field priority)
- [ ] Exact email duplicates identified and bulk-merged
- [ ] Fuzzy matches reviewed individually by a human
- [ ] Prevention rules configured (blocking rules, form matching, import dedup)
- [ ] API integrations audited for upsert vs insert behavior
- [ ] Monthly dedup scan scheduled with assigned owner
- [ ] Duplicate rate baseline measured
- [ ] Team trained on how duplicates enter and how to avoid them

---

## Anti-Pattern Check

- Deleting duplicates instead of merging. You delete 1,000 duplicate contacts. Their 3,000 activities, 200 form submissions, and 50 deal associations are gone forever. Always merge. The surviving record inherits everything
- Deduping once and calling it done. You clean 5,000 duplicates. Six months later, 2,000 new duplicates exist from form submissions, imports, and integrations. Dedup is monthly, not one-time
- Auto-merging fuzzy matches without review. "John Smith at Acme" and "Jon Smith at Acme" are probably the same person. "John Smith at Acme Software" and "John Smith at Acme Insurance" are not. Fuzzy matches need human review
- No import dedup. Marketing imports a list of 2,000 contacts from an event. 400 already exist in the CRM. Without dedup, you now have 400 duplicate contacts. Always dedup imports before loading
- Integrations creating duplicates. Your enrichment tool creates a new contact instead of updating the existing one. 500 new duplicates per month from one integration. Audit every integration for upsert behavior
- Only deduplicating contacts, not accounts. You merge duplicate contacts but have 200 duplicate accounts. Deals are split across "Acme" and "Acme Inc." Pipeline reporting is fragmented. Dedup accounts too
- No prevention, only cleanup. You clean duplicates monthly but never address why they're created. Forms, imports, and integrations keep producing them. Fix the entry points, not just the symptoms