crm-dedup-strategy

This skill should be used when the user asks to "deduplicate CRM records", "remove duplicate contacts", "CRM dedup strategy", "merge duplicate records", "clean up duplicate accounts", "deduplication best practices", "find and merge CRM duplicates", "prevent duplicate records in CRM", "CRM data dedup process", or any variation of deduplicating contacts, companies, or deals in a CRM for B2B SaaS.

CRM Dedup Strategy

Deduplication is the process of finding and merging duplicate records in your CRM. Duplicates inflate reporting, split engagement history across records, cause double-outreach, and make pipeline data unreliable. If 10% of your records are duplicates, a comparable share of your contacts have their history fragmented across records and are getting a fragmented experience.

The principle: deduplication is a three-part problem. Find duplicates (detection), merge them correctly (resolution), and stop new ones from entering (prevention). Most teams only do detection and ignore prevention. That's cleaning the floor while the faucet is running.

The Dedup Framework

Three phases

| Phase | What it does | When to run |
| --- | --- | --- |
| Detection | Find existing duplicate records | Monthly + on-demand |
| Resolution | Merge or purge duplicates correctly | After detection |
| Prevention | Stop new duplicates from entering the CRM | Always-on automation |

Phase 1: Detection

Matching strategies

| Match type | How it works | Accuracy | Speed |
| --- | --- | --- | --- |
| Exact email match | Two records with identical email addresses | Very high | Fast |
| Domain + name | Same email domain + similar first/last name | High | Medium |
| Company name fuzzy | "Acme Inc" matches "Acme, Inc." matches "ACME" | Medium | Slow |
| Phone number | Exact phone match (normalized) | High | Fast |
| Name + company | Same name at same company | Medium | Medium |

Detection rules

  • Start with exact email match. This catches the most common duplicates with almost no false positives. Two contacts with john@acme.com are almost certainly the same person. Shared inboxes such as info@acme.com are the main exception
  • Normalize before matching. Lowercase emails, strip phone formatting, standardize company names. "john@ACME.com" and "john@acme.com" are the same email
  • Fuzzy company matching requires human review. "Acme Inc" and "Acme Corporation" are probably the same company. "Acme Software" and "Acme Insurance" are not. Fuzzy matches need human confirmation
  • Match across objects. A Lead with john@acme.com and a Contact with john@acme.com are duplicates that span the lead-contact boundary. Check both objects
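
The detection rules above can be sketched in a few lines. This is a minimal illustration, not a CRM integration: the record dictionaries and their `id`, `object`, and `email` keys are hypothetical stand-ins for whatever your CRM export contains.

```python
import re
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so case variants compare equal."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only, so '(555) 010-1234' and '555.010.1234' match."""
    return re.sub(r"\D", "", phone)


def find_exact_email_duplicates(records):
    """Group records by normalized email and return groups with 2+ records.

    `records` mixes object types (Lead, Contact) so that duplicates
    spanning the lead-contact boundary land in the same group.
    """
    groups = defaultdict(list)
    for record in records:
        email = record.get("email")
        if email:
            groups[normalize_email(email)].append(record)
    return {email: recs for email, recs in groups.items() if len(recs) > 1}


# Hypothetical export: one Lead and two Contacts.
records = [
    {"id": "L-1", "object": "Lead", "email": "john@ACME.com"},
    {"id": "C-7", "object": "Contact", "email": " john@acme.com"},
    {"id": "C-8", "object": "Contact", "email": "jane@acme.com"},
]
duplicates = find_exact_email_duplicates(records)
```

Here `duplicates` holds a single group keyed by `john@acme.com` that contains both the Lead and the Contact, which is the cross-object case described above.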

Detection tools

| Tool | CRM | How it works |
| --- | --- | --- |
| HubSpot built-in dedupe | HubSpot | Suggests duplicates based on email, name, phone |
| Salesforce Duplicate Rules | Salesforce | Configurable matching rules with alerts or blocks |
| Dedupely | Both | Third-party tool for bulk dedup |
| Insycle | HubSpot | Data management platform with dedup features |
| Custom script | Both | API-based matching for complex logic |

Phase 2: Resolution

Merge vs delete

| Action | When to use | Risk |
| --- | --- | --- |
| Merge | Always preferred. Combines data from both records into one | Must choose the right surviving record |
| Delete | Only when one record has zero value (test data, spam) | Permanent data loss. No undo |

Merge rules

  • Merge by default; delete only records with zero value (test data, spam). Deleting a real duplicate loses its activity history, form submissions, and deal associations. Merging preserves everything on the surviving record
  • The surviving record should be the one with the most data. The record with more activities, more populated fields, and more recent engagement should be the master. The other record's data fills gaps
  • Review conflicts before merging. If record A has title "VP Sales" and record B has title "Director of Sales," decide which is correct before merging. Don't blindly overwrite
  • Merge in bulk carefully. Bulk merging 500 records without review will merge some incorrectly. Review all fuzzy matches individually. Only auto-merge exact email matches
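
One way to apply the survivor and conflict rules above is to rank the candidates and list disagreeing fields before anything is merged. The sketch below is an assumption-heavy illustration: the keys `activity_count` and `last_engaged` are hypothetical, and ranking by activities first, then populated fields, then recency is one possible ordering that the rules above do not prescribe.

```python
from datetime import date


def completeness(record, fields):
    """Count how many of the given fields are populated."""
    return sum(1 for f in fields if record.get(f))


def pick_survivor(records, fields):
    """Return (survivor, others).

    Assumed ranking: activity count, then populated fields,
    then most recent engagement.
    """
    ranked = sorted(
        records,
        key=lambda r: (
            r.get("activity_count", 0),
            completeness(r, fields),
            r.get("last_engaged", date.min),
        ),
        reverse=True,
    )
    return ranked[0], ranked[1:]


def find_conflicts(survivor, other, fields):
    """Fields populated on both records with different values.

    These need a human decision before merging.
    """
    return {
        f: (survivor[f], other[f])
        for f in fields
        if survivor.get(f) and other.get(f) and survivor[f] != other[f]
    }


record_a = {"id": "A", "activity_count": 12, "title": "VP Sales",
            "phone": "", "last_engaged": date(2024, 3, 1)}
record_b = {"id": "B", "activity_count": 3, "title": "Director of Sales",
            "phone": "5550101234", "last_engaged": date(2024, 5, 1)}

fields = ["title", "phone"]
survivor, others = pick_survivor([record_a, record_b], fields)
conflicts = find_conflicts(survivor, others[0], fields)
```

Record A survives because it has more activities. The title shows up in `conflicts` for review, while the phone number does not, because A has none and B's value simply fills the gap.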

Merge priority matrix

| Field | Keep from | Why |
| --- | --- | --- |
| Email | Most recently verified | Verified email is more likely to be current |
| Title | Most recently updated | Titles change. The latest is most accurate |
| Phone | Most recently updated | Phone numbers change less often but still degrade |
| Company | Record with deal history | The company association linked to pipeline matters more |
| Lead Source | First created record | Original source attribution should be preserved |
| Activity history | Both (merged automatically) | All activities consolidate on the surviving record |
| Deal associations | Both (merged automatically) | All deals consolidate on the surviving record |
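
The field-level rows of the matrix can be expressed as a small merge function. This is a sketch under stated assumptions: the keys `email_verified_at`, `updated_at`, `created_at`, and `deal_count` are hypothetical, and a record-level `updated_at` stands in for per-field update timestamps, which a real CRM may or may not expose. Activities and deals are left out because the CRM consolidates them automatically on merge.

```python
from datetime import date


def merge_fields(a, b):
    """Pick each field from the record the priority matrix prefers."""
    pair = [a, b]
    newest = max(pair, key=lambda r: r["updated_at"])
    oldest = min(pair, key=lambda r: r["created_at"])
    # A record with no verification date never wins on email.
    verified = max(pair, key=lambda r: r.get("email_verified_at") or date.min)
    with_deals = max(pair, key=lambda r: r.get("deal_count", 0))

    def pick(preferred, field):
        """Use the preferred record's value; fall back to the other if blank."""
        fallback = b if preferred is a else a
        return preferred.get(field) or fallback.get(field)

    return {
        "email": pick(verified, "email"),
        "title": pick(newest, "title"),
        "phone": pick(newest, "phone"),
        "company": pick(with_deals, "company"),
        "lead_source": pick(oldest, "lead_source"),
    }


older = {
    "email": "john@acme.com", "email_verified_at": date(2024, 1, 10),
    "title": "Director of Sales", "phone": "5550101234",
    "company": "Acme Inc", "deal_count": 2, "lead_source": "Webinar",
    "created_at": date(2022, 6, 1), "updated_at": date(2023, 9, 1),
}
newer = {
    "email": "j.smith@acme.com", "email_verified_at": None,
    "title": "VP Sales", "phone": "",
    "company": "Acme", "deal_count": 0, "lead_source": "List import",
    "created_at": date(2023, 2, 1), "updated_at": date(2024, 4, 1),
}
merged = merge_fields(older, newer)
```

In this example the merged record takes the verified email, the company, and the lead source from the older record, the title from the newer one, and the older record's phone number because the newer record's phone is blank.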

Phase 3: Prevention

Preventing new duplicates

| Prevention method | How it works | CRM support |
| --- | --- | --- |
| Duplicate blocking rules | Block creation of a record that matches an existing one | Salesforce (native), HubSpot (limited) |
| Form dedup | When a form is submitted, match against existing records instead of creating new | HubSpot (native on email match), Salesforce (Web-to-Lead matching) |
| Import dedup | Deduplicate import files against existing records | Both (must be configured) |
| Integration dedup | Ensure API integrations check for existing records before creating new | Custom development |
| Lead-to-contact matching | Convert leads that match existing contacts instead of creating duplicates | Salesforce (lead conversion), HubSpot (auto-association) |

Prevention rules

  • Match on email before creating. Every new record entry point (form, import, API, manual) should check if the email already exists. If it does, update the existing record. Don't create a new one
  • Blocking rules for manual creation. In Salesforce, set up Duplicate Rules that warn or block when a user creates a record matching an existing one. In HubSpot, enable duplicate detection in settings
  • Import dedup is mandatory. Never import a CSV without deduplicating against the existing database first. Every import is a potential source of 5-15% new duplicates
  • API integrations must upsert, not insert. When your enrichment tool or sequencing tool syncs data, it should update existing records (upsert), not create new ones (insert). Configure the integration correctly
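
The difference between upsert and insert is easiest to see in code. The sketch below uses an in-memory dictionary as a stand-in for the CRM; `upsert_contact` and the `store` structure are hypothetical, and a real integration would use the upsert mechanism your CRM's API provides. Filling only blank fields on update is an assumption chosen so that an integration never silently overwrites existing data.

```python
def upsert_contact(store, incoming):
    """Update the existing record for this email, or create one.

    `store` maps normalized email -> record. Returns "created" or "updated".
    """
    key = incoming["email"].strip().lower()
    existing = store.get(key)
    if existing is None:
        store[key] = {**incoming, "email": key}
        return "created"
    # Assumed policy: fill gaps only, never overwrite populated fields.
    for field, value in incoming.items():
        if field != "email" and value and not existing.get(field):
            existing[field] = value
    return "updated"


store = {}
first = upsert_contact(store, {"email": "John@Acme.com", "title": "VP Sales"})
second = upsert_contact(
    store,
    {"email": "john@acme.com", "title": "Director", "phone": "5550101234"},
)
```

The second call matches the existing record despite the different casing, so the store still holds one contact. Its title is unchanged and the missing phone number has been filled in.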

Dedup by Object

Contact dedup

Priority 1: Exact email match
  → Auto-merge (safe, no false positives)

Priority 2: Same email domain + same first name + same last name
  → High confidence. Auto-merge after spot-check

Priority 3: Same email domain + similar name (Levenshtein ≤ 2)
  → Medium confidence. Human review required

Priority 4: Same company + same name (no email match)
  → Lower confidence. May be different people. Human review
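
The four contact priorities above can be sketched as a pair classifier. Treat this as an illustration under assumptions: the record keys are hypothetical, the edit distance is applied to the full lowercased name (the priorities above only say "similar name"), and free email domains such as gmail.com are not special-cased even though a production matcher would need to exclude them from domain matching.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def _clean(value):
    return (value or "").strip().lower()


def classify_pair(a, b):
    """Return the match priority (1-4) for two contacts, or None."""
    email_a, email_b = _clean(a.get("email")), _clean(b.get("email"))
    if email_a and email_a == email_b:
        return 1  # exact email match

    name_a = _clean(f"{a.get('first_name', '')} {a.get('last_name', '')}")
    name_b = _clean(f"{b.get('first_name', '')} {b.get('last_name', '')}")
    same_domain = bool(
        email_a and email_b
        and email_a.rpartition("@")[2] == email_b.rpartition("@")[2]
    )
    if same_domain and name_a == name_b:
        return 2  # same domain, same name
    if same_domain and levenshtein(name_a, name_b) <= 2:
        return 3  # same domain, similar name

    company_a, company_b = _clean(a.get("company")), _clean(b.get("company"))
    if name_a and name_a == name_b and company_a and company_a == company_b:
        return 4  # same company and name, no email match
    return None


ACTIONS = {
    1: "auto-merge",
    2: "auto-merge after spot-check",
    3: "human review",
    4: "human review",
}

john = {"first_name": "John", "last_name": "Smith",
        "email": "john@acme.com", "company": "Acme"}
jon = {"first_name": "Jon", "last_name": "Smith",
       "email": "jon.smith@acme.com", "company": "Acme"}
priority = classify_pair(john, jon)
```

"John Smith" and "Jon Smith" at the same domain differ by one edit, so the pair lands in priority 3 and `ACTIONS[priority]` routes it to human review rather than an automatic merge.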

Company/Account dedup

Priority 1: Exact domain match
  → Two accounts that both list acme.com as their domain are the same company

Priority 2: Domain variations
  → acme.com and acme.io may be the same company. Verify

Priority 3: Fuzzy company name
  → "Acme Inc" and "Acme Corporation." Probably the same. Verify
  → "Acme Software" and "Acme Insurance." Different companies

Priority 4: Subsidiaries
  → "Acme EMEA" and "Acme Corp." Related but may need
     separate records. Business decision, not a dedup decision
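
The first three account priorities can be approximated with normalization alone. The sketch below makes several assumptions: the `LEGAL_SUFFIXES` list is illustrative and incomplete, the record keys are hypothetical, and the domain-variation check simply compares everything before the last dot, which is too crude for domains like acme.co.uk. Priority 4 is deliberately not coded, since subsidiaries are a business decision.

```python
import re

# Illustrative, incomplete list of suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "llc", "ltd", "limited", "co", "company", "gmbh"}


def normalize_domain(value: str) -> str:
    """Reduce a URL or domain to a bare lowercase host without 'www.'."""
    host = re.sub(r"^[a-z]+://", "", value.strip().lower())
    host = host.split("/")[0]
    return host[4:] if host.startswith("www.") else host


def normalize_company_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def company_match_priority(a, b):
    """Return 1 (exact domain), 2 (domain variation), 3 (name), or None."""
    dom_a = normalize_domain(a.get("domain", ""))
    dom_b = normalize_domain(b.get("domain", ""))
    if dom_a and dom_a == dom_b:
        return 1
    if dom_a and dom_b and dom_a.rsplit(".", 1)[0] == dom_b.rsplit(".", 1)[0]:
        return 2  # e.g. acme.com vs acme.io: verify before merging
    name_a = normalize_company_name(a.get("name", ""))
    name_b = normalize_company_name(b.get("name", ""))
    if name_a and name_a == name_b:
        return 3  # candidate only: still needs human verification
    return None


software = {"name": "Acme Software", "domain": "acmesoftware.com"}
insurance = {"name": "Acme Insurance", "domain": "acmeinsurance.com"}
result = company_match_priority(software, insurance)
```

"Acme Software" and "Acme Insurance" produce no match, while "Acme Inc" and "Acme Corporation" both normalize to "acme" and surface as a priority 3 candidate for verification.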

Ongoing Cadence

| Task | Frequency | Method | Owner |
| --- | --- | --- | --- |
| Exact email dedup scan | Monthly | Automated (CRM tool or script) | RevOps |
| Fuzzy match review | Monthly | Tool generates candidates, human reviews | RevOps |
| Import dedup check | Every import | Pre-import matching | Whoever imports |
| Integration audit | Quarterly | Verify integrations are upserting, not inserting | RevOps |
| Blocking rule review | Quarterly | Confirm blocking rules are active and catching duplicates | RevOps |
| Dedup metric tracking | Monthly | Report on duplicate rate and trend | RevOps |

Measurement

| Metric | Definition | Target | Frequency |
| --- | --- | --- | --- |
| Duplicate rate | % of records that are duplicates | < 3% | Monthly |
| New duplicates created | Count of duplicates created since last scan | Decreasing | Monthly |
| Duplicates merged | Count of duplicates resolved | Matches or exceeds new duplicates | Monthly |
| Import dedup rate | % of import rows that matched existing records | Track trend | Per import |
| Blocking rule triggers | Count of duplicate creation attempts blocked | Track (high = prevention working) | Monthly |
| Time to resolve | Average days between duplicate detection and merge | < 7 days | Monthly |
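
Two of these metrics are simple to compute from email lists. The sketch below assumes one particular reading of "duplicate rate": the share of records that are surplus copies, counting every record beyond the first in each exact-email group. Fuzzy duplicates are not counted, so treat the result as a lower bound. Both function names are hypothetical.

```python
from collections import Counter


def duplicate_rate(emails):
    """Percent of records that are surplus copies of another record."""
    normalized = [e.strip().lower() for e in emails if e and e.strip()]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    surplus = sum(n - 1 for n in counts.values())
    return 100.0 * surplus / len(normalized)


def import_dedup_rate(import_emails, existing_emails):
    """Percent of import rows whose email already exists in the CRM."""
    existing = {e.strip().lower() for e in existing_emails}
    rows = [e.strip().lower() for e in import_emails]
    if not rows:
        return 0.0
    matched = sum(1 for e in rows if e in existing)
    return 100.0 * matched / len(rows)


crm_emails = ["a@x.com", "A@x.com", "b@x.com", "c@x.com"]
import_file = ["a@x.com", "d@x.com", "e@x.com", "f@x.com", "B@x.com"]

rate = duplicate_rate(crm_emails)
import_rate = import_dedup_rate(import_file, crm_emails)
```

With four CRM records and one surplus copy, `rate` is 25.0, well above the 3% target. Two of the five import rows already exist, so `import_rate` is 40.0.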

Pre-Dedup Checklist

  • [ ] Detection method selected (exact email, domain+name, fuzzy company)
  • [ ] Detection tool configured (built-in CRM tool or third-party)
  • [ ] Merge rules defined (which record survives, field priority)
  • [ ] Exact email duplicates identified and bulk-merged
  • [ ] Fuzzy matches reviewed individually by a human
  • [ ] Prevention rules configured (blocking rules, form matching, import dedup)
  • [ ] API integrations audited for upsert vs insert behavior
  • [ ] Monthly dedup scan scheduled with assigned owner
  • [ ] Duplicate rate baseline measured
  • [ ] Team trained on how duplicates enter and how to avoid them

Anti-Pattern Check

  • Deleting duplicates instead of merging. You delete 1,000 duplicate contacts. Their 3,000 activities, 200 form submissions, and 50 deal associations are gone forever. Always merge. The surviving record inherits everything
  • Deduplicating once and calling it done. You clean 5,000 duplicates. Six months later, 2,000 new duplicates exist from form submissions, imports, and integrations. Dedup is monthly, not one-time
  • Auto-merging fuzzy matches without review. "John Smith at Acme" and "Jon Smith at Acme" are probably the same person. "John Smith at Acme Software" and "John Smith at Acme Insurance" are not. Fuzzy matches need human review
  • No import dedup. Marketing imports a list of 2,000 contacts from an event. 400 already exist in the CRM. Without dedup, you now have 400 duplicate contacts. Always dedup imports before loading
  • Integrations creating duplicates. Your enrichment tool creates a new contact instead of updating the existing one. 500 new duplicates per month from one integration. Audit every integration for upsert behavior
  • Only deduplicating contacts, not accounts. You merge duplicate contacts but have 200 duplicate accounts. Deals are split across "Acme" and "Acme Inc." Pipeline reporting is fragmented. Dedup accounts too
  • No prevention, only cleanup. You clean duplicates monthly but never address why they're created. Forms, imports, and integrations keep producing them. Fix the entry points, not just the symptoms