hubspot-data-hygiene

This skill should be used when the user asks to "clean up HubSpot data", "fix HubSpot data quality", "deduplicate HubSpot contacts", "HubSpot data hygiene", "clean HubSpot CRM", "improve HubSpot data quality", "audit HubSpot data", "fix bad data in HubSpot", "HubSpot data cleanup process", or any variation of cleaning, maintaining, and improving data quality in HubSpot CRM for B2B SaaS.

HubSpot Data Hygiene

Data hygiene is the ongoing process of keeping your HubSpot database clean, accurate, and useful. Duplicate contacts, stale deals, missing fields, and incorrect properties degrade every system that depends on CRM data: scoring, routing, reporting, outreach, and forecasting. Clean data is the foundation of every GTM operation.

The principle: data hygiene is not a one-time project; it is a recurring process. Data degrades at roughly 2-3% per month through job changes, company mergers, email bounces, and human error. Without ongoing maintenance, a database that is clean today will be roughly 20-30% degraded within a year.
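The yearly figure follows from compounding the monthly rate. A minimal sketch, assuming a constant monthly decay rate applied only to the records that are still good (a simplification; real decay is lumpier, and `degraded_share` is a hypothetical helper name):

```python
def degraded_share(monthly_rate: float, months: int) -> float:
    """Fraction of records expected to be bad after `months`,
    assuming a fixed share of the still-good records decays each month."""
    return 1 - (1 - monthly_rate) ** months


for rate in (0.02, 0.03):
    yearly = degraded_share(rate, 12)
    print(f"{rate:.0%} per month -> {yearly:.1%} degraded after 12 months")
```

Compounding 2% and 3% per month gives about 21.5% and 30.6% after 12 months. Simply adding the monthly rates (24% and 36%) overstates the decay slightly because it counts already-degraded records again.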

The Data Hygiene Framework

Four dimensions of data quality

| Dimension | What it means | How to measure | Example issue |
| --- | --- | --- | --- |
| Accuracy | Data is correct and current | Bounce rate, manual spot-check | Email address is wrong, title is outdated |
| Completeness | Required fields are populated | Fill rate per property | Company size is blank on 40% of contacts |
| Consistency | Data follows standards | Dropdown compliance rate | "SaaS", "SAAS", "Software as a Service" for the same value |
| Uniqueness | No duplicate records | Duplicate detection scan | Same person exists 3 times with slight name variations |

The Hygiene Audit

Step 1: Assess current state

| Check | How to run | What it tells you |
| --- | --- | --- |
| Duplicate contacts | HubSpot built-in dedupe tool, or export and match on email | How many records are inflated by dupes |
| Duplicate companies | Match on domain name | Company-level deduplication needs |
| Bounce rate | Export contacts, check email verification status | % of emails that will bounce (dead addresses) |
| Property fill rates | Report on % populated for each key property | Which fields are underused or empty |
| Stale contacts | Filter by "Last activity date" more than 12 months ago | Dead records clogging the database |
| Stale deals | Filter open deals by "Last activity" more than 30 days ago | Pipeline inflation from zombie deals |
| Invalid data | Filter dropdowns for "Other" or blank | Where structured data is being bypassed |
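The fill-rate check can also be run on an export. A minimal sketch, assuming contacts have been exported to a list of dictionaries; the property names (`email`, `jobtitle`, `company_size`) and the sample rows are illustrative, not taken from a real portal:

```python
def fill_rates(rows: list[dict], properties: list[str]) -> dict[str, float]:
    """Percentage of rows with a non-empty value for each property."""
    total = len(rows)
    rates = {}
    for prop in properties:
        filled = 0
        for row in rows:
            value = row.get(prop)
            # Treat None, empty strings, and whitespace-only strings as blank.
            if value is not None and str(value).strip() != "":
                filled += 1
        rates[prop] = 100 * filled / total if total else 0.0
    return rates


rows = [
    {"email": "ana@acme.com", "jobtitle": "VP Sales", "company_size": ""},
    {"email": "bo@beta.io", "jobtitle": "", "company_size": None},
    {"email": "cy@gamma.dev", "jobtitle": "CTO", "company_size": "51-200"},
    {"email": "di@delta.co", "jobtitle": None, "company_size": " "},
]
rates = fill_rates(rows, ["email", "jobtitle", "company_size"])
for prop, rate in rates.items():
    print(f"{prop}: {rate:.0f}% populated")
```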

Step 2: Quantify the problem

Database health scorecard:
  Total contacts: 50,000
  Duplicates: 4,200 (8.4%)
  Invalid emails (bounced/unverifiable): 6,500 (13%)
  Missing company association: 8,000 (16%)
  Missing job title: 12,000 (24%)
  No activity in 12+ months: 18,000 (36%)
  
  Health score: 100 - (8.4 + 13 + 16 + 24 + 36)/5 = 80.5%
  Target: > 90%
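The scorecard arithmetic can be reproduced in a few lines. A minimal sketch, assuming the same unweighted average of issue rates used above; `health_score` is a hypothetical helper and the counts are the example figures from the scorecard:

```python
def health_score(total_contacts: int, issue_counts: dict[str, int]) -> float:
    """100 minus the unweighted mean of the issue rates (in percent)."""
    if total_contacts <= 0:
        raise ValueError("total_contacts must be positive")
    if not issue_counts:
        return 100.0
    rates = [100 * count / total_contacts for count in issue_counts.values()]
    return 100 - sum(rates) / len(rates)


issues = {
    "duplicates": 4_200,
    "invalid_emails": 6_500,
    "missing_company": 8_000,
    "missing_job_title": 12_000,
    "inactive_12_months": 18_000,
}
score = health_score(50_000, issues)
print(f"Health score: {score:.1f}% (target: > 90%)")
```

Because every issue carries the same weight, a 36% inactivity rate and a 36% duplicate rate would move the score equally. Weighting the issues by business impact is a reasonable variation if some matter more to your team.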

Deduplication

Finding duplicates

| Method | Matches on | Accuracy | HubSpot support |
| --- | --- | --- | --- |
| Email match | Exact email address | High | Built-in dedupe tool |
| Domain + name | Same company domain + similar name | Medium-high | Manual or third-party |
| Phone number | Exact phone match | Medium | Manual filter |
| Company name fuzzy | Similar company names (Acme Inc vs Acme) | Medium | Third-party tools |

Deduplication rules

  • Merge, don't delete. When you find duplicates, merge them. Deleting loses activity history, form submissions, and deal associations. Merging preserves everything on the surviving record
  • Keep the record with the most data. When merging, the record with more populated properties, more activities, and more recent engagement should be the surviving record
  • Automate ongoing dedupe. Run HubSpot's dedupe tool monthly. Set up a workflow to flag new contacts that match an existing email or company+name pattern
  • Fix the source. Duplicates enter through forms without email matching, imports without dedup checks, and integrations without proper matching. Fix the entry point, not just the symptom
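The "email match" method and the "keep the record with the most data" rule can be prototyped on an export before anything is merged in HubSpot. A minimal sketch, assuming exported contacts as dictionaries with hypothetical field names (`id`, `email`, `last_activity` as an ISO date string); it only plans merges and does not call any HubSpot API:

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Case-insensitive, whitespace-insensitive key for exact email matching."""
    return email.strip().lower()


def populated_count(record: dict) -> int:
    """Number of properties that actually hold a value."""
    return sum(1 for value in record.values() if value not in (None, ""))


def plan_merges(contacts: list[dict]) -> list[tuple[dict, list[dict]]]:
    """Group contacts by normalized email and pick a surviving record per group.

    The survivor is the record with the most populated properties,
    with the most recent activity as the tie-breaker.
    """
    groups = defaultdict(list)
    for contact in contacts:
        if contact.get("email"):
            groups[normalize_email(contact["email"])].append(contact)

    plans = []
    for group in groups.values():
        if len(group) > 1:
            ranked = sorted(
                group,
                key=lambda c: (populated_count(c), c.get("last_activity") or ""),
                reverse=True,
            )
            plans.append((ranked[0], ranked[1:]))
    return plans


contacts = [
    {"id": 1, "email": "Ana@Acme.com", "jobtitle": "", "last_activity": "2024-01-10"},
    {"id": 2, "email": "ana@acme.com ", "jobtitle": "VP Sales", "last_activity": "2024-03-02"},
    {"id": 3, "email": "bo@beta.io", "jobtitle": "CTO", "last_activity": "2023-11-20"},
]
plans = plan_merges(contacts)
for survivor, duplicates in plans:
    print(f"Keep {survivor['id']}, merge in {[d['id'] for d in duplicates]}")
```

Review the plan before acting on it. The lower-accuracy methods in the table (domain plus name, fuzzy company name) need a human check that this sketch does not attempt.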

Email Hygiene

Email validation process

1. Export all contacts with email addresses
2. Run through an email verification tool
   (NeverBounce, ZeroBounce, or equivalent)
3. Results:
   - Valid: keep as-is
   - Invalid: mark as "Email Invalid" in HubSpot
   - Risky: re-verify in 30 days
   - Catch-all: keep but monitor bounce rate
4. Suppress invalid emails from all sending
5. Re-verify the full database every 6 months
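Step 3 is a simple mapping from verification result to action. A minimal sketch, assuming the tool's results have been reduced to the four labels above (real tools use their own status names, so the labels, the action strings, and the 182-day approximation of "6 months" are all assumptions):

```python
from datetime import date, timedelta

RISKY_RECHECK = timedelta(days=30)   # "re-verify in 30 days"
FULL_RECHECK = timedelta(days=182)   # roughly 6 months


def triage(status: str, checked_on: date) -> dict:
    """Map a verification result to an action, a suppression flag, and a recheck date."""
    key = status.strip().lower().replace("-", "_").replace(" ", "_")
    if key == "valid":
        return {"action": "keep", "suppress": False,
                "reverify_on": checked_on + FULL_RECHECK}
    if key == "invalid":
        return {"action": "mark_email_invalid", "suppress": True,
                "reverify_on": None}
    if key == "risky":
        return {"action": "reverify", "suppress": False,
                "reverify_on": checked_on + RISKY_RECHECK}
    if key == "catch_all":
        return {"action": "keep_and_monitor_bounces", "suppress": False,
                "reverify_on": checked_on + FULL_RECHECK}
    raise ValueError(f"unknown verification status: {status!r}")


for status in ("Valid", "Invalid", "Risky", "Catch-all"):
    print(status, triage(status, date(2024, 1, 1)))
```

Raising on unknown statuses is deliberate: silently keeping an unrecognized result would let unverified addresses back into sending lists.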

Email hygiene rules

  • Verify before sending. Never send a cold outbound campaign without verifying the list first. A 10%+ bounce rate damages your sending domain reputation
  • Remove hard bounces immediately. When HubSpot logs a hard bounce, mark the email as invalid and suppress from all sequences. Don't wait for the monthly cleanup
  • Re-verify every 6 months. Email addresses go stale. People change jobs, companies change domains. A valid email 6 months ago may bounce today
  • Track bounce rate as a hygiene metric. Bounce rate > 5% on any campaign means your email data needs attention. Target < 2%
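The bounce thresholds above translate into a per-campaign check. A minimal sketch; the label for the 2-5% band ("above target") is an assumption, since the rules only name the target and the attention threshold:

```python
def bounce_check(sent: int, hard_bounces: int) -> tuple[float, str]:
    """Return the hard bounce rate (in percent) and a status label."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    rate = 100 * hard_bounces / sent
    if rate < 2:
        status = "on target"
    elif rate <= 5:
        status = "above target"
    else:
        status = "needs attention"
    return rate, status


rate, status = bounce_check(sent=2_000, hard_bounces=120)
print(f"Hard bounce rate: {rate:.1f}% ({status})")
```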

Property Cleanup

Common property issues

| Issue | Impact | Fix |
| --- | --- | --- |
| "Other" overuse in dropdowns | Reporting is meaningless when 30% is "Other" | Review "Other" entries. Add missing options. Reclassify where possible |
| Free-text fields for structured data | "Industry" has 47 spellings of "SaaS" | Convert to dropdown. Map existing values to dropdown options |
| Outdated dropdown options | Products, personas, or segments that no longer exist | Remove obsolete options. Reclassify existing records |
| Unused properties | 150 custom properties, 50 have < 5% fill rate | Archive unused properties. They clutter forms and views |
| Conflicting data | "ICP Fit: Yes" but company size = 3 (below ICP minimum) | Build validation workflows that flag contradictions |

Property cleanup process

Quarterly property audit:
1. Export all custom properties with fill rates
2. Flag properties with < 10% fill rate
3. For each flagged property:
   - Is it used in any report? → Keep if yes
   - Is it used in any workflow? → Keep if yes
   - Is it required on any form? → Keep if yes
   - None of the above → Archive
4. Review "Other" fill rates on dropdown properties
   - If > 15% are "Other", add missing options
5. Check for duplicate-purpose properties
   - Merge or standardize
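Steps 2 and 3 of the audit reduce to a small decision function. A minimal sketch, assuming usage information has already been collected per property; `PropertyUsage` and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PropertyUsage:
    name: str
    fill_rate: float              # 0.0 to 1.0
    in_reports: bool = False
    in_workflows: bool = False
    required_on_forms: bool = False


def audit_decision(prop: PropertyUsage, threshold: float = 0.10) -> str:
    """Apply the quarterly audit rules: flag below the threshold, keep if in use."""
    if prop.fill_rate >= threshold:
        return "keep"
    if prop.in_reports or prop.in_workflows or prop.required_on_forms:
        return "keep (low fill rate, but in use)"
    return "archive"


properties = [
    PropertyUsage("industry", fill_rate=0.82),
    PropertyUsage("legacy_segment", fill_rate=0.04, in_workflows=True),
    PropertyUsage("webinar_2019_attended", fill_rate=0.02),
]
for prop in properties:
    print(f"{prop.name}: {audit_decision(prop)}")
```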

Stale Record Management

Contact staleness

| Status | Definition | Action |
| --- | --- | --- |
| Active | Activity in last 90 days | No action needed |
| Cooling | Activity 90-180 days ago | Flag for re-engagement campaign |
| Cold | Activity 180-365 days ago | Move to cold nurture. Suppress from primary campaigns |
| Dead | No activity in 12+ months | Suppress from all sending. Consider archiving |

Deal staleness

| Status | Definition | Action |
| --- | --- | --- |
| Active | Activity in last 14 days | No action needed |
| Stalling | No activity 14-30 days | Alert deal owner. Require next step update |
| Stale | No activity 30-60 days | Move to "At Risk." Manager intervention |
| Zombie | No activity 60+ days | Close as Lost (reason: "No Decision/Stale"). Remove from pipeline |

Staleness rules

  • Auto-alert on stale deals. Workflow: if deal has no activity for 14 days and is in an active stage, notify the deal owner. Don't wait for the monthly pipeline review
  • Auto-close zombie deals. If a deal has no activity for 60 days and the owner doesn't update after 2 alerts, auto-close as Lost with reason "Stale - No Activity." This keeps pipeline real
  • Don't delete stale contacts. Suppress them from sending. Mark them as "Cold" or "Inactive." They may re-engage. Deleting loses all history
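The deal statuses and the auto-close rule combine into one decision. A minimal sketch, assuming the lower bound of each band is inclusive (14 days of inactivity already counts as Stalling) and that the number of alerts already sent is tracked somewhere; the action strings are hypothetical labels, not HubSpot workflow names:

```python
def deal_action(days_inactive: int, alerts_sent: int = 0) -> tuple[str, str]:
    """Return (status, action) for an open deal in an active stage."""
    if days_inactive < 14:
        return "Active", "none"
    if days_inactive < 30:
        return "Stalling", "alert_owner"
    if days_inactive < 60:
        return "Stale", "mark_at_risk_and_notify_manager"
    # 60+ days: auto-close only after the owner has ignored two alerts.
    if alerts_sent >= 2:
        return "Zombie", "close_lost_stale_no_activity"
    return "Zombie", "alert_owner"


for days, alerts in ((5, 0), (20, 0), (45, 1), (75, 1), (75, 2)):
    status, action = deal_action(days, alerts)
    print(f"{days} days inactive, {alerts} alerts -> {status}: {action}")
```

Keeping the alert count as an explicit input means a deal is never auto-closed without the owner having had the two chances to update it that the rule requires.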

Ongoing Hygiene Cadence

| Task | Frequency | Owner | Time required |
| --- | --- | --- | --- |
| Deduplicate contacts | Monthly | RevOps | 1-2 hours |
| Review hard bounces | Weekly | Marketing Ops | 30 minutes |
| Stale deal cleanup | Bi-weekly | Sales Ops | 1 hour |
| Property fill rate audit | Quarterly | RevOps | 2-3 hours |
| Full email verification | Every 6 months | Marketing Ops | 2-4 hours (plus tool cost) |
| Property cleanup and archiving | Quarterly | RevOps | 2-3 hours |
| "Other" dropdown review | Quarterly | RevOps | 1 hour |
| Database health scorecard | Monthly | RevOps | 1 hour |

Measurement

| Metric | Definition | Target | Frequency |
| --- | --- | --- | --- |
| Duplicate rate | % of contacts that are duplicates | < 3% | Monthly |
| Email validity rate | % of emails verified as valid | > 92% | Quarterly |
| Key property fill rate | % populated for top 10 properties | > 85% | Monthly |
| Bounce rate on sends | Hard bounce rate on email campaigns | < 2% | Per campaign |
| Stale deal count | Open deals with no activity in 30+ days | Decreasing | Bi-weekly |
| Database health score | Composite of accuracy, completeness, consistency, uniqueness | > 90% | Monthly |
| "Other" rate on dropdowns | % of records using "Other" on key dropdowns | < 10% | Quarterly |

Pre-Cleanup Checklist

  • [ ] Database health scorecard calculated (current state assessment)
  • [ ] Duplicate detection run on contacts and companies
  • [ ] Email verification run on the full database
  • [ ] Property fill rates exported and reviewed
  • [ ] Stale deals identified (no activity 30+ days)
  • [ ] Stale contacts identified (no activity 12+ months)
  • [ ] "Other" dropdown values reviewed for top 10 dropdown properties
  • [ ] Unused properties identified (< 10% fill rate, no report/workflow/form usage)
  • [ ] Hygiene cadence assigned (who does what, how often)
  • [ ] Automated workflows configured (duplicate alerts, stale deal alerts, bounce suppression)

Anti-Pattern Check

  • One-time cleanup, no ongoing process. You spend 40 hours cleaning the database. It's perfect. 6 months later, it's roughly 15% degraded again. Data hygiene is recurring. Assign a monthly cadence and an owner
  • Deleting duplicates instead of merging. You find 3,000 duplicates and delete 1,500 contacts. You lost all their activity history, form submissions, and deal associations. Always merge. Never delete
  • Ignoring bounced emails. Hard bounces pile up. Sending reputation degrades. Deliverability drops from 95% to 80%. All because nobody suppressed the bounced addresses. Remove hard bounces immediately
  • No validation on data entry. Forms accept any text in any field. Integrations push unvalidated data. Imports have no dedup check. Clean data starts at the point of entry. Validate before it hits the CRM
  • 200 custom properties, half unused. Every initiative added properties. Nobody archived them. Forms have 30 optional fields. Contact records are unreadable. Audit quarterly. Archive anything unused
  • Stale deals inflating pipeline. 40% of open deals have no activity in 45 days. Pipeline report says $3M. Real pipeline is $1.8M. Clean stale deals bi-weekly. Auto-close after 60 days of inactivity
  • Data hygiene is "someone else's job." Marketing blames sales for bad data. Sales blames marketing. Nobody cleans anything. Assign a specific owner (usually RevOps) with a specific cadence and specific metrics