HubSpot Data Hygiene
Data hygiene is the ongoing process of keeping your HubSpot database clean, accurate, and useful. Duplicate contacts, stale deals, missing fields, and incorrect properties degrade every system that depends on CRM data: scoring, routing, reporting, outreach, and forecasting. Clean data is the foundation of every GTM operation.
The principle: data hygiene is not a one-time project. It's a recurring process. Data degrades at 2-3% per month through job changes, company merges, email bounces, and human error. A database that's clean today is 25-35% degraded in a year without ongoing maintenance.
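To check the annual figure, here is a quick sketch. It assumes a constant monthly degradation rate that compounds on the records that are still clean. That is a simplification, since real decay varies by field and segment.

```python
def share_degraded(monthly_rate: float, months: int) -> float:
    """Fraction of records degraded after `months` months, assuming a
    constant monthly rate that compounds on the still-clean records."""
    still_clean = (1 - monthly_rate) ** months
    return 1 - still_clean


for rate in (0.02, 0.03):
    compounded = share_degraded(rate, 12)
    additive = rate * 12  # naive estimate that ignores compounding
    print(f"{rate:.0%}/month: {compounded:.1%} compounded, {additive:.0%} additive")
```

Compounding gives about 21.5% and 30.6% after 12 months. Simply adding the monthly rates gives 24% and 36%. Either way, roughly a quarter to a third of the database goes bad within a year.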
The Data Hygiene Framework
Four dimensions of data quality
| Dimension | What it means | How to measure | Example issue |
|---|---|---|---|
| Accuracy | Data is correct and current | Bounce rate, manual spot-check | Email address is wrong, title is outdated |
| Completeness | Required fields are populated | Fill rate per property | Company size is blank on 40% of contacts |
| Consistency | Data follows standards | Dropdown compliance rate | "SaaS", "SAAS", "Software as a Service" for the same value |
| Uniqueness | No duplicate records | Duplicate detection scan | Same person exists 3 times with slight name variations |
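Three of the four dimensions can be measured directly from an exported contact list. Accuracy needs an external check, such as email verification or a manual spot-check. Below is a minimal sketch. It assumes the export is loaded as a list of dicts (for example with `csv.DictReader`), and `email` and `industry` are hypothetical property names.

```python
from collections import Counter


def fill_rate(records: list[dict], prop: str) -> float:
    """Completeness: share of records where `prop` is populated (non-blank)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(prop) or "").strip())
    return filled / len(records)


def dropdown_compliance(records: list[dict], prop: str, allowed: set) -> float:
    """Consistency: share of populated values that exactly match an allowed option."""
    values = [str(r.get(prop) or "").strip() for r in records]
    values = [v for v in values if v]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)


def duplicate_rate(records: list[dict], key: str = "email") -> float:
    """Uniqueness: share of records that repeat an earlier record's normalized key."""
    if not records:
        return 0.0
    keys = [str(r.get(key) or "").strip().lower() for r in records]
    counts = Counter(k for k in keys if k)
    extras = sum(n - 1 for n in counts.values())
    return extras / len(records)


# Hypothetical export rows; real property names depend on your portal.
records = [
    {"email": "ana@acme.com", "industry": "SaaS"},
    {"email": "ANA@acme.com ", "industry": "SAAS"},
    {"email": "bo@beta.io", "industry": ""},
    {"email": "cy@gamma.co", "industry": "Software as a Service"},
]
print(fill_rate(records, "industry"))                                 # 0.75
print(dropdown_compliance(records, "industry", {"SaaS", "Fintech"}))  # ~0.333
print(duplicate_rate(records))                                        # 0.25
```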
The Hygiene Audit
Step 1: Assess current state
| Check | How to run | What it tells you |
|---|---|---|
| Duplicate contacts | HubSpot built-in dedupe tool or export and match on email | How many records are inflated by dupes |
| Duplicate companies | Match on domain name | Company-level deduplication needs |
| Bounce rate | Export contacts, check email verification status | % of emails that will bounce (dead addresses) |
| Property fill rates | Report on % populated for each key property | Which fields are underused or empty |
| Stale contacts | Filter by "Last activity date > 12 months ago" | Dead records clogging the database |
| Stale deals | Filter by "Last activity > 30 days" on open deals | Pipeline inflation from zombie deals |
| Invalid data | Filter dropdowns for "Other" or blank | Where structured data is being bypassed |
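For the duplicate-companies check, matching on raw domain strings misses variants like `https://www.acme.com/` and `acme.com`. This sketch normalizes domains before grouping. It assumes companies are exported with hypothetical `name` and `domain` fields.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a website/domain value to a bare lowercase host for matching."""
    raw = (raw or "").strip().lower()
    if not raw:
        return ""
    if "://" not in raw:
        raw = "http://" + raw  # urlsplit only parses the host when a scheme is present
    host = urlsplit(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def duplicate_company_groups(companies: list[dict]) -> dict[str, list[str]]:
    """Group company names by normalized domain; keep only groups with 2+ records."""
    groups = defaultdict(list)
    for company in companies:
        domain = normalize_domain(company.get("domain", ""))
        if domain:
            groups[domain].append(company["name"])
    return {d: names for d, names in groups.items() if len(names) > 1}


companies = [
    {"name": "Acme Inc", "domain": "https://www.acme.com/"},
    {"name": "Acme", "domain": "ACME.com"},
    {"name": "Beta", "domain": "beta.io"},
    {"name": "No Site", "domain": ""},
]
print(duplicate_company_groups(companies))  # {'acme.com': ['Acme Inc', 'Acme']}
```

This does not treat subdomains (`eu.acme.com`) or regional TLDs (`acme.co.uk`) as the same company. Review those pairs manually.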
Step 2: Quantify the problem
Database health scorecard:
Total contacts: 50,000
Duplicates: 4,200 (8.4%)
Invalid emails (bounced/unverifiable): 6,500 (13%)
Missing company association: 8,000 (16%)
Missing job title: 12,000 (24%)
No activity in 12+ months: 18,000 (36%)
Health score: 100 - (8.4 + 13 + 16 + 24 + 36)/5 = 80.5%
Target: > 90%
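The scorecard formula is easy to automate so it can be recalculated each month. This sketch reproduces the example above. Note that the issue categories overlap (a contact can be both stale and missing a title), so the score is a simple index, not the share of fully clean records.

```python
def health_score(total: int, issue_counts: dict[str, int]) -> float:
    """100 minus the average issue rate (in percent) across the tracked checks."""
    rates = [100 * count / total for count in issue_counts.values()]
    return 100 - sum(rates) / len(rates)


scorecard = {
    "duplicates": 4_200,
    "invalid_emails": 6_500,
    "missing_company": 8_000,
    "missing_job_title": 12_000,
    "no_activity_12mo": 18_000,
}
score = health_score(50_000, scorecard)
print(f"Health score: {score:.1f}%")  # Health score: 80.5%
```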
Deduplication
Finding duplicates
| Method | Matches on | Accuracy | HubSpot support |
|---|---|---|---|
| Email match | Exact email address | High | Built-in dedupe tool |
| Domain + name | Same company domain + similar name | Medium-high | Manual or third-party |
| Phone number | Exact phone match | Medium | Manual filter |
| Company name fuzzy | Similar company names (Acme Inc vs Acme) | Medium | Third-party tools |
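The email and domain + name methods can be combined in one pass over an export. This is a rough sketch. It assumes contacts have hypothetical `id`, `name`, and `email` fields. Python's `difflib.SequenceMatcher` stands in for whatever fuzzy matching a third-party tool actually uses, and the 0.85 threshold is an assumption to tune on your own data.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Illustrative, not exhaustive: a shared consumer domain says nothing about identity.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def name_similarity(a: str, b: str) -> float:
    """0.0-1.0 similarity of two names, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_duplicate_pairs(contacts: list[dict], name_threshold: float = 0.85) -> list[tuple]:
    """Return (id_a, id_b, reason) for candidate duplicate contacts.

    Contacts are only compared within the same email domain ("blocking"),
    which avoids comparing every contact against every other one.
    """
    by_domain = defaultdict(list)
    for contact in contacts:
        email = (contact.get("email") or "").strip().lower()
        if "@" in email:
            by_domain[email.rpartition("@")[2]].append((contact, email))

    pairs = []
    for domain, group in by_domain.items():
        for (a, email_a), (b, email_b) in combinations(group, 2):
            if email_a == email_b:
                pairs.append((a["id"], b["id"], "email"))
            elif (domain not in FREE_EMAIL_DOMAINS
                  and name_similarity(a["name"], b["name"]) >= name_threshold):
                pairs.append((a["id"], b["id"], "domain+name"))
    return pairs


contacts = [
    {"id": 1, "name": "Jon Smith", "email": "jon.smith@acme.com"},
    {"id": 2, "name": "John Smith", "email": "jsmith@acme.com"},
    {"id": 3, "name": "Jon Smith", "email": "JON.SMITH@acme.com"},
    {"id": 4, "name": "Jane Doe", "email": "jane@gmail.com"},
    {"id": 5, "name": "Jane Doe", "email": "janed@gmail.com"},
    {"id": 6, "name": "J. Doe", "email": "jane@gmail.com"},
]
for pair in find_duplicate_pairs(contacts):
    print(pair)
```

Treat the output as a review queue, not a list of automatic merges. Domain + name matches in particular can pair two different people at the same company.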
Deduplication rules
- Merge, don't delete. When you find duplicates, merge them. Deleting loses activity history, form submissions, and deal associations. Merging preserves everything on the surviving record
- Keep the record with the most data. When merging, the record with more populated properties, more activities, and more recent engagement should be the surviving record
- Automate ongoing dedupe. Run HubSpot's dedupe tool monthly. Set up a workflow to flag new contacts that match an existing email or company+name pattern
- Fix the source. Duplicates enter through forms without email matching, imports without dedup checks, and integrations without proper matching. Fix the entry point, not just the symptom
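The "keep the record with the most data" rule can be made explicit when reviewing a duplicate set. This sketch only ranks candidates; the merge itself still happens in HubSpot. The text lists three criteria without ranking them, so the priority order below (populated properties, then activity count, then recency) is an assumption. The field names are hypothetical.

```python
from datetime import date

# Hypothetical bookkeeping fields that should not count as "populated properties".
META_FIELDS = {"id", "activity_count", "last_engagement"}


def populated_count(record: dict) -> int:
    """Number of non-blank properties on a record, ignoring bookkeeping fields."""
    return sum(
        1 for k, v in record.items()
        if k not in META_FIELDS and str(v if v is not None else "").strip()
    )


def pick_survivor(duplicates: list[dict]) -> dict:
    """Pick the record to keep: most populated properties first, then most
    activities, then most recent engagement as tie-breakers."""
    return max(
        duplicates,
        key=lambda r: (
            populated_count(r),
            r.get("activity_count", 0),
            r.get("last_engagement") or date.min,
        ),
    )


a = {"id": "a", "email": "x@acme.com", "jobtitle": "", "phone": None,
     "activity_count": 12, "last_engagement": date(2024, 5, 1)}
b = {"id": "b", "email": "x@acme.com", "jobtitle": "VP Sales", "phone": "555-0100",
     "activity_count": 3, "last_engagement": date(2023, 1, 1)}
print(pick_survivor([a, b])["id"])  # b
```

Here `b` wins on populated properties even though `a` has more activity. If engagement history matters more to your team, reorder the tuple.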
Email Hygiene
Email validation process
1. Export all contacts with email addresses
2. Run through an email verification tool (NeverBounce, ZeroBounce, or equivalent)
3. Results:
- Valid: keep as-is
- Invalid: mark as "Email Invalid" in HubSpot
- Risky: re-verify in 30 days
- Catch-all: keep but monitor bounce rate
4. Suppress invalid emails from all sending
5. Re-verify the full database every 6 months
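Step 3 can be scripted once the verification results are exported. Below is a minimal sketch. It assumes you have first mapped the tool's own status labels to four hypothetical normalized statuses (each vendor names them differently). The action names are also hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical normalized statuses; map your verification tool's labels to these first.
ACTIONS = {
    "valid": "keep",
    "invalid": "mark_invalid_and_suppress",
    "risky": "reverify_later",
    "catch-all": "keep_and_monitor_bounces",
}


def route_result(email: str, status: str, today: date) -> dict:
    """Turn one verification result into its follow-up action."""
    action = ACTIONS.get(status.strip().lower(), "manual_review")
    reverify_on: Optional[date] = None
    if action == "reverify_later":
        reverify_on = today + timedelta(days=30)  # risky: re-verify in 30 days
    return {
        "email": email,
        "action": action,
        "suppress": action == "mark_invalid_and_suppress",
        "reverify_on": reverify_on,
    }


results = [("a@acme.com", "valid"), ("b@acme.com", "Invalid"), ("c@beta.io", "risky")]
routed = [route_result(e, s, date(2024, 1, 1)) for e, s in results]
suppression_list = [r["email"] for r in routed if r["suppress"]]
print(suppression_list)  # ['b@acme.com']
```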
Email hygiene rules
- Verify before sending. Never send a cold outbound campaign without verifying the list first. A 10%+ bounce rate damages your sending domain reputation
- Remove hard bounces immediately. When HubSpot logs a hard bounce, mark the email as invalid and suppress it from all sequences. Don't wait for the weekly bounce review
- Re-verify every 6 months. Email addresses go stale. People change jobs, companies change domains. A valid email 6 months ago may bounce today
- Track bounce rate as a hygiene metric. Bounce rate > 5% on any campaign means your email data needs attention. Target < 2%
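These thresholds translate directly into a per-campaign check. This sketch assumes you have sent and hard-bounce counts for each campaign (for example from a campaign report export). The label names are hypothetical.

```python
def bounce_status(sent: int, hard_bounces: int) -> tuple:
    """Classify a campaign's hard-bounce rate against the thresholds above."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of emails")
    rate = hard_bounces / sent
    if rate >= 0.10:
        label = "reputation risk"   # 10%+ damages sending domain reputation
    elif rate > 0.05:
        label = "needs attention"   # > 5%: email data needs attention
    elif rate >= 0.02:
        label = "above target"      # between the 2% target and the 5% alarm
    else:
        label = "on target"         # < 2%
    return rate, label


print(bounce_status(1000, 30))  # (0.03, 'above target')
```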
Property Cleanup
Common property issues
| Issue | Impact | Fix |
|---|---|---|
| "Other" overuse in dropdowns | Reporting is meaningless when 30% is "Other" | Review "Other" entries. Add missing options. Reclassify where possible |
| Free-text fields for structured data | "Industry" has 47 spellings of "SaaS" | Convert to dropdown. Map existing values to dropdown options |
| Outdated dropdown options | Products, personas, or segments that no longer exist | Remove obsolete options. Reclassify existing records |
| Unused properties | 150 custom properties, 50 have < 5% fill rate | Archive unused properties. They clutter forms and views |
| Conflicting data | "ICP Fit: Yes" but company size = 3 (below ICP minimum) | Build validation workflows that flag contradictions |
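Two of these fixes are mechanical enough to sketch: mapping free-text values to dropdown options, and flagging an ICP contradiction. The mapping table, the `icp_fit` and `company_size` field names, and the 10-employee ICP minimum are all hypothetical placeholders.

```python
import re
from typing import Optional

# Hypothetical mapping from normalized free-text values to dropdown options.
INDUSTRY_MAP = {
    "saas": "SaaS",
    "software as a service": "SaaS",
    "fintech": "Financial Services",
    "financial services": "Financial Services",
}

ICP_MIN_EMPLOYEES = 10  # hypothetical ICP minimum company size


def normalize_key(raw: str) -> str:
    """Lowercase and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def map_industry(raw: str) -> Optional[str]:
    """Dropdown option for a free-text value, or None to route to manual review."""
    return INDUSTRY_MAP.get(normalize_key(raw))


def has_icp_conflict(record: dict) -> bool:
    """Flag 'ICP Fit: Yes' on a company below the ICP size minimum."""
    size = record.get("company_size")
    return record.get("icp_fit") == "Yes" and size is not None and size < ICP_MIN_EMPLOYEES


print(map_industry("Software-as-a-Service"))                   # SaaS
print(has_icp_conflict({"icp_fit": "Yes", "company_size": 3}))  # True
```

Values that return `None` are exactly the ones to review before converting the field to a dropdown.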
Property cleanup process
Quarterly property audit:
1. Export all custom properties with fill rates
2. Flag properties with < 10% fill rate
3. For each flagged property:
- Is it used in any report? → Keep if yes
- Is it used in any workflow? → Keep if yes
- Is it required on any form? → Keep if yes
- None of the above → Archive
4. Review "Other" fill rates on dropdown properties
- If > 15% are "Other", add missing options
5. Check for duplicate-purpose properties
- Merge or standardize
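Steps 2 through 4 reduce to a few rules once the fill rates and usage flags are in a table. This sketch assumes you have already collected whether each property is used in a report, a workflow, or a required form field; how you gather those flags is outside it. Step 5 (duplicate-purpose properties) needs human judgement and is not encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyStats:
    name: str
    fill_rate: float                    # 0.0-1.0
    used_in_report: bool = False
    used_in_workflow: bool = False
    required_on_form: bool = False
    other_rate: Optional[float] = None  # share of "Other" values; None if not a dropdown


def audit_properties(props: list) -> dict:
    """Archive low-fill, unused properties and flag dropdowns where "Other" exceeds 15%."""
    archive, needs_options = [], []
    for p in props:
        in_use = p.used_in_report or p.used_in_workflow or p.required_on_form
        if p.fill_rate < 0.10 and not in_use:
            archive.append(p.name)
        if p.other_rate is not None and p.other_rate > 0.15:
            needs_options.append(p.name)
    return {"archive": archive, "add_dropdown_options": needs_options}


props = [
    PropertyStats("legacy_campaign_code", 0.03),
    PropertyStats("partner_tier", 0.05, used_in_workflow=True),
    PropertyStats("industry", 0.80, used_in_report=True, other_rate=0.22),
    PropertyStats("persona", 0.60, other_rate=0.08),
]
print(audit_properties(props))
# {'archive': ['legacy_campaign_code'], 'add_dropdown_options': ['industry']}
```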
Stale Record Management
Contact staleness
| Status | Definition | Action |
|---|---|---|
| Active | Activity in last 90 days | No action needed |
| Cooling | Activity 90-180 days ago | Flag for re-engagement campaign |
| Cold | Activity 180-365 days ago | Move to cold nurture. Suppress from primary campaigns |
| Dead | No activity in 12+ months | Suppress from all sending. Consider archiving |
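The contact tiers can be computed from a last-activity date. This is a minimal sketch. The table doesn't say which tier owns a boundary day, so lower bounds are treated as inclusive (exactly 90 days is Cooling). It also assumes a contact with no recorded activity counts as Dead.

```python
from datetime import date, timedelta
from typing import Optional

# (upper bound in days, status, action) from the table above; lower bounds inclusive.
CONTACT_TIERS = [
    (90, "Active", "No action needed"),
    (180, "Cooling", "Flag for re-engagement campaign"),
    (365, "Cold", "Move to cold nurture; suppress from primary campaigns"),
]


def contact_status(last_activity: Optional[date], today: date) -> tuple:
    """Return (status, action) for a contact based on days since last activity."""
    if last_activity is not None:
        days = (today - last_activity).days
        for limit, status, action in CONTACT_TIERS:
            if days < limit:
                return status, action
    # No activity in 12+ months, or no recorded activity at all (an assumption)
    return "Dead", "Suppress from all sending; consider archiving"


today = date(2024, 12, 31)
print(contact_status(today - timedelta(days=120), today)[0])  # Cooling
```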
Deal staleness
| Status | Definition | Action |
|---|---|---|
| Active | Activity in last 14 days | No action needed |
| Stalling | No activity 14-30 days | Alert deal owner. Require next step update |
| Stale | No activity 30-60 days | Move to "At Risk." Manager intervention |
| Zombie | No activity 60+ days | Close as Lost (reason: "No Decision/Stale"). Remove from pipeline |
Staleness rules
- Auto-alert on stale deals. Workflow: if a deal has no activity for 14 days and is in an active stage, notify the deal owner. Don't wait for the monthly pipeline review
- Auto-close zombie deals. If a deal has no activity for 60 days and the owner doesn't update it after 2 alerts, auto-close it as Lost with reason "No Decision/Stale". This keeps the pipeline real
- Don't delete stale contacts. Suppress them from sending. Mark them as "Cold" or "Inactive." They may re-engage. Deleting loses all history
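In practice this logic lives in HubSpot workflows. The sketch below just makes the branching explicit. The thresholds come from the deal staleness table, and the "close only after 2 alerts" rule comes from the bullets above. The inputs (`days_inactive`, `in_active_stage`, `alerts_sent`) and the action names are hypothetical.

```python
from typing import Optional

# (upper bound in days, status) from the deal staleness table; lower bounds inclusive.
DEAL_TIERS = [(14, "Active"), (30, "Stalling"), (60, "Stale")]


def deal_status(days_inactive: int) -> str:
    """Map days since the deal's last activity to a staleness status."""
    for limit, status in DEAL_TIERS:
        if days_inactive < limit:
            return status
    return "Zombie"


def workflow_action(days_inactive: int, in_active_stage: bool, alerts_sent: int) -> Optional[str]:
    """Decide what a stale-deal automation should do for one open deal."""
    if not in_active_stage:
        return None
    status = deal_status(days_inactive)
    if status == "Zombie" and alerts_sent >= 2:
        return "close_lost:No Decision/Stale"
    if status == "Stale":
        return "move_to_at_risk_and_notify_manager"
    if status in ("Stalling", "Zombie"):
        return "notify_owner"  # a Zombie with fewer than 2 alerts gets another alert first
    return None


print(workflow_action(70, True, 2))  # close_lost:No Decision/Stale
```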
Ongoing Hygiene Cadence
| Task | Frequency | Owner | Time required |
|---|---|---|---|
| Deduplicate contacts | Monthly | RevOps | 1-2 hours |
| Review hard bounces | Weekly | Marketing Ops | 30 minutes |
| Stale deal cleanup | Bi-weekly | Sales Ops | 1 hour |
| Property fill rate audit | Quarterly | RevOps | 2-3 hours |
| Full email verification | Every 6 months | Marketing Ops | 2-4 hours (plus tool cost) |
| Property cleanup and archiving | Quarterly | RevOps | 2-3 hours |
| "Other" dropdown review | Quarterly | RevOps | 1 hour |
| Database health scorecard | Monthly | RevOps | 1 hour |
Measurement
| Metric | Definition | Target | Frequency |
|---|---|---|---|
| Duplicate rate | % of contacts that are duplicates | < 3% | Monthly |
| Email validity rate | % of emails verified as valid | > 92% | Quarterly |
| Key property fill rate | % populated for top 10 properties | > 85% | Monthly |
| Bounce rate on sends | Hard bounce rate on email campaigns | < 2% | Per campaign |
| Stale deal count | Open deals with no activity in 30+ days | Decreasing | Bi-weekly |
| Database health score | Composite of accuracy, completeness, consistency, uniqueness | > 90% | Monthly |
| "Other" rate on dropdowns | % of records using "Other" on key dropdowns | < 10% | Quarterly |
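The targets can be checked automatically each time the metrics are refreshed. This is a sketch with hypothetical metric keys and rates expressed as 0.0-1.0. The "Decreasing" target for stale deals needs a trend rather than a threshold, so it is handled separately by comparing the latest reading with the previous one.

```python
import operator

# Targets from the measurement table, as (comparison, threshold); rates are 0.0-1.0.
TARGETS = {
    "duplicate_rate": (operator.lt, 0.03),
    "email_validity_rate": (operator.gt, 0.92),
    "key_property_fill_rate": (operator.gt, 0.85),
    "bounce_rate": (operator.lt, 0.02),
    "database_health_score": (operator.gt, 0.90),
    "other_rate": (operator.lt, 0.10),
}


def check_targets(metrics: dict) -> dict:
    """Map each known metric to True (meets target) or False (misses it)."""
    results = {}
    for name, value in metrics.items():
        if name in TARGETS:
            compare, threshold = TARGETS[name]
            results[name] = compare(value, threshold)
    return results


def is_decreasing(history: list) -> bool:
    """Stale deal count target: the latest reading is lower than the previous one."""
    return len(history) >= 2 and history[-1] < history[-2]


# Rates from the example scorecard earlier in this section (50,000 contacts)
print(check_targets({"duplicate_rate": 0.084, "database_health_score": 0.805}))
# {'duplicate_rate': False, 'database_health_score': False}
print(is_decreasing([42, 37, 31]))  # True
```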
Pre-Cleanup Checklist
- [ ] Database health scorecard calculated (current state assessment)
- [ ] Duplicate detection run on contacts and companies
- [ ] Email verification run on the full database
- [ ] Property fill rates exported and reviewed
- [ ] Stale deals identified (no activity 30+ days)
- [ ] Stale contacts identified (no activity 12+ months)
- [ ] "Other" dropdown values reviewed for top 10 dropdown properties
- [ ] Unused properties identified (< 10% fill rate, no report/workflow/form usage)
- [ ] Hygiene cadence assigned (who does what, how often)
- [ ] Automated workflows configured (duplicate alerts, stale deal alerts, bounce suppression)
Anti-Pattern Check
- One-time cleanup, no ongoing process. You spend 40 hours cleaning the database. It's perfect. 6 months later, 12-18% of it has degraded again. Data hygiene is recurring. Assign a monthly cadence and an owner
- Deleting duplicates instead of merging. You find 3,000 duplicates and delete 1,500 contacts. You lost all their activity history, form submissions, and deal associations. Always merge. Never delete
- Ignoring bounced emails. Hard bounces pile up. Sending reputation degrades. Deliverability drops from 95% to 80%. All because nobody suppressed the bounced addresses. Remove hard bounces immediately
- No validation on data entry. Forms accept any text in any field. Integrations push unvalidated data. Imports have no dedup check. Clean data starts at the point of entry. Validate before it hits the CRM
- 200 custom properties, half unused. Every initiative added properties. Nobody archived them. Forms have 30 optional fields. Contact records are unreadable. Audit quarterly. Archive anything unused
- Stale deals inflating pipeline. 40% of open deals have no activity in 45 days. Pipeline report says $3M. Real pipeline is $1.8M. Clean stale deals bi-weekly. Auto-close after 60 days of inactivity
- Data hygiene is "someone else's job." Marketing blames sales for bad data. Sales blames marketing. Nobody cleans anything. Assign a specific owner (usually RevOps) with a specific cadence and specific metrics
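The $3M vs $1.8M example assumes the stale deals are average-sized. Summing by deal value gives the actual gap. This sketch uses hypothetical `amount` and `days_inactive` fields and the 45-day cutoff from that example.

```python
def pipeline_totals(deals: list[dict], stale_after_days: int = 45) -> tuple:
    """Return (reported, active) pipeline: all open deals vs. only those
    with activity within `stale_after_days`."""
    reported = sum(d["amount"] for d in deals)
    active = sum(d["amount"] for d in deals if d["days_inactive"] < stale_after_days)
    return reported, active


# Hypothetical pipeline: 10 open deals of $300k, 4 of them idle for 50 days
deals = [{"amount": 300_000, "days_inactive": 50 if i < 4 else 10} for i in range(10)]
print(pipeline_totals(deals))  # (3000000, 1800000)
```

If the stale deals skew large, the real pipeline is lower than the count-based estimate suggests.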