---
name: icp-modeling-with-data
slug: icp-modeling-with-data
description: This skill should be used when the user asks to "model ICP from data", "use data to define ICP", "analyze closed-won deals for ICP", "build an ICP from CRM data", "use win/loss data for ICP", "score ICP fit from data", "build a predictive ICP model", "analyze customer data for ICP patterns", "quantify ICP from closed-won", or any variation of using closed-won, churn, and CRM data to quantitatively model the ideal customer profile for B2B SaaS.
category: general
---

# ICP Modeling with Data

ICP modeling uses closed-won deal data, churn data, and customer health data to quantitatively identify which company attributes predict success. Instead of guessing "we think mid-market SaaS is our ICP," you analyze 50-100 closed-won deals and find that companies with 80-300 employees, Series A-B funding, and a sales-led motion close 3x faster and churn 60% less. The data tells you the ICP. Your job is to listen.

The principle: the ICP model is only as good as the data behind it. Minimum 50 closed-won deals for a directional model. Minimum 100 for a statistically meaningful one. If you have fewer than 50 wins, use the qualitative icp-definition-framework skill instead.

## The Data You Need

### Required datasets

| Dataset | Source | Minimum size | What it tells you |
|---------|--------|-------------|-------------------|
| Closed-won deals | CRM (Opportunities, Closed Won) | 50+ deals | Which companies actually buy |
| Closed-lost deals | CRM (Opportunities, Closed Lost) | 50+ deals | Which companies evaluate but don't buy. The contrast with won reveals fit |
| Churned customers | CS platform or CRM | 20+ churned accounts | Which companies leave. Anti-ICP signal |
| Active healthy customers | CS platform or CRM | 30+ accounts | Which companies stay and grow. Strongest ICP signal |
| Enrichment data | Apollo, Clearbit, or CRM enrichment fields | All accounts enriched | Firmographic and technographic attributes for analysis |

### Data fields to collect per deal/account

| Field | Source | Type | Why needed |
|-------|--------|------|-----------|
| Company name | CRM | Text | Identification |
| Domain | CRM or enrichment | Text | Dedup and enrichment key |
| Employee count (at time of deal) | Enrichment | Number | Size segmentation |
| Industry | Enrichment | Category | Vertical analysis |
| Funding stage | Crunchbase or enrichment | Category | Stage analysis |
| ARR / revenue range | Enrichment or estimate | Number | Revenue segmentation |
| Geography (HQ) | Enrichment | Category | Regional analysis |
| GTM motion | Manual or inferred from team composition | Category | Motion matching |
| Deal ACV | CRM | Number | Value segmentation |
| Sales cycle (days) | CRM (created date to closed date) | Number | Velocity analysis |
| Deal source | CRM | Category | Channel analysis |
| Champion title | CRM (primary contact) | Text | Persona analysis |
| Tech stack | Job postings or enrichment | List | Stack-based segmentation |
| Outcome | CRM | Won / Lost / Churned / Active | The dependent variable |
| Churn date (if applicable) | CS platform | Date | Retention analysis |
| Expansion revenue (if applicable) | Billing | Number | Growth analysis |
| NPS or health score (if available) | CS platform | Number | Satisfaction correlation |

---

## The Analysis Process

### Step 1: Build the analysis dataset

Merge all data into one spreadsheet or database with one row per deal/account.

```
Columns:
company | domain | employees | industry | funding | geography |
gtm_motion | acv | sales_cycle_days | source | champion_title |
tech_stack | outcome (won/lost/churned/active) | churn_date |
expansion_revenue | health_score
```

**Dataset rules:**
- Every row must have the outcome field populated (won, lost, churned, active). Without the outcome, the row is unusable
- Enrich missing fields before analysis. A dataset with 40% blank industry fields produces weak industry insights. Enrich from Apollo or Clearbit before analyzing
- Separate "active" customers by tenure. A customer active for 6 months is different from one active for 3 years. Add a "months_active" column

### Step 2: Segment and compare

For each attribute, compare win rates, churn rates, and value metrics across segments.

**Analysis template (repeat for each attribute):**

| Employee count | Won deals | Lost deals | Win rate | Avg ACV | Avg cycle (days) | Churn rate | Expansion rate |
|---------------|-----------|-----------|---------|---------|-----------------|-----------|---------------|
| 1-20 | 5 | 15 | 25% | $8K | 45 | 35% | 5% |
| 21-50 | 8 | 12 | 40% | $18K | 38 | 20% | 12% |
| 51-200 | 18 | 10 | 64% | $42K | 28 | 8% | 25% |
| 201-500 | 12 | 8 | 60% | $65K | 35 | 10% | 22% |
| 501-1000 | 5 | 12 | 29% | $85K | 72 | 5% | 30% |
| 1000+ | 2 | 18 | 10% | $120K | 120 | 3% | 35% |

**What this table reveals:** 51-200 employees is the sweet spot. Highest win rate (64%), fastest cycle (28 days), low churn (8%), strong expansion (25%). This is the ICP size band.
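The same comparison as a pandas sketch, continuing from the Step 1 dataframe (column names as listed above):

```python
import pandas as pd

# Bucket employee count into the bands used in the template above
bins = [0, 20, 50, 200, 500, 1000, float("inf")]
labels = ["1-20", "21-50", "51-200", "201-500", "501-1000", "1000+"]
df["size_band"] = pd.cut(df["employees"], bins=bins, labels=labels)

def segment_table(df, attribute):
    """Win, value, velocity, and churn metrics per segment of one attribute."""
    rows = []
    for segment, s in df.groupby(attribute, observed=True):
        won = (s["outcome"] == "won").sum()
        lost = (s["outcome"] == "lost").sum()
        churned = (s["outcome"] == "churned").sum()
        active = (s["outcome"] == "active").sum()
        rows.append({
            "segment": segment,
            "won": won,
            "lost": lost,
            "win_rate": won / max(won + lost, 1),
            # ACV and cycle computed over won deals only
            "avg_acv": s.loc[s["outcome"] == "won", "acv"].mean(),
            "avg_cycle_days": s.loc[s["outcome"] == "won", "sales_cycle_days"].mean(),
            "churn_rate": churned / max(churned + active, 1),
        })
    return pd.DataFrame(rows)

# Repeat for each attribute in the table below
print(segment_table(df, "size_band"))
```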

### Attributes to segment by

| Attribute | Segments to compare | What to look for |
|-----------|-------------------|-----------------|
| Employee count | 5-6 size bands | Which band has the highest win rate + lowest churn? |
| Industry | Top 5-8 industries in the dataset | Which verticals win and retain at the highest rates? |
| Funding stage | Seed, A, B, C, D+, Public, Bootstrapped | Which stages buy fastest and stay longest? |
| Geography | US regions, international | Where do you win most and support best? |
| GTM motion | Sales-led, PLG, Hybrid | Which motion is the best fit for your product? |
| ACV band | $0-10K, $10-30K, $30-100K, $100K+ | Which deal size has the best economics (win rate x retention x expansion)? |
| Champion title | VP, Director, Manager, IC | Which persona drives the most successful deals? |
| Source | Inbound, Outbound, Referral, Partner | Which channel produces the best customers (not just the most)? |
| Tech stack | Companies using [CRM], [sequencing tool], etc. | Does stack predict fit? Do Salesforce shops buy and retain better than HubSpot shops? |

### Step 3: Score and rank

Assign a fit score based on how many ICP attributes a deal matches.

**Scoring example:**

| Attribute | ICP value (from Step 2) | Score if match |
|-----------|------------------------|---------------|
| Employee count | 51-200 | +20 |
| Industry | B2B SaaS | +15 |
| Funding stage | Series A-B | +15 |
| Geography | US | +10 |
| GTM motion | Sales-led | +10 |
| Champion title | VP or Director | +10 |
| Tech stack | Salesforce or HubSpot | +10 |
| **Total possible** | | **90** |

Then score every deal in the dataset and compare outcomes:

| ICP score band | Deals | Win rate | Avg ACV | Avg cycle | Churn rate |
|---------------|-------|---------|---------|-----------|-----------|
| 70-90 (strong fit) | 25 | 72% | $55K | 25 days | 5% |
| 50-69 (moderate fit) | 40 | 48% | $35K | 42 days | 15% |
| 30-49 (weak fit) | 30 | 22% | $20K | 65 days | 30% |
| 0-29 (no fit) | 15 | 8% | $12K | 90 days | 45% |

**What this proves:** Strong-fit deals (70-90) win at 9x the rate of no-fit deals, close 3.6x faster, have 4.6x higher ACV, and churn at 1/9 the rate. The ICP model works.
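A sketch of the scoring pass, keeping the rubric above as a plain list of (column, matching values, points). The points are the illustrative ones from the table, and `funding`, `gtm_motion`, `champion_level`, and `crm` are assumed column names:

```python
import pandas as pd

# Illustrative rubric from the table above -- derive yours from Step 2
RUBRIC = [
    ("size_band", {"51-200"}, 20),
    ("industry", {"B2B SaaS"}, 15),
    ("funding", {"Series A", "Series B"}, 15),
    ("geography", {"US"}, 10),
    ("gtm_motion", {"Sales-led"}, 10),
    ("champion_level", {"VP", "Director"}, 10),
    ("crm", {"Salesforce", "HubSpot"}, 10),
]

def icp_score(row):
    """Sum the points for every rubric attribute the deal matches."""
    return sum(pts for col, values, pts in RUBRIC if row.get(col) in values)

df["icp_score"] = df.apply(icp_score, axis=1)
df["fit_band"] = pd.cut(
    df["icp_score"], bins=[-1, 29, 49, 69, 90],
    labels=["no fit", "weak fit", "moderate fit", "strong fit"],
)
# Outcome metrics per fit band, reusing segment_table from Step 2
print(segment_table(df, "fit_band"))
```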

### Step 4: Validate with holdout data

Split your data: use 70% to build the model, 30% to validate.

**Validation process:**
1. Build the scoring model from 70% of the data (the training set)
2. Score the remaining 30% (the holdout set) using the model
3. Compare: do high-fit deals in the holdout set actually win more, close faster, and churn less?
4. If yes: the model is validated. Deploy it
5. If no: the model is overfit to the training data. Simplify (fewer attributes, fewer segments)

**Validation rules:**
- If the holdout win rate for strong-fit deals is within 10 percentage points of the training win rate, the model is robust
- If holdout results are dramatically different (win rates don't correlate with fit score), the model is overfit. Remove the weakest attributes and re-test
- Minimum 15 deals in the holdout set for each fit tier. Below that, the sample is too small to validate
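A minimal holdout check on the scored dataframe from Step 3, using scikit-learn only for the random split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 70/30 split. Strictly, derive the rubric from `train` only, then score
# `holdout` with that frozen rubric -- never tune on the holdout set.
train, holdout = train_test_split(df, test_size=0.3, random_state=42)

def win_rate_by_band(frame):
    closed = frame[frame["outcome"].isin(["won", "lost"])]
    return closed.groupby("fit_band", observed=True)["outcome"].apply(
        lambda s: (s == "won").mean()
    )

print(pd.concat(
    {"train": win_rate_by_band(train), "holdout": win_rate_by_band(holdout)},
    axis=1,
))  # robust if holdout rates track training rates band by band
```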

---

## Advanced Modeling

### Weighted attributes

Not all ICP attributes are equally predictive. Weight them by their correlation with the outcome.

**How to weight:**

For each attribute, calculate the win rate difference between the best and worst segments.

| Attribute | Best segment win rate | Worst segment win rate | Difference | Weight (normalized) |
|-----------|---------------------|----------------------|-----------|-------------------|
| Employee count | 64% (51-200) | 10% (1000+) | 54 pp | 30% |
| Industry | 68% (B2B SaaS) | 15% (Healthcare) | 53 pp | 29% |
| Funding stage | 62% (Series A-B) | 20% (Bootstrapped) | 42 pp | 23% |
| Geography | 55% (US) | 30% (APAC) | 25 pp | 14% |
| Tech stack | 50% (Salesforce) | 42% (No CRM) | 8 pp | 4% |

**Weighting rules:**
- Attributes with > 40 pp win-rate difference between best and worst segments are strong predictors. Weight them highest
- Attributes with < 10 pp difference are weak predictors. They don't discriminate between fit and non-fit. Consider removing them from the model
- Normalize weights to sum to 100. This makes the final score interpretable (0-100 scale)
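A sketch of the weighting calculation, assuming the dataframe and column names from the earlier steps:

```python
import pandas as pd

def win_rate_spread(df, attribute):
    """Best-minus-worst segment win rate for one attribute, in percentage points."""
    closed = df[df["outcome"].isin(["won", "lost"])]
    rates = closed.groupby(attribute, observed=True)["outcome"].apply(
        lambda s: (s == "won").mean()
    )
    return (rates.max() - rates.min()) * 100

attributes = ["size_band", "industry", "funding", "geography", "crm"]
spreads = pd.Series({a: win_rate_spread(df, a) for a in attributes})

# Drop weak discriminators (< 10 pp spread), then normalize the rest to sum to 100
spreads = spreads[spreads >= 10]
weights = (spreads / spreads.sum() * 100).round()
print(weights.sort_values(ascending=False))
```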

### Negative indicators (anti-ICP)

Some attributes are strong negative predictors. Include them as score penalties.

| Attribute | Anti-ICP value | Penalty |
|-----------|---------------|---------|
| Industry: government, education | These verticals have < 10% win rate and 60% churn | -30 |
| Employee count: < 10 | Too small. Can't afford. Churn at 50% | -20 |
| No CRM in place | Can't integrate. Can't track ROI | -15 |
| Bootstrapped with < $1M ARR | No budget for tools. Extremely price-sensitive | -15 |
| Competitor's employee | Not a real prospect | -100 (disqualify) |

**Anti-ICP rules:**
- A strong negative indicator should be able to disqualify a deal regardless of positive scores. A competitor employee with a 90 fit score is still disqualified
- Anti-ICP attributes should be based on loss AND churn data. An attribute that predicts losses AND churn is a double negative. Weight it heavily
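One way to encode the penalties, as a sketch: each check is a (condition, penalty, disqualify) triple, and `is_competitor_employee` is a hypothetical flag you would set from contact data:

```python
import pandas as pd

# Illustrative anti-ICP checks from the table above
PENALTIES = [
    (lambda r: r.get("industry") in {"Government", "Education"}, -30, False),
    (lambda r: (r.get("employees") or 0) < 10,                   -20, False),
    (lambda r: pd.isna(r.get("crm")),                            -15, False),
    (lambda r: bool(r.get("is_competitor_employee")),           -100, True),
]

def apply_penalties(row, base_score):
    """Subtract penalties; a hard disqualifier zeroes any positive fit score."""
    score = base_score
    for condition, penalty, disqualify in PENALTIES:
        if condition(row):
            if disqualify:
                return 0  # overrides positive scores, per the rule above
            score += penalty
    return max(score, 0)
```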

### Multi-outcome modeling

Instead of just modeling win/loss, model multiple outcomes.

| Outcome to model | Dataset | What it reveals |
|-----------------|---------|----------------|
| Win vs loss | Closed-won vs closed-lost | Which companies buy |
| Retained vs churned | Active 12+ months vs churned | Which companies stay |
| Expanded vs flat | Customers with expansion vs no expansion | Which companies grow |
| Fast close vs slow close | Deals < 30 days vs > 90 days | Which companies buy quickly |

**Multi-outcome rules:**
- The strongest ICP attributes are those that predict positive outcomes across ALL models. "51-200 employees" predicting both higher win rate AND lower churn AND higher expansion is a triple signal. That attribute belongs in the ICP with maximum weight
- An attribute that predicts wins but also predicts churn is a trap. "Healthcare companies buy often but churn at 40%" means healthcare is not ICP despite a decent win rate. Look at the full customer lifecycle, not just the sale
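A sketch of the multi-outcome comparison for one attribute, reusing the dataframe from the earlier steps:

```python
import pandas as pd

def multi_outcome(df, attribute):
    """One attribute's signal across the win, retention, and expansion models."""
    closed = df[df["outcome"].isin(["won", "lost"])]
    customers = df[df["outcome"].isin(["churned", "active"])]
    return pd.DataFrame({
        "win_rate": closed.groupby(attribute, observed=True)["outcome"]
                          .apply(lambda s: (s == "won").mean()),
        "churn_rate": customers.groupby(attribute, observed=True)["outcome"]
                               .apply(lambda s: (s == "churned").mean()),
        "expansion_rate": customers.groupby(attribute, observed=True)["expansion_revenue"]
                                   .apply(lambda s: (s.fillna(0) > 0).mean()),
    })

# A segment leading on all three columns is the triple signal described above;
# one that wins often but also churns hard is the trap
print(multi_outcome(df, "size_band"))
```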

---

## Implementing the ICP Score in CRM

### CRM fields

| Field | Type | Object | How it's set |
|-------|------|--------|-------------|
| `icp_score` | Number (0-100) | Account/Company | Automated from enrichment data + scoring model |
| `icp_tier` | Picklist (Tier 1, 2, 3, Not ICP) | Account/Company | Derived from icp_score: 70+ = Tier 1, 50-69 = Tier 2, 30-49 = Tier 3, < 30 = Not ICP |
| `icp_score_details` | Long text | Account/Company | Breakdown: "Size: +20, Industry: +15, Stage: +15, Geo: +10 = 60" |
| `icp_last_scored` | Date | Account/Company | Timestamp of last score calculation |

### Automation

```
Trigger: New account created OR enrichment data updated
  ↓
Action: Calculate ICP score from enrichment fields
  Employee count → size score
  Industry → industry score
  Funding stage → stage score
  Geography → geo score
  Tech stack → stack score
  Anti-ICP checks → penalties
  ↓
Set icp_score = sum of all dimension scores
Set icp_tier based on score thresholds
Set icp_last_scored = today
  ↓
If icp_tier = "Tier 1" or "Tier 2": flag for prioritization
If icp_tier = "Not ICP": flag for review or disqualification
```
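A sketch of the scoring step behind that automation, reusing the `icp_score` and `apply_penalties` sketches from earlier. How it is triggered (webhook, workflow, nightly job) depends entirely on your CRM:

```python
from datetime import date

def score_account(account: dict) -> dict:
    """Score one account payload and return the CRM field updates."""
    base = icp_score(account)               # positive rubric (Step 3)
    final = apply_penalties(account, base)  # anti-ICP penalties
    if final >= 70:
        tier = "Tier 1"
    elif final >= 50:
        tier = "Tier 2"
    elif final >= 30:
        tier = "Tier 3"
    else:
        tier = "Not ICP"
    return {
        "icp_score": final,
        "icp_tier": tier,
        "icp_last_scored": date.today().isoformat(),
    }
```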

### Implementation rules

- **Score automatically on account creation.** The ICP score should populate within seconds of an account entering the CRM, not after a human reviews it
- **Re-score when enrichment data changes.** If a company raises a new round (funding stage changes) or grows (employee count changes), the ICP score should update
- **Make the score visible to reps.** The ICP tier should appear on the account record, in list views, and in the lead routing logic. A score buried in a custom field nobody sees is useless
- **Don't hide the scoring logic.** The `icp_score_details` field shows how the score was calculated. When a rep asks "why is this Tier 2 and not Tier 1?" they can see the breakdown

---

## Model Maintenance

### Quarterly review process

| Step | What to check | Why it matters |
|------|--------------|----------------|
| 1. Pull new win/loss data from the last quarter | Does the new data validate or invalidate the model built on historical data? | Fresh data catches model drift |
| 2. Re-run the segmentation analysis | Do the same attributes still predict wins? | Markets change and products evolve. The ICP may shift |
| 3. Check churn by ICP tier | Are Tier 1 accounts still retaining best? | If Tier 1 churn is rising, the model is no longer identifying sticky customers |
| 4. Check pipeline by ICP tier | Is > 70% of pipeline from Tier 1-2? | If the team is generating mostly non-ICP pipeline, targeting is misaligned |
| 5. Adjust weights or add/remove attributes | Did an attribute stop being predictive? Remove it. Did a new one emerge? Add it | Model accuracy degrades over time without updates |
| 6. Re-validate on holdout data | Does the updated model hold up on the last quarter's data? | Confirms the model still works |

### Maintenance rules

- **Quarterly is the minimum review cadence.** Every quarter, re-run the analysis with fresh data. Annual reviews are too infrequent for a fast-moving SaaS company
- **Track model accuracy over time.** "What % of Tier 1 deals closed this quarter?" and "What % of churned customers were Tier 1?" If accuracy is declining, the model needs recalibration
- **The model should get better over time, not stay static.** Each quarter adds more data points. More data = better segmentation = more accurate predictions. The Q4 model should outperform the Q1 model
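A minimal sketch of the accuracy tracking, assuming dataframes of last quarter's closed deals and churn events that carry the tier each record was originally scored with:

```python
def quarterly_accuracy(closed_deals, churned_accounts):
    """The two accuracy questions above, as fractions of 1."""
    tier1 = closed_deals[closed_deals["icp_tier"] == "Tier 1"]
    return {
        # What % of Tier 1 deals closed won this quarter?
        "tier1_win_rate": (tier1["outcome"] == "won").mean(),
        # What % of churned customers were scored Tier 1?
        "pct_churn_from_tier1": (churned_accounts["icp_tier"] == "Tier 1").mean(),
    }
```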

---

## Tools for ICP Modeling

| Approach | Tool | Best for | Complexity |
|----------|------|----------|-----------|
| Spreadsheet analysis | Google Sheets, Excel | Teams with 50-200 deals. Manual segmentation and scoring | Low |
| BI tool analysis | Looker, Metabase, Mode | Teams with 200+ deals. Visual analysis. Shareable dashboards | Medium |
| CRM-native reporting | HubSpot reports, Salesforce reports | Basic segmentation within CRM. No export needed | Low-medium |
| Statistical modeling (regression) | Python (pandas, scikit-learn), R | Teams with 500+ deals. Predictive modeling. Feature importance | High |
| Predictive scoring vendors | MadKudu, Clearbit Reveal, 6sense | Automated ICP scoring with ML. Minimal manual analysis | Low (to implement), $$ (cost) |

### Tool selection rules

- **Start with a spreadsheet.** Export CRM data. Do the segmentation manually. You'll learn more from hands-on analysis than from any tool's automated output
- **Graduate to BI tools at 200+ deals.** Spreadsheets get unwieldy above 200 rows. BI tools make segmentation visual and shareable
- **Statistical modeling at 500+ deals.** Below 500 deals, regression models overfit. Above 500, logistic regression can identify non-obvious attribute interactions that manual analysis misses
- **Predictive scoring vendors at $10M+ ARR.** Below $10M, the data volume doesn't justify the tool cost. Above $10M, automated ICP scoring saves RevOps 5-10 hours per week

---

## Measurement

| Metric | Definition | Target | Frequency |
|--------|-----------|--------|-----------|
| Model accuracy: win rate by tier | Win rate for Tier 1 vs Tier 2 vs non-ICP | Tier 1 win rate > 2x non-ICP win rate | Quarterly |
| Model accuracy: churn by tier | Churn rate for Tier 1 vs non-ICP | Tier 1 churn < 50% of non-ICP churn | Quarterly |
| Pipeline concentration | % of pipeline from Tier 1-2 accounts | > 70% | Monthly |
| ICP coverage | % of accounts in CRM with an ICP score | > 90% | Monthly |
| Scoring freshness | % of scores updated in last 90 days | > 80% | Monthly |
| Tier 1 expansion rate | Expansion revenue from Tier 1 accounts | > average for all accounts | Quarterly |
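A sketch of the monthly metrics, assuming `accounts` and `pipeline` dataframes that carry the CRM fields defined earlier:

```python
import pandas as pd

def monthly_metrics(accounts, pipeline):
    """Pipeline concentration, ICP coverage, and scoring freshness, as fractions of 1."""
    concentration = pipeline["icp_tier"].isin(["Tier 1", "Tier 2"]).mean()
    coverage = accounts["icp_score"].notna().mean()
    age_days = (pd.Timestamp.today()
                - pd.to_datetime(accounts["icp_last_scored"])).dt.days
    freshness = (age_days <= 90).mean()
    return {
        "pipeline_concentration": concentration,  # target > 0.70
        "icp_coverage": coverage,                 # target > 0.90
        "scoring_freshness": freshness,           # target > 0.80
    }
```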

---

## Anti-Pattern Check

- Building the model from 15 deals. The sample size is too small. Patterns from 15 deals are likely noise, not signal. Wait until you have 50+ wins and 50+ losses before modeling
- Using only win data (no losses). The model identifies who buys. Without loss data, it can't identify who doesn't buy. The contrast between wins and losses reveals the discriminating attributes. Include both
- Ignoring churn data. A company type that buys easily but churns at 40% is not ICP. It's a trap. Include churn data in the model. The best ICP attributes predict wins AND retention
- Over-weighting one attribute. "100% of our wins are in the US" when 95% of prospects are also in the US. Geography isn't a discriminating factor in this case. Compare the win rate for the attribute vs the base rate
- Building the model once and never updating. The model from 6 months ago was built on different data, a different product, and a different market. Refresh quarterly with new win/loss/churn data
- Using the model to exclude leads instead of prioritize. The ICP model should determine routing priority and sales effort allocation. It should not be a hard gate that prevents non-ICP leads from ever being contacted. Tier 3 accounts can still be worked at lower priority
- No anti-ICP in the model. The model only has positive scores. A government agency with 3 positive attributes scores 45/90 and enters the pipeline. Include negative indicators that disqualify regardless of positive fit
- Scoring without enrichment. The model requires employee count, industry, and funding stage. If 40% of accounts are missing these fields, 40% of scores are wrong. Enrich before scoring