
icp-modeling-with-data

This skill should be used when the user asks to "model ICP from data", "use data to define ICP", "analyze closed-won deals for ICP", "build an ICP from CRM data", "use win/loss data for ICP", "score ICP fit from data", "build a predictive ICP model", "analyze customer data for ICP patterns", "quantify ICP from closed-won", or any variation of using closed-won, churn, and CRM data to quantitatively model the ideal customer profile for B2B SaaS.

ICP Modeling with Data

ICP modeling uses closed-won deal data, churn data, and customer health data to quantitatively identify which company attributes predict success. Instead of guessing "we think mid-market SaaS is our ICP," you analyze 50-100 closed-won deals and find that companies with 80-300 employees, Series A-B funding, and a sales-led motion close 3x faster and churn 60% less. The data tells you the ICP. Your job is to listen.

The principle: the ICP model is only as good as the data behind it. Minimum 50 closed-won deals for a directional model. Minimum 100 for a statistically meaningful one. If you have fewer than 50 wins, use the qualitative icp-definition-framework skill instead.

The Data You Need

Required datasets

| Dataset | Source | Minimum size | What it tells you |
| --- | --- | --- | --- |
| Closed-won deals | CRM (Opportunities, Closed Won) | 50+ deals | Which companies actually buy |
| Closed-lost deals | CRM (Opportunities, Closed Lost) | 50+ deals | Which companies evaluate but don't buy. The contrast with won reveals fit |
| Churned customers | CS platform or CRM | 20+ churned accounts | Which companies leave. Anti-ICP signal |
| Active healthy customers | CS platform or CRM | 30+ accounts | Which companies stay and grow. Strongest ICP signal |
| Enrichment data | Apollo, Clearbit, or CRM enrichment fields | All accounts enriched | Firmographic and technographic attributes for analysis |

Data fields to collect per deal/account

| Field | Source | Type | Why needed |
| --- | --- | --- | --- |
| Company name | CRM | Text | Identification |
| Domain | CRM or enrichment | Text | Dedup and enrichment key |
| Employee count (at time of deal) | Enrichment | Number | Size segmentation |
| Industry | Enrichment | Category | Vertical analysis |
| Funding stage | Crunchbase or enrichment | Category | Stage analysis |
| ARR / revenue range | Enrichment or estimate | Number | Revenue segmentation |
| Geography (HQ) | Enrichment | Category | Regional analysis |
| GTM motion | Manual or inferred from team composition | Category | Motion matching |
| Deal ACV | CRM | Number | Value segmentation |
| Sales cycle (days) | CRM (created date to closed date) | Number | Velocity analysis |
| Deal source | CRM | Category | Channel analysis |
| Champion title | CRM (primary contact) | Text | Persona analysis |
| Tech stack | Job postings or enrichment | List | Stack-based segmentation |
| Outcome | CRM | Won / Lost / Churned / Active | The dependent variable |
| Churn date (if applicable) | CS platform | Date | Retention analysis |
| Expansion revenue (if applicable) | Billing | Number | Growth analysis |
| NPS or health score (if available) | CS platform | Number | Satisfaction correlation |

The Analysis Process

Step 1: Build the analysis dataset

Merge all data into one spreadsheet or database with one row per deal/account.

Columns:
company | domain | employees | industry | funding | geography |
gtm_motion | acv | sales_cycle_days | source | champion_title |
tech_stack | outcome (won/lost/churned/active) | churn_date |
expansion_revenue | health_score

Dataset rules:

  • Every row must have the outcome field populated (won, lost, churned, active). Without the outcome, the row is unusable
  • Enrich missing fields before analysis. A dataset with 40% blank industry fields produces weak industry insights. Enrich from Apollo or Clearbit before analyzing
  • Separate "active" customers by tenure. A customer active for 6 months is different from one active for 3 years. Add a "months_active" column
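A minimal sketch of Step 1 in plain Python, assuming a hypothetical CRM export and an enrichment export keyed by domain (all field and company names are illustrative, not a real CRM schema):

```python
# Hypothetical CRM export: one dict per opportunity. Field names follow the
# column list above but are illustrative, not a real CRM schema.
crm_rows = [
    {"company": "Acme", "domain": "acme.com", "acv": 42000,
     "outcome": "won", "sales_cycle_days": 28},
    {"company": "Globex", "domain": "globex.com", "acv": 8000,
     "outcome": "churned", "sales_cycle_days": 45},
    {"company": "Initech", "domain": "initech.com", "acv": 0,
     "outcome": "", "sales_cycle_days": 0},  # no outcome -> unusable
]

# Hypothetical enrichment export keyed by domain (Apollo/Clearbit-style fields).
enrichment = {
    "acme.com": {"employees": 120, "industry": "B2B SaaS", "funding": "Series B"},
    "globex.com": {"employees": 12, "industry": "Retail", "funding": "Bootstrapped"},
}

def build_dataset(crm_rows, enrichment):
    """One row per deal/account: CRM fields merged with enrichment fields.

    Rows without a populated outcome are dropped per the dataset rules."""
    dataset = []
    for row in crm_rows:
        if not row.get("outcome"):
            continue
        dataset.append({**row, **enrichment.get(row["domain"], {})})
    return dataset

dataset = build_dataset(crm_rows, enrichment)  # 2 usable rows of 3
```

In practice the same merge happens in a spreadsheet VLOOKUP or a BI join; the point is one row per account with the outcome field as the anchor.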

Step 2: Segment and compare

For each attribute, compare win rates, churn rates, and value metrics across segments.

Analysis template (repeat for each attribute):

| Employee count | Won deals | Lost deals | Win rate | Avg ACV | Avg cycle (days) | Churn rate | Expansion rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1-20 | 5 | 15 | 25% | $8K | 45 | 35% | 5% |
| 21-50 | 8 | 12 | 40% | $18K | 38 | 20% | 12% |
| 51-200 | 18 | 10 | 64% | $42K | 28 | 8% | 25% |
| 201-500 | 12 | 8 | 60% | $65K | 35 | 10% | 22% |
| 501-1000 | 5 | 12 | 29% | $85K | 72 | 5% | 30% |
| 1000+ | 2 | 18 | 10% | $120K | 120 | 3% | 35% |

What this table reveals: 51-200 employees is the sweet spot. Highest win rate (64%), fastest cycle (28 days), low churn (8%), strong expansion (25%). This is the ICP size band.
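The segment-and-compare step can be sketched as a small helper that produces one row of the table per band. This is a minimal illustration over toy rows, assuming the Step 1 dataset shape; band labels and predicates are whatever you choose:

```python
def segment_stats(rows, attribute, bands):
    """Win rate and average winning ACV per segment.

    `bands` maps a segment label to a predicate over the attribute value."""
    stats = {}
    for label, predicate in bands.items():
        seg = [r for r in rows if predicate(r[attribute])]
        won = [r for r in seg if r["outcome"] == "won"]
        lost = [r for r in seg if r["outcome"] == "lost"]
        closed = len(won) + len(lost)
        stats[label] = {
            "won": len(won),
            "lost": len(lost),
            "win_rate": len(won) / closed if closed else 0.0,
            "avg_acv": sum(r["acv"] for r in won) / len(won) if won else 0.0,
        }
    return stats

# Toy closed-deal rows; a real run would use the Step 1 dataset.
rows = [
    {"employees": 120, "outcome": "won", "acv": 42000},
    {"employees": 150, "outcome": "won", "acv": 40000},
    {"employees": 90, "outcome": "lost", "acv": 0},
    {"employees": 1500, "outcome": "lost", "acv": 0},
]
bands = {"51-200": lambda e: 51 <= e <= 200, "1000+": lambda e: e > 1000}
stats = segment_stats(rows, "employees", bands)
```

Repeat the same call for each attribute in the list below, swapping the bands.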

Attributes to segment by

| Attribute | Segments to compare | What to look for |
| --- | --- | --- |
| Employee count | 5-6 size bands | Which band has the highest win rate + lowest churn? |
| Industry | Top 5-8 industries in the dataset | Which verticals win and retain at the highest rates? |
| Funding stage | Seed, A, B, C, D+, Public, Bootstrapped | Which stages buy fastest and stay longest? |
| Geography | US regions, international | Where do you win most and support best? |
| GTM motion | Sales-led, PLG, Hybrid | Which motion is the best fit for your product? |
| ACV band | $0-10K, $10-30K, $30-100K, $100K+ | Which deal size has the best economics (win rate x retention x expansion)? |
| Champion title | VP, Director, Manager, IC | Which persona drives the most successful deals? |
| Source | Inbound, Outbound, Referral, Partner | Which channel produces the best customers (not just the most)? |
| Tech stack | Companies using [CRM], [sequencing tool], etc. | Does stack predict fit? Do Salesforce shops buy and retain better than HubSpot shops? |

Step 3: Score and rank

Assign a fit score based on how many ICP attributes a deal matches.

Scoring example:

| ICP attribute (value from Step 2) | Score if match |
| --- | --- |
| Employee count: 51-200 | +20 |
| Industry: B2B SaaS | +15 |
| Funding: Series A-B | +15 |
| Geography: US | +10 |
| GTM motion: Sales-led | +10 |
| Champion: VP or Director | +10 |
| Uses Salesforce or HubSpot | +10 |
| Total possible | 90 |

Then score every deal in the dataset and compare outcomes:

| ICP score band | Deals | Win rate | Avg ACV | Avg cycle | Churn rate |
| --- | --- | --- | --- | --- | --- |
| 70-90 (strong fit) | 25 | 72% | $55K | 25 days | 5% |
| 50-69 (moderate fit) | 40 | 48% | $35K | 42 days | 15% |
| 30-49 (weak fit) | 30 | 22% | $20K | 65 days | 30% |
| 0-29 (no fit) | 15 | 8% | $12K | 90 days | 45% |

What this proves: Strong-fit deals (70-90) win at 9x the rate of no-fit deals, close 3.6x faster, have 4.6x higher ACV, and churn at 1/9 the rate. The ICP model works.
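The additive scoring in Step 3 is a short function. This sketch mirrors the example table's rules and point values; the account field names are illustrative assumptions, not a fixed schema:

```python
# Additive fit scoring with the Step 2 attribute values. Rules and point
# values mirror the example table; account field names are illustrative.
SCORING = [
    (lambda a: 51 <= a.get("employees", 0) <= 200, 20),           # size band
    (lambda a: a.get("industry") == "B2B SaaS", 15),              # vertical
    (lambda a: a.get("funding") in {"Series A", "Series B"}, 15), # stage
    (lambda a: a.get("geography") == "US", 10),
    (lambda a: a.get("gtm_motion") == "sales-led", 10),
    (lambda a: a.get("champion_title") in {"VP", "Director"}, 10),
    (lambda a: a.get("crm") in {"Salesforce", "HubSpot"}, 10),
]

def icp_score(account):
    """0-90 fit score: sum the points for every matched attribute."""
    return sum(points for match, points in SCORING if match(account))

strong = {"employees": 120, "industry": "B2B SaaS", "funding": "Series B",
          "geography": "US", "gtm_motion": "sales-led",
          "champion_title": "VP", "crm": "Salesforce"}
weak = {"employees": 5, "industry": "Retail"}
```

Score every row in the dataset with this function, then bucket by score band to reproduce the comparison table above.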

Step 4: Validate with holdout data

Split your data: use 70% to build the model, 30% to validate.

Validation process:

  1. Build the scoring model from 70% of the data (the training set)
  2. Score the remaining 30% (the holdout set) using the model
  3. Compare: do high-fit deals in the holdout set actually win more, close faster, and churn less?
  4. If yes: the model is validated. Deploy it
  5. If no: the model is overfit to the training data. Simplify (fewer attributes, fewer segments)

Validation rules:

  • If holdout win rate for strong-fit deals is within 10% of training win rate, the model is robust
  • If holdout results are dramatically different (win rates don't correlate with fit score), the model is overfit. Remove the weakest attributes and re-test
  • Minimum 15 deals in the holdout set for each fit tier. Below that, the sample is too small to validate
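A minimal holdout sketch, assuming deals already carry a fit score (the seed, threshold, and toy data are illustrative):

```python
import random

def split_train_holdout(rows, holdout_frac=0.3, seed=7):
    """Shuffle once (seeded, so the split is reproducible) and cut 70/30."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]

def win_rates(rows, score_fn, threshold=70):
    """Win rate for strong-fit deals (score >= threshold) vs the rest."""
    def rate(seg):
        return sum(r["outcome"] == "won" for r in seg) / len(seg) if seg else 0.0
    strong = [r for r in rows if score_fn(r) >= threshold]
    rest = [r for r in rows if score_fn(r) < threshold]
    return rate(strong), rate(rest)

# Toy deals with a precomputed fit score.
deals = ([{"score": 80, "outcome": "won"}] * 7
         + [{"score": 80, "outcome": "lost"}] * 3
         + [{"score": 20, "outcome": "won"}] * 2
         + [{"score": 20, "outcome": "lost"}] * 8)
train, holdout = split_train_holdout(deals)
train_strong, _ = win_rates(train, lambda r: r["score"])
holdout_strong, _ = win_rates(holdout, lambda r: r["score"])
# Robust if strong-fit win rates agree closely across the split (rule above).
robust = abs(train_strong - holdout_strong) <= 0.10
```

In a real run, the scoring model itself is fit on the training set only, then applied unchanged to the holdout set.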

Advanced Modeling

Weighted attributes

Not all ICP attributes are equally predictive. Weight them by their correlation with the outcome.

How to weight:

For each attribute, calculate the win rate difference between the best and worst segments.

| Attribute | Best segment win rate | Worst segment win rate | Difference | Weight (normalized) |
| --- | --- | --- | --- | --- |
| Employee count | 64% (51-200) | 10% (1000+) | 54 pp | 30% |
| Industry | 68% (B2B SaaS) | 15% (Healthcare) | 53 pp | 29% |
| Funding stage | 62% (Series A-B) | 20% (Bootstrapped) | 42 pp | 23% |
| Geography | 55% (US) | 30% (APAC) | 25 pp | 14% |
| Tech stack | 50% (Salesforce) | 42% (No CRM) | 8 pp | 4% |

Weighting rules:

  • Attributes with > 40 pp win-rate difference between best and worst segments are strong predictors. Weight them highest
  • Attributes with < 10 pp difference are weak predictors. They don't discriminate between fit and non-fit. Consider removing them from the model
  • Normalize weights to sum to 100. This makes the final score interpretable (0-100 scale)
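The weighting rules above can be sketched as one function. Note this sketch applies the "< 10 pp: consider removing" rule and drops weak attributes before normalizing (the table above keeps tech stack at 4% instead); spreads are the percentage-point differences from the table:

```python
def attribute_weights(spreads_pp, floor_pp=10):
    """Turn best-vs-worst win-rate spreads (percentage points) into weights.

    Attributes under `floor_pp` don't discriminate and are dropped; the rest
    are normalized so the weights sum to ~100 (rounding can shift it by 1)."""
    kept = {attr: pp for attr, pp in spreads_pp.items() if pp >= floor_pp}
    total = sum(kept.values())
    return {attr: round(100 * pp / total) for attr, pp in kept.items()}

# Spreads from the table above, in percentage points.
spreads = {"employees": 54, "industry": 53, "funding": 42,
           "geography": 25, "tech_stack": 8}
weights = attribute_weights(spreads)
```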

Negative indicators (anti-ICP)

Some attributes are strong negative predictors. Include them as score penalties.

| Attribute | Why it's anti-ICP | Penalty |
| --- | --- | --- |
| Industry: government, education | These verticals have < 10% win rate and 60% churn | -30 |
| Employee count: < 10 | Too small. Can't afford. Churn at 50% | -20 |
| No CRM in place | Can't integrate. Can't track ROI | -15 |
| Bootstrapped with < $1M ARR | No budget for tools. Extremely price-sensitive | -15 |
| Competitor's employee | Not a real prospect | -100 (disqualify) |

Anti-ICP rules:

  • A strong negative indicator should be able to disqualify a deal regardless of positive scores. A competitor employee with a 90 fit score is still disqualified
  • Anti-ICP attributes should be based on loss AND churn data. An attribute that predicts losses AND churn is a double negative. Weight it heavily
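A sketch of penalties plus a hard disqualifier, assuming the same illustrative account fields as earlier (the checks and values mirror the table above):

```python
DISQUALIFIED = -10_000  # sentinel: hard disqualify regardless of positive fit

# (check, penalty) pairs; a penalty of None is a hard disqualifier.
# Checks and values mirror the table above; field names are illustrative.
ANTI_ICP = [
    (lambda a: a.get("industry") in {"government", "education"}, -30),
    (lambda a: a.get("employees", 0) < 10, -20),
    (lambda a: not a.get("crm"), -15),
    (lambda a: a.get("is_competitor_employee", False), None),
]

def apply_anti_icp(base_score, account):
    """Subtract penalties from the positive fit score; floor at zero.

    A hard disqualifier short-circuits, so a 90-point fit is still out."""
    score = base_score
    for check, penalty in ANTI_ICP:
        if check(account):
            if penalty is None:
                return DISQUALIFIED
            score += penalty
    return max(score, 0)
```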

Multi-outcome modeling

Instead of just modeling win/loss, model multiple outcomes.

| Outcome to model | Dataset | What it reveals |
| --- | --- | --- |
| Win vs loss | Closed-won vs closed-lost | Which companies buy |
| Retained vs churned | Active 12+ months vs churned | Which companies stay |
| Expanded vs flat | Customers with expansion vs no expansion | Which companies grow |
| Fast close vs slow close | Deals < 30 days vs > 90 days | Which companies buy quickly |

Multi-outcome rules:

  • The strongest ICP attributes are those that predict positive outcomes across ALL models. "51-200 employees" predicting both higher win rate AND lower churn AND higher expansion is a triple signal. That attribute belongs in the ICP with maximum weight
  • An attribute that predicts wins but also predicts churn is a trap. "Healthcare companies buy often but churn at 40%" means healthcare is not ICP despite a decent win rate. Look at the full customer lifecycle, not just the sale
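The cross-outcome check can be sketched as a single predicate test. This illustration covers the two outcomes most datasets have (win/loss and retained/churned); expansion can be added the same way. Toy rows below are illustrative:

```python
def predicts_across_outcomes(rows, predicate):
    """True if the attribute's segment both wins more AND churns less
    than everything outside the segment."""
    def rates(seg):
        closed = [r for r in seg if r["outcome"] in ("won", "lost")]
        custs = [r for r in seg if r["outcome"] in ("active", "churned")]
        win = sum(r["outcome"] == "won" for r in closed) / len(closed) if closed else 0.0
        churn = sum(r["outcome"] == "churned" for r in custs) / len(custs) if custs else 1.0
        return win, churn
    win_in, churn_in = rates([r for r in rows if predicate(r)])
    win_out, churn_out = rates([r for r in rows if not predicate(r)])
    return win_in > win_out and churn_in < churn_out

# Toy rows: the 51-200 band wins more and churns less than everyone else.
rows = ([{"employees": 120, "outcome": "won"}] * 3
        + [{"employees": 120, "outcome": "lost"}]
        + [{"employees": 120, "outcome": "churned"}]
        + [{"employees": 120, "outcome": "active"}] * 4
        + [{"employees": 2000, "outcome": "won"}]
        + [{"employees": 2000, "outcome": "lost"}] * 3
        + [{"employees": 2000, "outcome": "churned"}] * 3
        + [{"employees": 2000, "outcome": "active"}])
```

Attributes that pass this check across every outcome model are the ones that earn maximum weight.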

Implementing the ICP Score in CRM

CRM fields

| Field | Type | Object | How it's set |
| --- | --- | --- | --- |
| icp_score | Number (0-100) | Account/Company | Automated from enrichment data + scoring model |
| icp_tier | Picklist (Tier 1, 2, 3, Not ICP) | Account/Company | Derived from icp_score: 70+ = Tier 1, 50-69 = Tier 2, 30-49 = Tier 3, < 30 = Not ICP |
| icp_score_details | Long text | Account/Company | Breakdown: "Size: +20, Industry: +15, Stage: +15, Geo: +10 = 60" |
| icp_last_scored | Date | Account/Company | Timestamp of last score calculation |

Automation

Trigger: New account created OR enrichment data updated
  ↓
Action: Calculate ICP score from enrichment fields
  Employee count → size score
  Industry → industry score
  Funding stage → stage score
  Geography → geo score
  Tech stack → stack score
  Anti-ICP checks → penalties
  ↓
Set icp_score = sum of all dimension scores
Set icp_tier based on score thresholds
Set icp_last_scored = today
  ↓
If icp_tier = "Tier 1" or "Tier 2": flag for prioritization
If icp_tier = "Not ICP": flag for review or disqualification

Implementation rules

  • Score automatically on account creation. The ICP score should populate within seconds of an account entering the CRM, not after a human reviews it
  • Re-score when enrichment data changes. If a company raises a new round (funding stage changes) or grows (employee count changes), the ICP score should update
  • Make the score visible to reps. The ICP tier should appear on the account record, in list views, and in the lead routing logic. A score buried in a custom field nobody sees is useless
  • Don't hide the scoring logic. The icp_score_details field shows how the score was calculated. When a rep asks "why is this Tier 2 and not Tier 1?" they can see the breakdown
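The automation flow above boils down to one function that produces all three CRM fields, including the visible breakdown. This is a sketch with illustrative (label, check, points) rules, not a real CRM API:

```python
def score_account(account, scoring):
    """Compute icp_score, icp_tier, and the icp_score_details breakdown
    that the automation writes back to the CRM on create/enrichment-update.

    `scoring` is a list of illustrative (label, check, points) tuples."""
    parts, total = [], 0
    for label, check, points in scoring:
        if check(account):
            parts.append(f"{label}: +{points}")
            total += points
    tier = ("Tier 1" if total >= 70 else "Tier 2" if total >= 50
            else "Tier 3" if total >= 30 else "Not ICP")
    return {"icp_score": total,
            "icp_tier": tier,
            "icp_score_details": ", ".join(parts) + f" = {total}"}

SCORING = [
    ("Size", lambda a: 51 <= a.get("employees", 0) <= 200, 20),
    ("Industry", lambda a: a.get("industry") == "B2B SaaS", 15),
    ("Stage", lambda a: a.get("funding") in {"Series A", "Series B"}, 15),
    ("Geo", lambda a: a.get("geography") == "US", 10),
]

fields = score_account({"employees": 120, "industry": "B2B SaaS",
                        "funding": "Series B", "geography": "US"}, SCORING)
```

The details string gives reps the "why is this Tier 2?" answer directly on the record.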

Model Maintenance

Quarterly review process

| Step | What to do | Why |
| --- | --- | --- |
| 1. Pull new win/loss data from the last quarter | The model was built on historical data. New data validates or invalidates it | Fresh data catches model drift |
| 2. Re-run the segmentation analysis | Do the same attributes still predict wins? | Market changes. Product evolves. ICP may shift |
| 3. Check churn by ICP tier | Are Tier 1 accounts still retaining best? | If Tier 1 churn is rising, the model is no longer identifying sticky customers |
| 4. Check pipeline by ICP tier | Is > 70% of pipeline from Tier 1-2? | If the team is generating mostly non-ICP pipeline, targeting is misaligned |
| 5. Adjust weights or add/remove attributes | If an attribute stopped being predictive, remove it. If a new attribute emerged, add it | Model accuracy degrades over time without updates |
| 6. Re-validate on holdout data | Test the updated model on the last quarter's data | Confirms the model still works |

Maintenance rules

  • Quarterly is the minimum review cadence. Every quarter, re-run the analysis with fresh data. Annual reviews are too infrequent for a fast-moving SaaS company
  • Track model accuracy over time. "What % of Tier 1 deals closed this quarter?" and "What % of churned customers were Tier 1?" If accuracy is declining, the model needs recalibration
  • The model should get better over time, not stay static. Each quarter adds more data points. More data = better segmentation = more accurate predictions. The Q4 model should outperform the Q1 model

Tools for ICP Modeling

| Approach | Tool | Best for | Complexity |
| --- | --- | --- | --- |
| Spreadsheet analysis | Google Sheets, Excel | Teams with 50-200 deals. Manual segmentation and scoring | Low |
| BI tool analysis | Looker, Metabase, Mode | Teams with 200+ deals. Visual analysis. Shareable dashboards | Medium |
| CRM-native reporting | HubSpot reports, Salesforce reports | Basic segmentation within CRM. No export needed | Low-medium |
| Statistical modeling (regression) | Python (pandas, scikit-learn), R | Teams with 500+ deals. Predictive modeling. Feature importance | High |
| Predictive scoring vendors | MadKudu, Clearbit Reveal, 6sense | Automated ICP scoring with ML. Minimal manual analysis | Low (to implement), $$ (cost) |

Tool selection rules

  • Start with a spreadsheet. Export CRM data. Do the segmentation manually. You'll learn more from hands-on analysis than from any tool's automated output
  • Graduate to BI tools at 200+ deals. Spreadsheets get unwieldy above 200 rows. BI tools make segmentation visual and shareable
  • Statistical modeling at 500+ deals. Below 500 deals, regression models overfit. Above 500, logistic regression can identify non-obvious attribute interactions that manual analysis misses
  • Predictive scoring vendors at $10M+ ARR. Below $10M, the data volume doesn't justify the tool cost. Above $10M, automated ICP scoring saves RevOps 5-10 hours per week

Measurement

| Metric | Definition | Target | Frequency |
| --- | --- | --- | --- |
| Model accuracy: win rate by tier | Win rate for Tier 1 vs Tier 2 vs non-ICP | Tier 1 win rate > 2x non-ICP win rate | Quarterly |
| Model accuracy: churn by tier | Churn rate for Tier 1 vs non-ICP | Tier 1 churn < 50% of non-ICP churn | Quarterly |
| Pipeline concentration | % of pipeline from Tier 1-2 accounts | > 70% | Monthly |
| ICP coverage | % of accounts in CRM with an ICP score | > 90% | Monthly |
| Scoring freshness | % of scores updated in last 90 days | > 80% | Monthly |
| Tier 1 expansion rate | Expansion revenue from Tier 1 accounts | > average for all accounts | Quarterly |

Anti-Pattern Check

  • Building the model from 15 deals. The sample size is too small. Patterns from 15 deals are likely noise, not signal. Wait until you have 50+ wins and 50+ losses before modeling
  • Using only win data (no losses). The model identifies who buys. Without loss data, it can't identify who doesn't buy. The contrast between wins and losses reveals the discriminating attributes. Include both
  • Ignoring churn data. A company type that buys easily but churns at 40% is not ICP. It's a trap. Include churn data in the model. The best ICP attributes predict wins AND retention
  • Over-weighting one attribute. "100% of our wins are in the US" when 95% of prospects are also in the US. Geography isn't a discriminating factor in this case. Compare the win rate for the attribute vs the base rate
  • Building the model once and never updating. The model from 6 months ago was built on different data, a different product, and a different market. Refresh quarterly with new win/loss/churn data
  • Using the model to exclude leads instead of prioritize. The ICP model should determine routing priority and sales effort allocation. It should not be a hard gate that prevents non-ICP leads from ever being contacted. Tier 3 accounts can still be worked at lower priority
  • No anti-ICP in the model. The model only has positive scores. A government agency with 3 positive attributes scores 45/90 and enters the pipeline. Include negative indicators that disqualify regardless of positive fit
  • Scoring without enrichment. The model requires employee count, industry, and funding stage. If 40% of accounts are missing these fields, 40% of scores are wrong. Enrich before scoring