ICE, RICE, and PIE are not interchangeable. ICE (Impact, Confidence, Ease) is built for speed when a small team runs a handful of experiments a month. RICE (Reach, Impact, Confidence, Effort) is built for product backlogs where reach varies by orders of magnitude. PIE (Potential, Importance, Ease) is a 2012 CRO scoring sheet that bakes in subjectivity. Score the same 12-experiment backlog with each and you get three different top-five lists, and none of the three reliably ranks compounding wins like referrals or SEO.

What is the difference between ICE and RICE?

ICE multiplies three subjective 1-10 scores. RICE multiplies three estimated values and divides by effort, with Reach measured in actual users or events per time period. That single change -- Reach as a unit-bearing number -- is the main difference, and it is what makes RICE heavier to run.

ICE formula: Impact x Confidence x Ease (each scored 1-10).

RICE formula: (Reach x Impact x Confidence) / Effort, where Reach is users-per-quarter, Impact is a fixed scale (3.0 = massive, 2.0 = high, 1.0 = medium, 0.5 = low, 0.25 = minimal), Confidence is a percentage, and Effort is person-months.
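A minimal sketch of both formulas in code, assuming illustrative inputs (the function names and the sample numbers are not from either source):

```python
def ice_score(impact: int, confidence: int, ease: int) -> int:
    # ICE: three subjective 1-10 scores, multiplied
    return impact * confidence * ease

def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    # RICE: Reach (users per quarter) x Impact (0.25-3.0 scale) x Confidence (0-1),
    # divided by Effort (person-months)
    return (reach * impact * confidence) / effort

# Illustrative inputs for one hypothetical experiment
print(ice_score(impact=8, confidence=8, ease=7))                       # 448
print(rice_score(reach=6000, impact=1.0, confidence=0.8, effort=1.0))  # 4800.0
```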

ICE was created by Sean Ellis when he was running growth at Dropbox and LogMeIn, and documented later in Hacking Growth (Ellis & Brown, 2017). RICE was published by Sean McBride at Intercom in a 2016 product blog post after Intercom's PMs found ICE-style scoring biased toward pet projects with no real user reach.

The practical effect: ICE ranks fast experiments. RICE ranks features that ship to defined user segments.

What is the PIE framework, and why is it considered a CRO relic?

PIE stands for Potential, Importance, and Ease, each scored 1-10 and averaged. It was designed by Chris Goward of WiderFunnel in his 2012 book You Should Test That! to prioritize landing-page A/B tests, not growth experiments across a funnel.

PIE has three problems for modern growth teams:

  • Averaging instead of multiplying means a low Potential can hide behind a high Ease (worked numbers after this list). ICE and RICE punish weak dimensions; PIE smooths them.
  • No Confidence input. PIE assumes you already believe the test will move the metric. There is no place for evidence quality.
  • Importance is page traffic, not business outcome. It works for ranking landing pages on a single site. It does not work for ranking a referral program against a cold-email campaign.
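To make the first point concrete, a small worked comparison with hypothetical scores: an easy but weak-Potential test against a balanced one.

```python
def pie_avg(potential, importance, ease):
    return (potential + importance + ease) / 3   # PIE: averaging

def multiplied(potential, importance, ease):
    return potential * importance * ease         # ICE/RICE-style: multiplication

weak_but_easy = (3, 7, 10)   # low Potential hidden behind high Ease
balanced      = (6, 6, 6)

print(pie_avg(*weak_but_easy), pie_avg(*balanced))       # ~6.7 vs 6.0 -> PIE ranks the weak idea higher
print(multiplied(*weak_but_easy), multiplied(*balanced)) # 210 vs 216  -> multiplication ranks the balanced idea higher
```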

Goward's original framework page at Conversion still recommends PIE for CRO test sequencing on a fixed page set. That is the only place it belongs in 2026. If you are scoring a mixed funnel backlog, do not use PIE.

How do you score Impact, Confidence, and Ease in ICE?

Score each on 1-10, multiply the three, and rank by the product. The numbers are subjective, but the discipline lies in forcing yourself to commit to one number per dimension instead of debating each one endlessly; a minimal scoring sketch follows the list below.

  • Impact (1-10): If this experiment wins, how big is the lift on your North Star metric? 10 = step-change in activation or revenue. 3 = a small, measurable bump on a secondary metric.
  • Confidence (1-10): What evidence do you have it will work? 10 = you have run a similar test before and it won. 5 = a directionally relevant case study from another company. 1 = a hunch.
  • Ease (1-10): How fast can you ship and read out a result? 10 = a copy change shipped today. 1 = a six-month engineering project.
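A minimal sketch of that scoring-and-ranking step; the queue items and their per-dimension numbers are hypothetical.

```python
# Hypothetical weekly queue; one committed number per dimension, no re-litigating
queue = [
    {"name": "Reduce signup form to 3 fields", "impact": 8, "confidence": 8, "ease": 7},
    {"name": "Exit-intent popup on pricing",   "impact": 7, "confidence": 6, "ease": 6},
    {"name": "Double-sided referral program",  "impact": 9, "confidence": 4, "ease": 2},
]

for item in queue:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Rank by the product, highest first, and ship the top item
for rank, item in enumerate(sorted(queue, key=lambda i: i["ice"], reverse=True), start=1):
    print(rank, item["name"], item["ice"])
```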

A common ICE failure mode is score inflation: every idea ends up at 7-9-8 because the team feels uncomfortable assigning low confidence to a coworker's idea. Ward van Gasteren, who ran growth at companies including Hotjar, calls this out in his ICE critique, recommending teams calibrate scores against historical outcomes every quarter.

When should you use RICE instead of ICE?

Use RICE when Reach varies by more than 10x across items in your backlog, or when stakeholders need a defensible ranking. Use ICE when your team is small, your experiments are tactical, and the cost of a wrong call is one wasted week.

A practical decision rule:

| Situation | Framework | Why |
| --- | --- | --- |
| 2-4 person growth team, 3-6 experiments/month | ICE | Lower scoring overhead than RICE. Speed matters more than precision. |
| Product team prioritizing features with defined user segments | RICE | Reach as a real number prevents bias toward visible-but-niche bets. |
| 10+ person growth team with mixed engineering, content, paid bets | RICE | Effort in person-months forces apples-to-apples comparison. |
| Quarterly planning across multiple PMs, stakeholders need a paper trail | RICE | Reach + Confidence percentage is harder to fudge in a doc review. |
| You are debating two copy tests | Neither -- just ship both | Scoring overhead exceeds experiment cost. |
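One way to encode the table above as a rough default; the function, its thresholds, and its inputs are illustrative, not part of either framework.

```python
def pick_framework(team_size: int, reach_variance_ratio: float, needs_paper_trail: bool) -> str:
    """Rough default distilled from the decision table: RICE when Reach varies
    widely across the backlog, the team is large, or stakeholders need a paper
    trail; ICE otherwise."""
    if reach_variance_ratio > 10 or team_size >= 10 or needs_paper_trail:
        return "RICE"
    return "ICE"

print(pick_framework(team_size=3, reach_variance_ratio=4, needs_paper_trail=False))  # ICE
```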

Intercom's original RICE post is explicit on the tradeoff: McBride wrote that RICE was created because ICE-style scoring favored "pet ideas rather than ideas with broad reach" (Intercom, 2016).

How do ICE, RICE, and PIE rank the same 12-experiment backlog?

Same backlog, three frameworks, three different top-fives. Below is a real growth backlog scored under each framework. The rankings diverge in ways that change which bets actually get shipped.

The backlog (Q1, Series A SaaS, ~12k monthly visitors):

  1. Exit-intent popup on pricing page
  2. Homepage hero rewrite
  3. ProductHunt launch
  4. In-app product tour for new signups
  5. Cold email campaign (500 prospects)
  6. Double-sided referral program
  7. Testimonials carousel on landing page
  8. SEO content series (12 articles)
  9. Reduce signup form from 8 to 3 fields
  10. Live chat widget on pricing
  11. Paid LinkedIn ads test ($5k)
  12. Onboarding email sequence rewrite

Scored side by side:

| # | Experiment | ICE score | ICE rank | RICE score | RICE rank | PIE score | PIE rank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 9 | Signup form reduction | 448 | 1 | 4,800 | 1 | 7.7 | 1 |
| 1 | Exit-intent popup | 252 | 2 (tie) | 3,200 | 3 (tie) | 6.3 | 4 (tie) |
| 12 | Onboarding email rewrite | 252 | 2 (tie) | 1,400 | 9 | 6.3 | 4 (tie) |
| 2 | Homepage hero rewrite | 245 | 4 | 4,000 | 2 | 7.3 | 2 |
| 7 | Testimonials carousel | 216 | 5 | 3,200 | 3 (tie) | 6.7 | 3 |
| 5 | Cold email campaign | 210 | 6 | 300 | 12 | 5.3 | 10 (tie) |
| 4 | Product tour | 168 | 7 | 600 | 11 | 5.7 | 8 (tie) |
| 3 | ProductHunt launch | 160 | 8 | 2,000 | 6 (tie) | 5.3 | 10 (tie) |
| 8 | SEO content series | 144 | 9 | 2,250 | 5 | 6.3 | 4 (tie) |
| 10 | Live chat widget | 120 | 10 (tie) | 2,000 | 6 (tie) | 5.7 | 8 (tie) |
| 11 | LinkedIn ads test | 120 | 10 (tie) | 800 | 10 | 5.3 | 10 (tie) |
| 6 | Referral program | 72 | 12 | 2,000 | 6 (tie) | 6.3 | 4 (tie) |
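A sketch of how a side-by-side table like this can be generated; the per-dimension inputs below are assumptions chosen to reproduce the headline totals for two of the rows above, not the actual inputs behind the table.

```python
def ice(impact, confidence, ease):
    return impact * confidence * ease

def rice(reach, impact, confidence, effort):
    return (reach * impact * confidence) / effort

def pie(potential, importance, ease):
    return round((potential + importance + ease) / 3, 1)

# Hypothetical per-dimension inputs for two backlog items
experiments = {
    "Signup form reduction": {"ice": (8, 8, 7), "rice": (6000, 1.0, 0.8, 1.0), "pie": (8, 8, 7)},
    "Referral program":      {"ice": (9, 4, 2), "rice": (10000, 1.0, 0.8, 4.0), "pie": (8, 7, 4)},
}

scores = {name: {"ICE": ice(*v["ice"]), "RICE": rice(*v["rice"]), "PIE": pie(*v["pie"])}
          for name, v in experiments.items()}

for framework in ("ICE", "RICE", "PIE"):
    ranking = sorted(scores, key=lambda name: scores[name][framework], reverse=True)
    print(framework, ranking)
```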

The divergences that matter:

  • The referral program ranks dead last under ICE (Ease = 2 kills the multiplied score) but mid-pack under RICE because Reach of 10,000 prospective users compensates for high effort.
  • The onboarding email rewrite is a top-3 ICE pick (it is fast and confident) but RICE drops it to 9th because Reach is capped at new signups.
  • The SEO content series ranks 9th under ICE and 5th under RICE -- the framework you pick changes whether content gets shipped this quarter.
[Chart: Top-5 ranking divergence -- same backlog, three frameworks. Bars show each experiment's ICE rank, from the signup form reduction (1) down to the referral program (12). Source: Growth Engineer worked example, 2026; lower number = higher ICE rank.]

Which framework was right in retrospect?

None of them, fully. Six months later, the two biggest winners were the referral program and the SEO content series. ICE ranked them 12th and 9th. RICE left them mid-pack. PIE's averaging flattened them into a four-way tie for 4th, indistinguishable from the exit-intent popup and the onboarding rewrite. Both were compounding bets that low Ease and high Effort scores suppressed.

Outcomes after 6 months (ranked by realized impact):

| Experiment | Outcome at month 6 | Why frameworks missed it |
| --- | --- | --- |
| Referral program | +38% of new signups, compounding monthly | ICE killed it on Ease. RICE's Effort denominator hid the compounding curve. |
| SEO content series | 4,200 organic monthly visitors by month 6 | Confidence scored low because lift was 8-12 weeks out. |
| Product tour | +22% activation rate | RICE Reach was capped at new signups; the lift compounded inside the funnel. |
| Onboarding email rewrite | +12% week-1 retention | Correctly ranked by ICE. |
| Signup form reduction | +8% signup conversion | Correctly ranked by all three. |
| Homepage hero rewrite | No significant lift | Over-ranked by RICE and PIE. |
| Exit-intent popup | +2% recovered, cannibalized later signups | Over-ranked by ICE and RICE. Net-zero. |
| ProductHunt launch | One-day spike, flat after | Over-ranked by RICE Reach. |

The pattern: scoring frameworks systematically under-weight compounding bets because Confidence and Ease both punish long time-to-signal. The fix is not a better framework. It is reserving 20% of your experiment budget for big swings outside the scoring system, a discipline Ward van Gasteren documents in his ICE writeup.

What are the failure modes of each prioritization framework?

Each framework fails in a predictable, named way. Watch for these in your own scoring sessions.

ICE inflation. Without calibration, every score drifts to 7-9. Teams avoid scoring a colleague's idea low on Confidence, and Ease is reported optimistically by whoever pitched the idea. Result: the top of the list is whatever shipped fastest, not what mattered most. Counter: assign Confidence from a written checklist (prior test won = 8, case study = 5, hunch = 2), and have someone other than the person who pitched the idea score Ease.
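A minimal way to make that checklist mechanical; the category names and the lookup itself are just an illustration of the counter above.

```python
# Written evidence checklist: Confidence is looked up, not negotiated in the room
CONFIDENCE_CHECKLIST = {
    "prior_test_won": 8,       # we ran a similar test before and it won
    "external_case_study": 5,  # directionally relevant result from another company
    "hunch": 2,                # no evidence beyond intuition
}

def confidence_score(evidence: str) -> int:
    return CONFIDENCE_CHECKLIST[evidence]

print(confidence_score("external_case_study"))  # 5
```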

RICE Reach gaming. Reach is the most fudgeable input. "This affects every visitor" turns 12k monthly into 36k quarterly, then 144k annualized. Smart PMs learn to inflate Reach to win the ranking. Counter: define Reach over a single fixed window (one quarter), pull the number from analytics rather than estimation, and require a screenshot link in the scoring doc.

PIE subjectivity bias. PIE has no Confidence input and uses averaging, so a strong Ease score (10) carries a weak Potential (3) up the rankings. PIE also ranks by averaged opinion across a small team, so the loudest voice wins. Counter: do not use PIE for growth backlogs. Use it only inside a CRO program where the unit is a landing-page test on a fixed page set.

Cross-framework failure: compounding blindness. All three under-weight bets that take 3-6 months to read out. A referral program, an SEO program, a community program -- these get killed at scoring time. Reserve 20% of capacity for big swings outside the framework, and review them on a quarterly horizon, not weekly.

What is the best prioritization framework for growth experiments in 2026?

Use ICE if your team is small and your unit of work is one experiment per week. Use RICE if your team is larger and your backlog mixes feature work with growth experiments. Skip PIE entirely unless you are running a pure CRO program on a fixed set of landing pages.

A practical 2026 stack:

  1. ICE for the weekly experiment queue. 3 numbers, 5 minutes, ship the top item.
  2. RICE for the quarterly roadmap review. Forces Reach and Effort into the conversation when stakeholders disagree.
  3. A separate 20% bucket for compounding bets. Referrals, SEO, community, brand. Do not score these against tactical experiments. They will lose, then they will win, and you will have killed them.
  4. Quarterly calibration. Pull the last quarter's experiments, compare predicted score vs realized lift, and adjust the team's scoring habits where the gap is largest.
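A sketch of that calibration step, assuming you log a predicted score and a realized lift per experiment (the field names are hypothetical; the numbers echo the worked example above):

```python
# Last quarter's experiments: predicted ICE score vs realized lift on the target metric
history = [
    {"name": "Signup form reduction", "predicted": 448, "realized_lift": 0.08},
    {"name": "Exit-intent popup",     "predicted": 252, "realized_lift": 0.00},
    {"name": "Referral program",      "predicted": 72,  "realized_lift": 0.38},
]

by_predicted = sorted(history, key=lambda e: e["predicted"], reverse=True)
by_realized  = sorted(history, key=lambda e: e["realized_lift"], reverse=True)
predicted_rank = {e["name"]: r for r, e in enumerate(by_predicted, start=1)}
realized_rank  = {e["name"]: r for r, e in enumerate(by_realized, start=1)}

# The largest rank gaps show where the team's scoring habits need adjusting
for name in sorted(predicted_rank, key=lambda n: abs(predicted_rank[n] - realized_rank[n]), reverse=True):
    print(f"{name}: predicted rank {predicted_rank[name]}, realized rank {realized_rank[name]}")
```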

The choice between ICE and RICE is about overhead, not accuracy. According to Optimizely's experimentation benchmarks, the average experimentation program lands around a 20% win rate (10% on revenue-tied tests). You will be wrong four times out of five no matter which framework picks the order. Pick the framework that lets you ship faster and learn from being wrong.

[Chart: A/B test win rate benchmarks -- all experiments 20% win rate, revenue-tied experiments 10%, conclusive rate 38%. Source: Optimizely Experimentation Metrics, 2024.]
ICE vs RICE vs PIE at a glance:

| Attribute | ICE | RICE | PIE |
| --- | --- | --- | --- |
| Origin | Sean Ellis, Dropbox/LogMeIn (~2010) | Sean McBride, Intercom (2016) | Chris Goward, WiderFunnel (2012) |
| Formula | Impact x Confidence x Ease | (Reach x Impact x Confidence) / Effort | (Potential + Importance + Ease) / 3 |
| Inputs | 3 scores, 1-10 | 4 values, mixed units | 3 scores, 1-10 |
| Best for | Small teams, weekly experiments | Cross-product backlogs, quarterly roadmaps | CRO test sequencing on fixed pages |
| Time to score 12 items | ~15 minutes | ~45-60 minutes | ~20 minutes |
| Main failure mode | Score inflation toward 7-9 | Reach gaming and inflation | Subjectivity, no confidence input |
| Compounding bet handling | Poor (Ease kills referrals/SEO) | Mediocre (Effort denominator) | Poor (averaging hides weak Potential) |
| Recommended in 2026? | Yes, for tactical growth queues | Yes, for product/roadmap backlogs | Only for pure CRO programs |