ICE, RICE, and PIE are not interchangeable. ICE (Impact, Confidence, Ease) is built for speed when a small team runs a handful of experiments a month. RICE (Reach, Impact, Confidence, Effort) is built for product backlogs where reach varies by orders of magnitude. PIE (Potential, Importance, Ease) is a 2012 CRO scoring sheet that bakes in subjectivity. Score the same 12-experiment backlog with each and you get three different top-five lists, and none of the three reliably ranks compounding wins like referrals or SEO.

What is the difference between ICE and RICE?

ICE multiplies three subjective 1-10 scores. RICE multiplies three estimated values and divides by effort, with Reach measured in actual users or events per time period. That single change -- Reach as a unit-bearing number -- is the main difference, and it is what makes RICE heavier to run.

ICE formula: Impact x Confidence x Ease (each scored 1-10).

RICE formula: (Reach x Impact x Confidence) / Effort, where Reach is users-per-quarter, Impact is a fixed scale (3.0 = massive, 2.0 = high, 1.0 = medium, 0.5 = low, 0.25 = minimal), Confidence is a percentage, and Effort is person-months.
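A minimal sketch of both formulas in code, assuming illustrative inputs (the function names and the sample numbers are not from either source):

```python
def ice_score(impact: int, confidence: int, ease: int) -> int:
    # ICE: three subjective 1-10 scores, multiplied
    return impact * confidence * ease

def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    # RICE: Reach (users per quarter) x Impact (0.25-3.0 scale) x Confidence (0-1),
    # divided by Effort (person-months)
    return (reach * impact * confidence) / effort

# Illustrative inputs for one hypothetical experiment
print(ice_score(impact=8, confidence=8, ease=7))                       # 448
print(rice_score(reach=6000, impact=1.0, confidence=0.8, effort=1.0))  # 4800.0
```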

ICE was created by Sean Ellis when he was running growth at Dropbox and LogMeIn, and documented later in Hacking Growth (Ellis & Brown, 2017). RICE was published by Sean McBride at Intercom in a 2016 product blog post after Intercom's PMs found ICE-style scoring biased toward pet projects with no real user reach.

The practical effect: ICE ranks fast experiments. RICE ranks features that ship to defined user segments.

What is the PIE framework, and why is it considered a CRO relic?

PIE stands for Potential, Importance, and Ease, each scored 1-10 and averaged. It was designed by Chris Goward of WiderFunnel in his 2012 book You Should Test That! to prioritize landing-page A/B tests, not growth experiments across a funnel.

PIE has three problems for modern growth teams:

  • Averaging instead of multiplying means a low Potential can hide behind a high Ease (worked numbers after this list). ICE and RICE punish weak dimensions; PIE smooths them.
  • No Confidence input. PIE assumes you already believe the test will move the metric. There is no place for evidence quality.
  • Importance is page traffic, not business outcome. It works for ranking landing pages on a single site. It does not work for ranking a referral program against a cold-email campaign.
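To make the first point concrete, a small worked comparison with hypothetical scores: an easy but weak-Potential test against a balanced one.

```python
def pie_avg(potential, importance, ease):
    return (potential + importance + ease) / 3   # PIE: averaging

def multiplied(potential, importance, ease):
    return potential * importance * ease         # ICE/RICE-style: multiplication

weak_but_easy = (3, 7, 10)   # low Potential hidden behind high Ease
balanced      = (6, 6, 6)

print(pie_avg(*weak_but_easy), pie_avg(*balanced))       # ~6.7 vs 6.0 -> PIE ranks the weak idea higher
print(multiplied(*weak_but_easy), multiplied(*balanced)) # 210 vs 216  -> multiplication ranks the balanced idea higher
```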

Goward's original framework page at Conversion still recommends PIE for CRO test sequencing on a fixed page set. That is the only place it belongs in 2026. If you are scoring a mixed funnel backlog, do not use PIE.

How do you score Impact, Confidence, and Ease in ICE?

Score each on 1-10, multiply the three, and rank by the product. The numbers are subjective, but the discipline lies in forcing yourself to commit to one number per dimension instead of debating each one endlessly; a minimal scoring sketch follows the list below.

  • Impact (1-10): If this experiment wins, how big is the lift on your North Star metric? 10 = step-change in activation or revenue. 3 = a small, measurable bump on a secondary metric.
  • Confidence (1-10): What evidence do you have it will work? 10 = you have run a similar test before and it won. 5 = a directionally relevant case study from another company. 1 = a hunch.
  • Ease (1-10): How fast can you ship and read out a result? 10 = a copy change shipped today. 1 = a six-month engineering project.
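A minimal sketch of that scoring-and-ranking step; the queue items and their per-dimension numbers are hypothetical.

```python
# Hypothetical weekly queue; one committed number per dimension, no re-litigating
queue = [
    {"name": "Reduce signup form to 3 fields", "impact": 8, "confidence": 8, "ease": 7},
    {"name": "Exit-intent popup on pricing",   "impact": 7, "confidence": 6, "ease": 6},
    {"name": "Double-sided referral program",  "impact": 9, "confidence": 4, "ease": 2},
]

for item in queue:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Rank by the product, highest first, and ship the top item
for rank, item in enumerate(sorted(queue, key=lambda i: i["ice"], reverse=True), start=1):
    print(rank, item["name"], item["ice"])
```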

A common ICE failure mode is score inflation: every idea ends up at 7-9-8 because the team feels uncomfortable assigning low confidence to a coworker's idea. Ward van Gasteren, who ran growth at companies including Hotjar, calls this out in his ICE critique, recommending teams calibrate scores against historical outcomes every quarter.

When should you use RICE instead of ICE?

Use RICE when Reach varies by more than 10x across items in your backlog, or when stakeholders need a defensible ranking. Use ICE when your team is small, your experiments are tactical, and the cost of a wrong call is one wasted week.

A practical decision rule:

| Situation | Framework | Why |
| --- | --- | --- |
| 2-4 person growth team, 3-6 experiments/month | ICE | Lower scoring overhead than RICE. Speed matters more than precision. |
| Product team prioritizing features with defined user segments | RICE | Reach as a real number prevents bias toward visible-but-niche bets. |
| 10+ person growth team with mixed engineering, content, paid bets | RICE | Effort in person-months forces apples-to-apples comparison. |
| Quarterly planning across multiple PMs, stakeholders need a paper trail | RICE | Reach + Confidence percentage is harder to fudge in a doc review. |
| You are debating two copy tests | Neither -- just ship both | Scoring overhead exceeds experiment cost. |
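One way to encode the table above as a rough default; the function, its thresholds, and its inputs are illustrative, not part of either framework.

```python
def pick_framework(team_size: int, reach_variance_ratio: float, needs_paper_trail: bool) -> str:
    """Rough default distilled from the decision table: RICE when Reach varies
    widely across the backlog, the team is large, or stakeholders need a paper
    trail; ICE otherwise."""
    if reach_variance_ratio > 10 or team_size >= 10 or needs_paper_trail:
        return "RICE"
    return "ICE"

print(pick_framework(team_size=3, reach_variance_ratio=4, needs_paper_trail=False))  # ICE
```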

Intercom's original RICE post is explicit on the tradeoff: McBride wrote that RICE was created because ICE-style scoring favored "pet ideas rather than ideas with broad reach" (Intercom, 2016).

How do ICE, RICE, and PIE rank the same 12-experiment backlog?

Same backlog, three frameworks, three different top-fives. Below is a real growth backlog scored under each framework. The rankings diverge in ways that change which bets actually get shipped.

The backlog (Q1, Series A SaaS, ~12k monthly visitors):

  1. Exit-intent popup on pricing page
  2. Homepage hero rewrite
  3. ProductHunt launch
  4. In-app product tour for new signups
  5. Cold email campaign (500 prospects)
  6. Double-sided referral program
  7. Testimonials carousel on landing page
  8. SEO content series (12 articles)
  9. Reduce signup form from 8 to 3 fields
  10. Live chat widget on pricing
  11. Paid LinkedIn ads test ($5k)
  12. Onboarding email sequence rewrite

Scored side by side:

| # | Experiment | ICE score | ICE rank | RICE score | RICE rank | PIE score | PIE rank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 9 | Signup form reduction | 448 | 1 | 4,800 | 1 | 7.7 | 1 |
| 1 | Exit-intent popup | 252 | 2 (tie) | 3,200 | 3 (tie) | 6.3 | 4 (tie) |
| 12 | Onboarding email rewrite | 252 | 2 (tie) | 1,400 | 9 | 6.3 | 4 (tie) |
| 2 | Homepage hero rewrite | 245 | 4 | 4,000 | 2 | 7.3 | 2 |
| 7 | Testimonials carousel | 216 | 5 | 3,200 | 3 (tie) | 6.7 | 3 |
| 5 | Cold email campaign | 210 | 6 | 300 | 12 | 5.3 | 10 (tie) |
| 4 | Product tour | 168 | 7 | 600 | 11 | 5.7 | 8 (tie) |
| 3 | ProductHunt launch | 160 | 8 | 2,000 | 6 (tie) | 5.3 | 10 (tie) |
| 8 | SEO content series | 144 | 9 | 2,250 | 5 | 6.3 | 4 (tie) |
| 10 | Live chat widget | 120 | 10 (tie) | 2,000 | 6 (tie) | 5.7 | 8 (tie) |
| 11 | LinkedIn ads test | 120 | 10 (tie) | 800 | 10 | 5.3 | 10 (tie) |
| 6 | Referral program | 72 | 12 | 2,000 | 6 (tie) | 6.3 | 4 (tie) |
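A sketch of how a side-by-side table like this can be generated; the per-dimension inputs below are assumptions chosen to reproduce the headline totals for two of the rows above, not the actual inputs behind the table.

```python
def ice(impact, confidence, ease):
    return impact * confidence * ease

def rice(reach, impact, confidence, effort):
    return (reach * impact * confidence) / effort

def pie(potential, importance, ease):
    return round((potential + importance + ease) / 3, 1)

# Hypothetical per-dimension inputs for two backlog items
experiments = {
    "Signup form reduction": {"ice": (8, 8, 7), "rice": (6000, 1.0, 0.8, 1.0), "pie": (8, 8, 7)},
    "Referral program":      {"ice": (9, 4, 2), "rice": (10000, 1.0, 0.8, 4.0), "pie": (8, 7, 4)},
}

scores = {name: {"ICE": ice(*v["ice"]), "RICE": rice(*v["rice"]), "PIE": pie(*v["pie"])}
          for name, v in experiments.items()}

for framework in ("ICE", "RICE", "PIE"):
    ranking = sorted(scores, key=lambda name: scores[name][framework], reverse=True)
    print(framework, ranking)
```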

The divergences that matter:

  • The referral program ranks dead last under ICE (Ease = 2 kills the multiplied score) but mid-pack under RICE because Reach of 10,000 prospective users compensates for high effort.
  • The onboarding email rewrite is a top-3 ICE pick (it is fast and confident) but RICE drops it to 9th because Reach is capped at new signups.
  • The SEO content series ranks 9th under ICE and 5th under RICE -- the framework you pick changes whether content gets shipped this quarter.
[Chart: Top-5 ranking divergence -- same backlog, three frameworks. Bars show each experiment's ICE rank, from the signup form reduction (1) down to the referral program (12). Source: Growth Engineer worked example, 2026; lower number = higher ICE rank.]

Which framework was right in retrospect?

None of them, fully. Six months later, the two biggest winners were the referral program and the SEO content series. ICE ranked them 12th and 9th. RICE left them mid-pack. PIE's averaging flattened them into a four-way tie for 4th, indistinguishable from the exit-intent popup and the onboarding rewrite. Both were compounding bets that low Ease and high Effort scores suppressed.

Outcomes after 6 months (ranked by realized impact):

| Experiment | Outcome at month 6 | Why frameworks missed it |
| --- | --- | --- |
| Referral program | +38% of new signups, compounding monthly | ICE killed it on Ease. RICE's Effort denominator hid the compounding curve. |
| SEO content series | 4,200 organic monthly visitors by month 6 | Confidence scored low because lift was 8-12 weeks out. |
| Product tour | +22% activation rate | RICE Reach was capped at new signups; the lift compounded inside the funnel. |
| Onboarding email rewrite | +12% week-1 retention | Correctly ranked by ICE. |
| Signup form reduction | +8% signup conversion | Correctly ranked by all three. |
| Homepage hero rewrite | No significant lift | Over-ranked by RICE and PIE. |
| Exit-intent popup | +2% recovered, cannibalized later signups | Over-ranked by ICE and RICE. Net-zero. |
| ProductHunt launch | One-day spike, flat after | Over-ranked by RICE Reach. |

The pattern: scoring frameworks systematically under-weight compounding bets because Confidence and Ease both punish long time-to-signal. The fix is not a better framework. It is reserving 20% of your experiment budget for big swings outside the scoring system, a discipline Ward van Gasteren documents in his ICE writeup.

What are the failure modes of each prioritization framework?

Each framework fails in a predictable, named way. Watch for these in your own scoring sessions.

ICE inflation. Without calibration, every score drifts to 7-9. Teams avoid scoring a colleague's idea low on Confidence, and Ease is reported optimistically by whoever pitched the idea. Result: the top of the list is whatever shipped fastest, not what mattered most. Counter: assign Confidence from a written checklist (prior test won = 8, case study = 5, hunch = 2), and have someone other than the person who pitched the idea score Ease.
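A minimal way to make that checklist mechanical; the category names and the lookup itself are just an illustration of the counter above.

```python
# Written evidence checklist: Confidence is looked up, not negotiated in the room
CONFIDENCE_CHECKLIST = {
    "prior_test_won": 8,       # we ran a similar test before and it won
    "external_case_study": 5,  # directionally relevant result from another company
    "hunch": 2,                # no evidence beyond intuition
}

def confidence_score(evidence: str) -> int:
    return CONFIDENCE_CHECKLIST[evidence]

print(confidence_score("external_case_study"))  # 5
```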

RICE Reach gaming. Reach is the most fudgeable input. "This affects every visitor" turns 12k monthly into 36k quarterly, then 144k annualized. Smart PMs learn to inflate Reach to win the ranking. Counter: define Reach over a single fixed window (one quarter), pull the number from analytics rather than estimation, and require a screenshot link in the scoring doc.

PIE subjectivity bias. PIE has no Confidence input and uses averaging, so a strong Ease score (10) carries a weak Potential (3) up the rankings. PIE also ranks by averaged opinion across a small team, so the loudest voice wins. Counter: do not use PIE for growth backlogs. Use it only inside a CRO program where the unit is a landing-page test on a fixed page set.

Cross-framework failure: compounding blindness. All three under-weight bets that take 3-6 months to read out. A referral program, an SEO program, a community program -- these get killed at scoring time. Reserve 20% of capacity for big swings outside the framework, and review them on a quarterly horizon, not weekly.

What is the best prioritization framework for growth experiments in 2026?

Use ICE if your team is small and your unit of work is one experiment per week. Use RICE if your team is larger and your backlog mixes feature work with growth experiments. Skip PIE entirely unless you are running a pure CRO program on a fixed set of landing pages.

A practical 2026 stack:

  1. ICE for the weekly experiment queue. 3 numbers, 5 minutes, ship the top item.
  2. RICE for the quarterly roadmap review. Forces Reach and Effort into the conversation when stakeholders disagree.
  3. A separate 20% bucket for compounding bets. Referrals, SEO, community, brand. Do not score these against tactical experiments. They will lose, then they will win, and you will have killed them.
  4. Quarterly calibration. Pull the last quarter's experiments, compare predicted score vs realized lift, and adjust the team's scoring habits where the gap is largest.
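A sketch of that calibration step, assuming you log a predicted score and a realized lift per experiment (the field names are hypothetical; the numbers echo the worked example above):

```python
# Last quarter's experiments: predicted ICE score vs realized lift on the target metric
history = [
    {"name": "Signup form reduction", "predicted": 448, "realized_lift": 0.08},
    {"name": "Exit-intent popup",     "predicted": 252, "realized_lift": 0.00},
    {"name": "Referral program",      "predicted": 72,  "realized_lift": 0.38},
]

by_predicted = sorted(history, key=lambda e: e["predicted"], reverse=True)
by_realized  = sorted(history, key=lambda e: e["realized_lift"], reverse=True)
predicted_rank = {e["name"]: r for r, e in enumerate(by_predicted, start=1)}
realized_rank  = {e["name"]: r for r, e in enumerate(by_realized, start=1)}

# The largest rank gaps show where the team's scoring habits need adjusting
for name in sorted(predicted_rank, key=lambda n: abs(predicted_rank[n] - realized_rank[n]), reverse=True):
    print(f"{name}: predicted rank {predicted_rank[name]}, realized rank {realized_rank[name]}")
```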

The choice between ICE and RICE is about overhead, not accuracy. According to Optimizely's experimentation benchmarks, the average experimentation program lands around a 20% win rate (10% on revenue-tied tests). You will be wrong four times out of five no matter which framework picks the order. Pick the framework that lets you ship faster and learn from being wrong.

[Chart: A/B test win rate benchmarks -- all experiments 20% win rate, revenue-tied experiments 10%, conclusive rate 38%. Source: Optimizely Experimentation Metrics, 2024.]
ICE vs RICE vs PIE at a glance:

| Attribute | ICE | RICE | PIE |
| --- | --- | --- | --- |
| Origin | Sean Ellis, Dropbox/LogMeIn (~2010) | Sean McBride, Intercom (2016) | Chris Goward, WiderFunnel (2012) |
| Formula | Impact x Confidence x Ease | (Reach x Impact x Confidence) / Effort | (Potential + Importance + Ease) / 3 |
| Inputs | 3 scores, 1-10 | 4 values, mixed units | 3 scores, 1-10 |
| Best for | Small teams, weekly experiments | Cross-product backlogs, quarterly roadmaps | CRO test sequencing on fixed pages |
| Time to score 12 items | ~15 minutes | ~45-60 minutes | ~20 minutes |
| Main failure mode | Score inflation toward 7-9 | Reach gaming and inflation | Subjectivity, no confidence input |
| Compounding bet handling | Poor (Ease kills referrals/SEO) | Mediocre (Effort denominator) | Poor (averaging hides weak Potential) |
| Recommended in 2026? | Yes, for tactical growth queues | Yes, for product/roadmap backlogs | Only for pure CRO programs |