Pick your north star metric by scoring 3 candidates against 5 criteria: leading (predicts revenue 60-90 days out), measurable (one SQL query, one number), actionable (your team can move its inputs), value-aligned (rises only when customers get value), and single-number (not a ratio). Score each candidate 1-5 on each criterion. The highest total wins, but only if it survives a stress test against your sales comp plan and product roadmap. This guide walks through the test with two parallel worked examples and shows how to instrument the winner.
What is a north star metric, and why does picking the wrong one cost you a year?
A north star metric (NSM) is the single number that captures the core value your product delivers and predicts long-term revenue. Sean Ellis, who coined the term, defines it as "the single metric that best captures the core value that your product delivers to customers."
Picking the wrong one is expensive because the NSM dictates roadmap, hiring, and comp. A team that picks MAU as its NSM will build features that drive logins. A team that picks weekly active queries will build features that drive depth of use. Different metric, different product, different company 12 months later.
The failure pattern is well-documented. Reforge's Brian Balfour warns that "blindly buying into the concept of the one metric that matters is a fatal oversimplification." And John Cutler, author of Amplitude's North Star Playbook, notes: "If you can move your North Star directly, it's probably not a good North Star."
The job is not to pick a metric that sounds inspiring on a slide. The job is to pick a metric whose movement reliably predicts whether customers are getting value and whether your business will grow. That's what the 5-criterion test is for.
What are the 5 criteria for picking a north star metric?
A good NSM passes five tests. Score each candidate 1-5 on each. Anything below 4 on any criterion is a red flag.
- Leading. The metric moves before revenue moves. If it merely tracks revenue, which lags, it cannot guide decisions; you'll only see the result after the quarter is lost. Target: predicts revenue 60-90 days out.
- Measurable. One SQL query, one number, one dashboard. If it requires three teams to reconcile definitions, it's not measurable, it's a debate.
- Actionable. Your product team can move its inputs. Cutler's framework explicitly says you should not be able to move the NSM directly -- but you must be able to move the breadth, depth, frequency, or efficiency drivers underneath it.
- Value-aligned. It only goes up when customers get more value. A metric that rises while NPS falls fails this test. This is what blocks vanity metrics: total signups can grow while engaged users shrink.
- Single-number, non-ratio. Sean Ellis is explicit: "It should not be a ratio." Ratios let teams optimize the denominator (kick out low-engagement users) instead of the numerator (deliver more value).
A candidate that scores 23-25 of 25 is a real NSM. A candidate that scores 18-22 is workable but you should keep looking. Below 18, kill it.
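If you keep the scoring sheet in your warehouse, the tally itself is one query. A minimal sketch in Postgres-flavored SQL, using the scores from Worked Example 1 below; the `scores` sheet is a hypothetical table, not part of any standard schema:

```sql
-- Hypothetical scoring sheet: one row per (candidate, criterion, score).
with scores (candidate, criterion, score) as (
    values
        ('A: MAU', 'leading', 2), ('A: MAU', 'measurable', 5),
        ('A: MAU', 'actionable', 3), ('A: MAU', 'value_aligned', 2),
        ('A: MAU', 'non_ratio', 5),
        ('B: Weekly Active Querying Users', 'leading', 5),
        ('B: Weekly Active Querying Users', 'measurable', 4),
        ('B: Weekly Active Querying Users', 'actionable', 5),
        ('B: Weekly Active Querying Users', 'value_aligned', 5),
        ('B: Weekly Active Querying Users', 'non_ratio', 5),
        ('C: MRR', 'leading', 1), ('C: MRR', 'measurable', 5),
        ('C: MRR', 'actionable', 2), ('C: MRR', 'value_aligned', 3),
        ('C: MRR', 'non_ratio', 5)
)
select
    candidate,
    sum(score) as total,
    -- Thresholds from the rubric above: 23+ real, 18-22 workable, below 18 kill.
    case
        when sum(score) >= 23 then 'real NSM'
        when sum(score) >= 18 then 'workable, keep looking'
        else 'kill it'
    end as verdict
from scores
group by candidate
order by total desc;
```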
How do you score candidate metrics against the 5 criteria? (Worked example 1: PLG analytics tool)
Imagine a self-serve product analytics tool, similar to early Mixpanel or PostHog. The product is free up to 1M events/month, paid above that. The team brainstorms three candidate NSMs:
- Candidate A: Monthly Active Users (MAU)
- Candidate B: Weekly Active Querying Users -- accounts where >=3 unique users ran >=5 queries in the last 7 days
- Candidate C: MRR
Score each against the 5 criteria:
| Criterion | A: MAU | B: Weekly Active Querying Users | C: MRR |
|---|---|---|---|
| Leading (predicts revenue 60-90d) | 2 | 5 | 1 |
| Measurable (one query, one number) | 5 | 4 | 5 |
| Actionable (team can move inputs) | 3 | 5 | 2 |
| Value-aligned (rises only with value) | 2 | 5 | 3 |
| Single-number, non-ratio | 5 | 5 | 5 |
| Total | 17 | 24 | 16 |
Winner: Weekly Active Querying Users (24/25).
Why MAU loses: a logged-in user who never queries is not getting value, so the metric can inflate without product-market fit deepening. Why MRR loses: it's a lagging indicator of decisions made 60-90 days ago, and the product team cannot move it directly without sales involvement. Why the winner wins: querying is the product's "aha moment" -- it correlates tightly with conversion to paid, and the team can move its inputs (onboarding query templates, integration breadth, alert features).
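You can back the leading score with data rather than judgment by checking how well this week's metric predicts revenue roughly a quarter later. A sketch, assuming two hypothetical weekly rollups -- `nsm_weekly(week, qualifying_accounts)` and `mrr_weekly(week, total_mrr)` -- in Postgres-flavored SQL:

```sql
-- Correlate the NSM now with total MRR ~12 weeks (84 days) later.
-- nsm_weekly and mrr_weekly are assumed rollups, one row per week each.
select corr(n.qualifying_accounts::float8, m.total_mrr::float8)
    as lead_correlation_12w
from nsm_weekly n
join mrr_weekly m
  on m.week = n.week + interval '84 days';
```

A strong positive correlation at the 60-90 day lag supports a 5 on the leading criterion; a flat one means the candidate is concurrent, not leading.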
How does the test work for a sales-led product? (Worked example 2: HR platform)
Now apply the same test to a sales-led HR platform, similar to Lattice or 15Five. Average contract is $40K ARR, 12-month deals, sales-assisted onboarding. Three candidate NSMs:
- Candidate A: Seats Sold
- Candidate B: Performance Reviews Completed per Active Manager per Quarter
- Candidate C: Logo Retention %
Score them:
| Criterion | A: Seats Sold | B: Reviews Completed / Manager / Quarter | C: Logo Retention % |
|---|---|---|---|
| Leading (predicts revenue 60-90d) | 2 | 5 | 3 |
| Measurable (one query, one number) | 5 | 4 | 4 |
| Actionable (team can move inputs) | 3 | 5 | 3 |
| Value-aligned (rises only with value) | 2 | 5 | 4 |
| Single-number, non-ratio | 5 | 5 | 1 |
| Total | 17 | 24 | 15 |
Winner: Reviews Completed per Active Manager per Quarter (24/25).
Logo Retention loses on the ratio rule -- it's a percentage, so the team can optimize by churning small accounts faster (shrinking the denominator). Seats Sold loses because it's a leading indicator of contracts, not value: enterprises buy 500 seats and use 80, then churn at renewal. The winning metric -- a count of completed reviews per active manager -- only rises when the product is actually used for its core job. It maps directly to renewal probability, which Reforge's research shows is the strongest predictor of net revenue retention in B2B SaaS.
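The Example 2 winner is just as instrumentable. A sketch assuming a hypothetical `review_events` table (one row per completed review, with `account_id`, `manager_id`, `completed_at`) and defining "active manager" as one who completed at least one review that quarter -- your real definition may differ:

```sql
-- One row per (account, quarter): reviews completed per active manager.
-- review_events and the "active manager" definition are assumptions.
select
    account_id,
    date_trunc('quarter', completed_at) as quarter,
    count(*) as reviews_completed,
    count(distinct manager_id) as active_managers,
    round(count(*)::numeric / nullif(count(distinct manager_id), 0), 2)
        as reviews_per_active_manager
from review_events
group by 1, 2;
```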
Should a startup pick MRR or an engagement metric as its north star?
Pick the engagement metric, almost always. MRR fails the leading-indicator test (it lags real customer behavior by 30-90 days) and the actionable test (most product changes can't move it directly).
The exception is a pure transactional SaaS where MRR moves the same week the product is used -- think a usage-billed API where every successful call generates revenue. In that narrow case, MRR and engagement are nearly the same metric.
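To see why the exception holds: in a usage-billed API, the revenue query and the engagement query are the same query with one extra multiplication. A sketch, assuming a hypothetical `api_calls` table and a flat per-call price (both illustrative):

```sql
-- Weekly successful calls (engagement) and the revenue they generate.
-- api_calls(called_at, status) and the $0.0004/call price are assumptions.
select
    date_trunc('week', called_at) as week,
    count(*) as successful_calls,            -- the engagement metric
    count(*) * 0.0004 as usage_revenue_usd   -- the revenue metric
from api_calls
where status = 'success'
group by 1;
```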
For everyone else, Sean Ellis is explicit: "the north star metric should capture units of value being delivered to users -- not revenue." Revenue is the result of value delivered. Tracking the result instead of the cause means you only see problems after they hit the bank account.
The practical rule: if you're pre-Series B, pick a depth-of-engagement metric. After Series B, when you have enough data to map engagement-to-revenue conversion rates with confidence, you can layer revenue targets on top of the engagement NSM. But the NSM itself stays anchored to customer value.
How do you instrument your north star metric in a warehouse and product analytics tool?
Once you've picked the winner, instrument it in three places so every team sees the same number.
Step 1: Define the metric in your warehouse (single source of truth). Write a dbt model that produces one row per (account, week) with the NSM value. This is the canonical definition. Example for Worked Example 1:
```sql
-- models/marts/north_star_weekly_active_querying.sql
-- One row per (account, week) that clears the NSM threshold:
-- >=3 unique querying users and >=5 queries in the week.
select
    account_id,
    date_trunc('week', event_ts) as week,
    count(distinct user_id) as querying_users,
    count(*) as total_queries
from events
where event_name = 'query_run'
group by 1, 2
having count(distinct user_id) >= 3 and count(*) >= 5
```
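The mart returns one row per qualifying (account, week); the headline number leadership tracks is simply the weekly count of those rows, which downstream dashboards can query via dbt's `ref()`:

```sql
-- Weekly headline NSM: count of accounts clearing the threshold.
select week, count(*) as weekly_active_querying_accounts
from {{ ref('north_star_weekly_active_querying') }}
group by week
order by week;
```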
Step 2: Pipe events into a product analytics tool. Send the same events to Mixpanel, Amplitude, or PostHog so PMs can slice the NSM by feature, cohort, or segment without writing SQL. The Mixpanel x Census integration keeps both layers in sync.
Step 3: Build the input tree. John Cutler's framework puts 3-5 inputs underneath the NSM, usually mapped to breadth, depth, frequency, efficiency. For Weekly Active Querying Users, inputs are: new querying accounts/week (breadth), queries per active user (depth), days/week with at least one query (frequency), and time-to-first-query for new signups (efficiency). Teams own inputs; leadership owns the NSM.
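Three of those four inputs fall out of the same `events` table used in Step 1; efficiency (time-to-first-query) additionally needs a signups table and is omitted here. A Postgres-flavored sketch, with depth computed per account and then averaged, which approximates queries per active user:

```sql
-- Breadth, depth, and frequency inputs under the NSM, per week.
with weekly as (
    select
        account_id,
        date_trunc('week', event_ts) as week,
        count(*) as queries,
        count(distinct user_id) as users,
        count(distinct event_ts::date) as active_days
    from events
    where event_name = 'query_run'
    group by 1, 2
),
first_week as (
    select account_id, min(week) as cohort_week
    from weekly
    group by 1
)
select
    w.week,
    count(*) filter (where f.cohort_week = w.week)
        as new_querying_accounts,                                  -- breadth
    avg(w.queries::numeric / w.users) as queries_per_active_user,  -- depth
    avg(w.active_days) as active_days_per_account                  -- frequency
from weekly w
join first_week f using (account_id)
group by 1;
```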
How do you know if your north star metric is wrong?
Your NSM is wrong if any of these four signals show up. Even if it scored 25/25 on the criteria, an NSM that breaks the org is broken.
- It rises while NPS, retention, or expansion fall. This is the Jay Stansell pattern: "the North Star was shining while the product was quietly dying underneath it." Total signups grew, engaged users shrank, the company shipped fast and lost slowly.
- It conflicts with sales comp. This is the classic misalignment failure. A PLG company picks Weekly Active Querying Users as its NSM; the metric scores 24/25. But sales comp is 100% on logo ACV, so AEs chase enterprise logos that buy 200 seats and query 4 times a quarter. Product optimizes for query velocity, sales optimizes for big checks, and roadmap and quotas diverge. Within 6 months, exec meetings turn into hostage negotiations. Fix: align comp to the NSM, or accept that you have a leading and a lagging metric and weight them explicitly.
- You can move it directly. Cutler's red flag. If a campaign or pricing change moves the NSM by 20% in a week, you picked an output, not a true north star. True NSMs move via inputs, not levers.
- No team owns the inputs. If product, marketing, and CS each blame the others when the NSM dips, the input tree is incomplete. Map every input to one team.
Can you change your north star metric?
Yes, and you probably will. Most companies change NSMs every 18-36 months as the business model matures.
Amplitude itself changed its NSM as it grew, and the company that wrote the playbook explicitly recommends revisiting the metric annually. Stage transitions force the change: a pre-PMF startup tracks engagement depth, a Series B company adds activation breadth, a public company tracks expansion.
The rule for changing: do it deliberately, communicate it widely, and change the inputs and dashboards in the same week. Half-migrations, where some teams use the new NSM and others use the old one, cause more damage than the wrong metric. Pick a date, ship the new dbt model, retire the old dashboard, and re-run the 5-criterion test.
What NOT to do: change the NSM every quarter to chase whatever is up-and-to-the-right that week. That's not a north star, that's a weather vane.
For reference, here is the full 5-criterion test in one table:
| Criterion | What it tests | Pass threshold | Common failure |
|---|---|---|---|
| Leading | Does it predict revenue 60-90 days out? | Score 4-5 | Tracking MRR (lagging) instead of usage (leading) |
| Measurable | One SQL query, one number? | Score 4-5 | Definition requires three teams to reconcile |
| Actionable | Can your team move its inputs? | Score 4-5 | Picking a metric only Sales or Finance can move |
| Value-aligned | Does it only rise when customers get value? | Score 4-5 | MAU rising while engaged users shrink |
| Single-number | Is it a count, not a ratio? | Score 5 | Logo retention % -- optimize by churning small accounts |