A cohort analysis groups users by the month they signed up, then tracks what percentage of each group is still active in every month after. You build it in four steps: define the cohort anchor (signup month), define the retention event (login, payment, key action), bucket time, and pivot the data so each row is a cohort and each column is months since signup. Below is the same analysis done three ways -- raw SQL, in a product analytics tool, and in a spreadsheet -- plus how to read the retention curve once you have it.

What is a retention cohort?

A retention cohort is a group of users who share a starting event in the same time window -- almost always the month they signed up or first paid. You track that exact group's behavior over subsequent periods, never letting new signups dilute the denominator.

The word "cohort" matters. If you report "60% of users return monthly" without grouping by signup date, you are reporting an average that includes both year-old power users and yesterday's signups. That average hides the only thing you actually need to know: are newer cohorts retaining better than older ones?

Brian Balfour, former HubSpot VP of Growth and CEO of Reforge, frames it bluntly: "retention is the core of your growth model and influences every other input to your model." Cohort analysis is how you measure retention without lying to yourself.

The minimum useful cohort table has three dimensions:

  • Cohort identifier (e.g. signup month: 2026-01)
  • Period offset (months since signup: M0, M1, M2...)
  • Retention metric (% of M0 cohort active in that period)

How do you do a cohort analysis? (the universal 4-step recipe)

Every cohort analysis -- in SQL, in Mixpanel, in a spreadsheet -- follows the same four steps. The tools differ. The logic does not.

  1. Pick the cohort anchor. Most B2B SaaS teams anchor on first-payment date, not signup, to exclude free-trial tourists. Consumer products usually anchor on signup or first key action.
  2. Pick the retention event. "Active" must mean something specific: logged in, opened the app, completed a core action, paid an invoice. Pick the event closest to your north star metric.
  3. Bucket time. Monthly buckets for B2B SaaS. Weekly for consumer apps with daily-use intent. Daily only if you have millions of users -- otherwise the noise drowns the signal.
  4. Pivot. Rows = cohort. Columns = period offset (M0, M1, M2...). Cells = retained users / cohort size. M0 should always equal 100%.

That is the whole job. Steps 1-3 are decisions; step 4 is mechanics. The next three sections show step 4 done three different ways.
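Under toy assumptions (hypothetical user IDs and dates), the four steps reduce to a few lines of pure Python -- the same logic the spreadsheet and SQL versions implement at scale:

```python
from datetime import date
from collections import defaultdict

# Hypothetical toy data: user_id -> signup_date, and (user_id, event_date) rows.
signups = {1: date(2026, 1, 5), 2: date(2026, 1, 20), 3: date(2026, 2, 3)}
events = [(1, date(2026, 2, 10)), (1, date(2026, 3, 1)),
          (2, date(2026, 2, 15)), (3, date(2026, 3, 9))]

def month_offset(signup, event):
    # Whole months between the signup month and the event month.
    return (event.year - signup.year) * 12 + (event.month - signup.month)

# Steps 1+3: cohort anchor = signup month. Step 2: any event row counts as "active".
cohort_of = {uid: d.strftime("%Y-%m") for uid, d in signups.items()}
cohort_size = defaultdict(int)
for uid in signups:
    cohort_size[cohort_of[uid]] += 1

# Step 4: pivot -- rows = cohort, columns = month offset, cells = distinct users.
retained = defaultdict(set)                    # (cohort, offset) -> user ids
for uid in signups:
    retained[(cohort_of[uid], 0)].add(uid)     # everyone counts as active in M0
for uid, d in events:
    retained[(cohort_of[uid], month_offset(signups[uid], d))].add(uid)

table = {k: round(100 * len(v) / cohort_size[k[0]], 1) for k, v in retained.items()}
print(table[("2026-01", 0)])   # 100.0 -- M0 is always 100%
print(table[("2026-01", 2)])   # 50.0  -- one of the two January users is left
```

Note that the full cohort (all signups) sits in the denominator on every row -- that detail matters later, in the vanity-cohorting section.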

How do you do cohort analysis without a data team? (the spreadsheet method)

If you have no analyst and no SQL, you can run a complete cohort analysis in Google Sheets in about 45 minutes. You need two CSVs:

  • users.csv: user_id, signup_date
  • events.csv: user_id, event_date (one row per active session/login/payment)

In Sheets:

  1. Add a column to events.csv called cohort_month -- VLOOKUP the user's signup_date from users.csv, then =TEXT(signup_date, "YYYY-MM").
  2. Add a column called months_since_signup -- =DATEDIF(signup_date, event_date, "M").
  3. Insert a pivot table. Rows: cohort_month. Columns: months_since_signup. Values: COUNTUNIQUE(user_id).
  4. Divide each cell by the M0 value of its row. Format as percentages. Apply conditional formatting (red-yellow-green).

You now have a cohort retention table. The spreadsheet approach breaks above roughly 50,000 event rows -- past that point you need SQL or a product analytics tool. Below that, it is the cheapest, most portable way to do this.
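One subtlety before you trust the table: DATEDIF(signup_date, event_date, "M") counts complete months elapsed, not calendar-month boundaries, so a January 31 signup who returns on February 1 lands in M0, not M1. A rough Python model of the difference (approximate -- real DATEDIF has further month-end edge cases):

```python
from datetime import date

def datedif_m(start, end):
    # Complete months elapsed, approximating Sheets' DATEDIF(start, end, "M").
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1            # a partial month does not count
    return months

def calendar_offset(start, end):
    # Difference between the two dates' calendar months (the SQL template's bucketing).
    return (end.year - start.year) * 12 + (end.month - start.month)

signup, event = date(2026, 1, 31), date(2026, 2, 1)
print(datedif_m(signup, event))        # 0 -- less than one full month elapsed
print(calendar_offset(signup, event))  # 1 -- different calendar months
```

Neither convention is wrong; just pick one and use it everywhere, or your spreadsheet and warehouse numbers will disagree near month boundaries.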

Download the example CSV + sheet template (3,200 anonymized SaaS signups, 12 months of activity).

How do you do cohort analysis in SQL? (Postgres + BigQuery template)

If you have a warehouse with an events table, the entire analysis is one query with three CTEs. Here is the BigQuery template; the notes below give the Postgres equivalents for the two date functions:

-- BigQuery cohort retention (Postgres variants in the notes below)
WITH cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC(MIN(event_date), MONTH) AS cohort_month
  FROM events
  WHERE event_name = 'signup'
  GROUP BY user_id
),
cohort_sizes AS (
  -- every signup counts, including users who never came back
  SELECT cohort_month, COUNT(*) AS cohort_size
  FROM cohorts
  GROUP BY cohort_month
),
activity AS (
  SELECT
    c.cohort_month,
    DATE_DIFF(DATE_TRUNC(e.event_date, MONTH), c.cohort_month, MONTH) AS month_offset,
    COUNT(DISTINCT e.user_id) AS active_users
  FROM events e
  JOIN cohorts c USING (user_id)
  -- the signup event itself counts as M0 activity, so M0 = 100%
  WHERE e.event_name IN ('signup', 'active_session')
  GROUP BY 1, 2
)
SELECT
  a.cohort_month,
  a.month_offset,
  a.active_users,
  ROUND(100.0 * a.active_users / s.cohort_size, 1) AS retention_pct
FROM activity a
JOIN cohort_sizes s USING (cohort_month)
ORDER BY a.cohort_month, a.month_offset;

Notes:

  • In Postgres, replace DATE_TRUNC(x, MONTH) with DATE_TRUNC('month', x), and DATE_DIFF(a, b, MONTH) with EXTRACT(YEAR FROM AGE(a, b)) * 12 + EXTRACT(MONTH FROM AGE(a, b)).
  • In BigQuery, the query above runs as-is.
  • Replace 'active_session' with whatever defines "active" for your product.
  • The denominator is cohort_size, not M0 active users, so signups who never return still drag retention down -- see the vanity-cohorting trap below.

For a deeper walkthrough, PopSQL's BigQuery cohort retention tutorial annotates the same pattern line-by-line.

How do you do cohort analysis in Mixpanel, Amplitude, or PostHog?

If your events are already instrumented in a product analytics tool, the analysis takes 15-30 minutes and zero SQL. Each tool calls it something slightly different:

  • Mixpanel: Reports → Retention. Set "first event" to signup, "return event" to your active action. Choose monthly intervals. Mixpanel offers three retention models -- N-day, unbounded, and bracket -- which produce different curves. Use unbounded unless you have a specific reason not to.
  • Amplitude: Analyses → Retention. Same logic. Amplitude lets you split by user property (plan, country, channel) on the same chart -- useful for spotting which segment is driving the average.
  • PostHog: Product Analytics → Retention. Open-source and free up to 1M events/mo. PostHog's cohort builder lets you save a behavioral cohort and reuse it across feature flags and experiments.

All three calculate the same retention matrix shown in the spreadsheet method above. The cost is lock-in: if you change tools, you re-instrument. The benefit is real-time updates and clickable drill-down into individual users in any cell of the cohort table.

What does a healthy retention curve look like?

A healthy retention curve drops in the first 1-3 periods (the natural shakeout of bad-fit users), then flattens to a stable plateau or curves upward. The flattening is the signal you are looking for.

Andrew Chen, General Partner at a16z, listed "cohort retention curves that flatten" as his #1 magic metric for product-market fit. His logic: a flat line means a real subset of users have integrated your product into their workflow and stopped churning. Everything above that flat line is recoverable; everything below it is gone.

Four curve shapes you will actually see:

  • Cliff drop -- 100% → 20% by M2, then near-zero. Onboarding broken: users sign up, do not get value, leave. Fix activation before anything else.
  • Slow decay -- steady 5-10% drop every month, no plateau. No product-market fit yet; the product works for nobody specific. Narrow the ICP.
  • Flattening -- drops 30-50% in M1-M3, then plateaus. Product-market fit with a defined core segment. Pour fuel on that segment's acquisition.
  • Smiling -- drops, plateaus, then curves upward. Best case: existing users expand usage over time through network effects, habit formation, or strong NRR.

A smiling curve is rare and is what consumer-social and viral products look like at their best. Most B2B SaaS targets a flattening curve. See the four retention curve shapes explained in detail for the underlying math.
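If you want to label curves programmatically -- say, across dozens of cohorts at once -- a crude classifier over the last few periods captures the four shapes. The thresholds here are illustrative guesses, not standards; tune them to your bucket size:

```python
def classify_curve(retention):
    """Label a retention series (M0..Mn, as percentages) with one of the
    four shapes. Needs at least three periods; thresholds are illustrative."""
    tail = retention[-3:]                          # the last few periods
    late_slope = (tail[-1] - tail[0]) / (len(tail) - 1)
    if tail[-1] < 10:
        return "cliff drop"                        # near-zero by the end
    if late_slope > 1:
        return "smiling"                           # curving back upward
    if abs(late_slope) <= 1:
        return "flattening"                        # slope ~0, above zero
    return "slow decay"                            # still bleeding every period

print(classify_curve([100, 40, 15, 5, 2, 1]))      # cliff drop
print(classify_curve([100, 70, 60, 55, 54, 54]))   # flattening
print(classify_curve([100, 80, 70, 62, 55, 49]))   # slow decay
print(classify_curve([100, 65, 50, 48, 52, 58]))   # smiling
```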

[Chart: What Each Retention Curve Shape Tells You About Product-Market Fit. Illustrative M6 retention by shape: smiling curve (M3+ recovery) 85%, flattening curve (stable plateau) 65%, slow decay (gradual decline) 35%, cliff drop (M0-M2 churn) 15%. Source: Andrew Chen, Magic Metrics for Product/Market Fit (2019, restated 2026).]

What does a flattening retention curve mean?

A flattening retention curve means a real cohort of users has stopped churning. The visual signature of product-market fit is the slope going to zero while the curve is still above zero. Drop, drop, drop, then flat.

What the flat segment tells you:

  1. You have a core. Some users genuinely need your product. The plateau height tells you the size of that segment relative to top-of-funnel.
  2. The plateau height is your real retention. If your cohort flattens at 35% by M6, that 35% is your honest long-term retention. Anything above it in earlier months is acquisition tax, not stickiness.
  3. Your ICP lives inside the plateau. Pull the user IDs that survived to the flat part. Look at their plan, channel, company size, use case. That cluster is your ideal customer profile, in evidence.

In 2026, the median B2B SaaS company flattens around 85-90% annual retention; top-quartile holds above 92%. If your curve flattens below 70% annual, you have a leaky bucket; either tighten your ICP or fix the churn driver before raising acquisition spend.
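Benchmarks are quoted annually, but cohort tables are usually monthly. To compare the two, compound the monthly rate -- assuming, crudely, constant month-over-month retention (real curves are not constant):

```python
# Convert between monthly and annual retention under the simplifying
# assumption of a constant monthly retention rate.
def annual_from_monthly(m):
    return m ** 12

def monthly_from_annual(a):
    return a ** (1 / 12)

print(round(monthly_from_annual(0.85), 4))  # 0.9865 -- 85% annual ≈ 98.7% monthly
print(round(annual_from_monthly(0.95), 3))  # 0.54   -- 95% monthly is only ~54% annual
```

The second line is the one that surprises people: a monthly curve flattening at 95% sounds healthy but compounds to roughly half the cohort gone in a year.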

[Chart: B2B SaaS Annual Retention Benchmarks (2026). SMB 85%, Mid-Market 90%, Enterprise 93%, Top-Quartile 92%. Source: SaaS Capital + Benchmarkit 2026 Reports.]

What is the vanity cohorting trap?

Vanity cohorting is grouping users in a way that makes retention look better than it is. The most common version: putting only "activated" users in the denominator, silently dropping every signup who never returned. Your curve looks like product-market fit. It is selection bias.

Three vanity cohort patterns to watch for:

  • Activated-only cohorts. "Of users who completed onboarding, 80% are still active in M3." True. Also useless -- you have hidden every signup who bounced. The denominator must be all signups, not all activated users.
  • Survivor-biased cohorts. Reporting only on cohorts that are old enough to have an M12 number. New cohorts that look worse get "too early to tell." If they are getting worse, you need to know now.
  • Aggregated retention rate. A single "60% retention" headline number with no cohort breakdown. As the Mixpanel cohort guide puts it, looking at total users together hides important patterns -- older customers churning while new signups mask the decline.

The fix is mechanical: every cohort row must include every user who started in that period, even if they never came back. Their absence in M1+ columns is the entire signal.
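The activated-only bias is easy to demonstrate on toy numbers (hypothetical counts): same retained users, two denominators, very different stories.

```python
# Hypothetical cohort: 1,000 signups, 400 completed onboarding ("activated"),
# 320 of those activated users are still active in M3.
signups, activated, active_m3 = 1000, 400, 320

vanity_retention = 100 * active_m3 / activated   # activated-only denominator
honest_retention = 100 * active_m3 / signups     # all-signups denominator

print(f"{vanity_retention:.0f}%")   # 80% -- looks like product-market fit
print(f"{honest_retention:.0f}%")   # 32% -- the number that pays the bills
```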

Which approach should you use? (decision matrix)

Three approaches, three audiences. Pick the cheapest one that produces a curve you trust.

  • Spreadsheet from CSV -- if you have <50K events and no analyst. Setup time: 30-60 min.
  • Mixpanel / Amplitude / PostHog -- if you have events instrumented and a product team. Setup time: 15-30 min.
  • Raw SQL on a warehouse -- if you have a data warehouse and a custom event schema. Setup time: 2-4 hours first time, 5 min after.

The spreadsheet approach is right for founders who want one-time directional truth. Product analytics tools are right when retention is a daily operating metric and non-technical PMs need self-serve access. SQL is right when you need to join cohort behavior with billing, support tickets, or anything else outside the analytics tool.

Do not over-engineer this. A correct cohort table in Google Sheets beats an incorrect one in a $40K/year analytics tool. Get the curve. Read it. Decide what to fix. Then upgrade the pipeline.

  • Raw SQL (Postgres / BigQuery) -- Best for: engineering teams with a warehouse and event tables. Setup time: 2-4 hours first time. Cost: $0 (warehouse cost). Limitations: requires SQL fluency and clean event schema.
  • Mixpanel / Amplitude / PostHog -- Best for: product teams with events already instrumented. Setup time: 15-30 minutes. Cost: $0 free tier to $2K+/mo. Limitations: locked into the tool's cohort logic; export limits.
  • Spreadsheet from CSV export -- Best for: founders, marketers, anyone without a data team. Setup time: 30-60 minutes. Cost: $0. Limitations: manual refresh; breaks past ~50K rows.