A cohort analysis groups users by the month they signed up, then tracks what percentage of each group is still active in every month after. You build it in four steps: define the cohort anchor (signup month), define the retention event (login, payment, key action), bucket time, and pivot the data so each row is a cohort and each column is months since signup. Below is the same analysis done three ways -- raw SQL, in a product analytics tool, and in a spreadsheet -- plus how to read the retention curve once you have it.

What is a retention cohort?

A retention cohort is a group of users who share a starting event in the same time window -- almost always the month they signed up or first paid. You track that exact group's behavior over subsequent periods, never letting new signups dilute the denominator.

The word "cohort" matters. If you report "60% of users return monthly" without grouping by signup date, you are reporting an average that includes both year-old power users and yesterday's signups. That average hides the only thing you actually need to know: are newer cohorts retaining better than older ones?

Brian Balfour, former HubSpot VP of Growth and CEO of Reforge, frames it bluntly: "retention is the core of your growth model and influences every other input to your model." Cohort analysis is how you measure retention without lying to yourself.

The minimum useful cohort table has three dimensions:

  • Cohort identifier (e.g. signup month: 2026-01)
  • Period offset (months since signup: M0, M1, M2...)
  • Retention metric (% of M0 cohort active in that period)

How do you do a cohort analysis? (the universal 4-step recipe)

Every cohort analysis -- in SQL, in Mixpanel, in a spreadsheet -- follows the same four steps. The tools differ. The logic does not.

  1. Pick the cohort anchor. Most B2B SaaS teams anchor on first-payment date, not signup, to exclude free-trial tourists. Consumer products usually anchor on signup or first key action.
  2. Pick the retention event. "Active" must mean something specific: logged in, opened the app, completed a core action, paid an invoice. Pick the event closest to your north star metric.
  3. Bucket time. Monthly buckets for B2B SaaS. Weekly for consumer apps with daily-use intent. Daily only if you have millions of users -- otherwise the noise drowns the signal.
  4. Pivot. Rows = cohort. Columns = period offset (M0, M1, M2...). Cells = retained users / cohort size. M0 should always equal 100%.

That is the whole job. Steps 1-3 are decisions; step 4 is mechanics. The next three sections show step 4 done three different ways.
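Under toy assumptions (hypothetical user IDs and dates), the four steps reduce to a few lines of pure Python -- the same logic the spreadsheet and SQL versions implement at scale:

```python
from datetime import date
from collections import defaultdict

# Hypothetical toy data: user_id -> signup_date, and (user_id, event_date) rows.
signups = {1: date(2026, 1, 5), 2: date(2026, 1, 20), 3: date(2026, 2, 3)}
events = [(1, date(2026, 2, 10)), (1, date(2026, 3, 1)),
          (2, date(2026, 2, 15)), (3, date(2026, 3, 9))]

def month_offset(signup, event):
    # Whole months between the signup month and the event month.
    return (event.year - signup.year) * 12 + (event.month - signup.month)

# Steps 1+3: cohort anchor = signup month. Step 2: any event row counts as "active".
cohort_of = {uid: d.strftime("%Y-%m") for uid, d in signups.items()}
cohort_size = defaultdict(int)
for uid in signups:
    cohort_size[cohort_of[uid]] += 1

# Step 4: pivot -- rows = cohort, columns = month offset, cells = distinct users.
retained = defaultdict(set)                    # (cohort, offset) -> user ids
for uid in signups:
    retained[(cohort_of[uid], 0)].add(uid)     # everyone counts as active in M0
for uid, d in events:
    retained[(cohort_of[uid], month_offset(signups[uid], d))].add(uid)

table = {k: round(100 * len(v) / cohort_size[k[0]], 1) for k, v in retained.items()}
print(table[("2026-01", 0)])   # 100.0 -- M0 is always 100%
print(table[("2026-01", 2)])   # 50.0  -- one of the two January users is left
```

Note that the full cohort (all signups) sits in the denominator on every row -- that detail matters later, in the vanity-cohorting section.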

How do you do cohort analysis without a data team? (the spreadsheet method)

If you have no analyst and no SQL, you can run a complete cohort analysis in Google Sheets in about 45 minutes. You need two CSVs:

  • users.csv: user_id, signup_date
  • events.csv: user_id, event_date (one row per active session/login/payment)

In Sheets:

  1. Add a column to events.csv called cohort_month -- VLOOKUP the user's signup_date from users.csv, then =TEXT(signup_date, "YYYY-MM").
  2. Add a column called months_since_signup -- =DATEDIF(signup_date, event_date, "M").
  3. Insert a pivot table. Rows: cohort_month. Columns: months_since_signup. Values: COUNTUNIQUE(user_id).
  4. Divide each cell by the M0 value of its row. Format as percentages. Apply conditional formatting (red-yellow-green).

You now have a cohort retention table. The spreadsheet approach breaks above roughly 50,000 event rows -- past that point you need SQL or a product analytics tool. Below that, it is the cheapest, most portable way to do this.
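One subtlety before you trust the table: DATEDIF(signup_date, event_date, "M") counts complete months elapsed, not calendar-month boundaries, so a January 31 signup who returns on February 1 lands in M0, not M1. A rough Python model of the difference (approximate -- real DATEDIF has further month-end edge cases):

```python
from datetime import date

def datedif_m(start, end):
    # Complete months elapsed, approximating Sheets' DATEDIF(start, end, "M").
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1            # a partial month does not count
    return months

def calendar_offset(start, end):
    # Difference between the two dates' calendar months (the SQL template's bucketing).
    return (end.year - start.year) * 12 + (end.month - start.month)

signup, event = date(2026, 1, 31), date(2026, 2, 1)
print(datedif_m(signup, event))        # 0 -- less than one full month elapsed
print(calendar_offset(signup, event))  # 1 -- different calendar months
```

Neither convention is wrong; just pick one and use it everywhere, or your spreadsheet and warehouse numbers will disagree near month boundaries.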

Download the example CSV + sheet template (3,200 anonymized SaaS signups, 12 months of activity).

How do you do cohort analysis in SQL? (Postgres + BigQuery template)

If you have a warehouse with an events table, the entire analysis is one query with three CTEs. Here is the BigQuery template; the notes below give the Postgres equivalents for the two date functions:

-- BigQuery cohort retention (Postgres variants in the notes below)
WITH cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC(MIN(event_date), MONTH) AS cohort_month
  FROM events
  WHERE event_name = 'signup'
  GROUP BY user_id
),
cohort_sizes AS (
  -- every signup counts, including users who never came back
  SELECT cohort_month, COUNT(*) AS cohort_size
  FROM cohorts
  GROUP BY cohort_month
),
activity AS (
  SELECT
    c.cohort_month,
    DATE_DIFF(DATE_TRUNC(e.event_date, MONTH), c.cohort_month, MONTH) AS month_offset,
    COUNT(DISTINCT e.user_id) AS active_users
  FROM events e
  JOIN cohorts c USING (user_id)
  -- the signup event itself counts as M0 activity, so M0 = 100%
  WHERE e.event_name IN ('signup', 'active_session')
  GROUP BY 1, 2
)
SELECT
  a.cohort_month,
  a.month_offset,
  a.active_users,
  ROUND(100.0 * a.active_users / s.cohort_size, 1) AS retention_pct
FROM activity a
JOIN cohort_sizes s USING (cohort_month)
ORDER BY a.cohort_month, a.month_offset;

Notes:

  • In Postgres, replace DATE_TRUNC(x, MONTH) with DATE_TRUNC('month', x), and DATE_DIFF(a, b, MONTH) with EXTRACT(YEAR FROM AGE(a, b)) * 12 + EXTRACT(MONTH FROM AGE(a, b)).
  • In BigQuery, the query above runs as-is.
  • Replace 'active_session' with whatever defines "active" for your product.
  • The denominator is cohort_size, not M0 active users, so signups who never return still drag retention down -- see the vanity-cohorting trap below.

For a deeper walkthrough, PopSQL's BigQuery cohort retention tutorial annotates the same pattern line-by-line.

How do you do cohort analysis in Mixpanel, Amplitude, or PostHog?

If your events are already instrumented in a product analytics tool, the analysis takes 15-30 minutes and zero SQL. Each tool calls it something slightly different:

  • Mixpanel: Reports → Retention. Set "first event" to signup, "return event" to your active action. Choose monthly intervals. Mixpanel offers three retention models -- N-day, unbounded, and bracket -- which produce different curves. Use unbounded unless you have a specific reason not to.
  • Amplitude: Analyses → Retention. Same logic. Amplitude lets you split by user property (plan, country, channel) on the same chart -- useful for spotting which segment is driving the average.
  • PostHog: Product Analytics → Retention. Open-source and free up to 1M events/mo. PostHog's cohort builder lets you save a behavioral cohort and reuse it across feature flags and experiments.

All three calculate the same retention matrix shown in the spreadsheet method above. The cost is lock-in: if you change tools, you re-instrument. The benefit is real-time updates and clickable drill-down into individual users in any cell of the cohort table.

What does a healthy retention curve look like?

A healthy retention curve drops in the first 1-3 periods (the natural shakeout of bad-fit users), then flattens to a stable plateau or curves upward. The flattening is the signal you are looking for.

Andrew Chen, General Partner at a16z, listed "cohort retention curves that flatten" as his #1 magic metric for product-market fit. His logic: a flat line means a real subset of users have integrated your product into their workflow and stopped churning. Everything above that flat line is recoverable; everything below it is gone.

Four curve shapes you will actually see:

  • Cliff drop -- 100% → 20% by M2, then near-zero. Onboarding broken: users sign up, do not get value, leave. Fix activation before anything else.
  • Slow decay -- steady 5-10% drop every month, no plateau. No product-market fit yet; the product works for nobody specific. Narrow the ICP.
  • Flattening -- drops 30-50% in M1-M3, then plateaus. Product-market fit with a defined core segment. Pour fuel on that segment's acquisition.
  • Smiling -- drops, plateaus, then curves upward. Best case: existing users expand usage over time through network effects, habit formation, or strong NRR.

A smiling curve is rare and is what consumer-social and viral products look like at their best. Most B2B SaaS targets a flattening curve. See the four retention curve shapes explained in detail for the underlying math.
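If you want to label curves programmatically -- say, across dozens of cohorts at once -- a crude classifier over the last few periods captures the four shapes. The thresholds here are illustrative guesses, not standards; tune them to your bucket size:

```python
def classify_curve(retention):
    """Label a retention series (M0..Mn, as percentages) with one of the
    four shapes. Needs at least three periods; thresholds are illustrative."""
    tail = retention[-3:]                          # the last few periods
    late_slope = (tail[-1] - tail[0]) / (len(tail) - 1)
    if tail[-1] < 10:
        return "cliff drop"                        # near-zero by the end
    if late_slope > 1:
        return "smiling"                           # curving back upward
    if abs(late_slope) <= 1:
        return "flattening"                        # slope ~0, above zero
    return "slow decay"                            # still bleeding every period

print(classify_curve([100, 40, 15, 5, 2, 1]))      # cliff drop
print(classify_curve([100, 70, 60, 55, 54, 54]))   # flattening
print(classify_curve([100, 80, 70, 62, 55, 49]))   # slow decay
print(classify_curve([100, 65, 50, 48, 52, 58]))   # smiling
```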

[Chart: What Each Retention Curve Shape Tells You About Product-Market Fit. Illustrative M6 retention by shape: smiling curve (M3+ recovery) 85%, flattening curve (stable plateau) 65%, slow decay (gradual decline) 35%, cliff drop (M0-M2 churn) 15%. Source: Andrew Chen, Magic Metrics for Product/Market Fit (2019, restated 2026).]

What does a flattening retention curve mean?

A flattening retention curve means a real cohort of users has stopped churning. The visual signature of product-market fit is the slope going to zero while the curve is still above zero. Drop, drop, drop, then flat.

What the flat segment tells you:

  1. You have a core. Some users genuinely need your product. The plateau height tells you the size of that segment relative to top-of-funnel.
  2. The plateau height is your real retention. If your cohort flattens at 35% by M6, that 35% is your honest long-term retention. Anything above it in earlier months is acquisition tax, not stickiness.
  3. Your ICP lives inside the plateau. Pull the user IDs that survived to the flat part. Look at their plan, channel, company size, use case. That cluster is your ideal customer profile, in evidence.

In 2026, the median B2B SaaS company flattens around 85-90% annual retention; top-quartile holds above 92%. If your curve flattens below 70% annual, you have a leaky bucket; either tighten your ICP or fix the churn driver before raising acquisition spend.
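Benchmarks are quoted annually, but cohort tables are usually monthly. To compare the two, compound the monthly rate -- assuming, crudely, constant month-over-month retention (real curves are not constant):

```python
# Convert between monthly and annual retention under the simplifying
# assumption of a constant monthly retention rate.
def annual_from_monthly(m):
    return m ** 12

def monthly_from_annual(a):
    return a ** (1 / 12)

print(round(monthly_from_annual(0.85), 4))  # 0.9865 -- 85% annual ≈ 98.7% monthly
print(round(annual_from_monthly(0.95), 3))  # 0.54   -- 95% monthly is only ~54% annual
```

The second line is the one that surprises people: a monthly curve flattening at 95% sounds healthy but compounds to roughly half the cohort gone in a year.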

[Chart: B2B SaaS Annual Retention Benchmarks (2026). SMB 85%, Mid-Market 90%, Enterprise 93%, Top-Quartile 92%. Source: SaaS Capital + Benchmarkit 2026 Reports.]

What is the vanity cohorting trap?

Vanity cohorting is grouping users in a way that makes retention look better than it is. The most common version: putting only "activated" users in the denominator, silently dropping every signup who never returned. Your curve looks like product-market fit. It is selection bias.

Three vanity cohort patterns to watch for:

  • Activated-only cohorts. "Of users who completed onboarding, 80% are still active in M3." True. Also useless -- you have hidden every signup who bounced. The denominator must be all signups, not all activated users.
  • Survivor-biased cohorts. Reporting only on cohorts that are old enough to have an M12 number. New cohorts that look worse get "too early to tell." If they are getting worse, you need to know now.
  • Aggregated retention rate. A single "60% retention" headline number with no cohort breakdown. As the Mixpanel cohort guide puts it, looking at total users together hides important patterns -- older customers churning while new signups mask the decline.

The fix is mechanical: every cohort row must include every user who started in that period, even if they never came back. Their absence in M1+ columns is the entire signal.
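The activated-only bias is easy to demonstrate on toy numbers (hypothetical counts): same retained users, two denominators, very different stories.

```python
# Hypothetical cohort: 1,000 signups, 400 completed onboarding ("activated"),
# 320 of those activated users are still active in M3.
signups, activated, active_m3 = 1000, 400, 320

vanity_retention = 100 * active_m3 / activated   # activated-only denominator
honest_retention = 100 * active_m3 / signups     # all-signups denominator

print(f"{vanity_retention:.0f}%")   # 80% -- looks like product-market fit
print(f"{honest_retention:.0f}%")   # 32% -- the number that pays the bills
```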

Which approach should you use? (decision matrix)

Three approaches, three audiences. Pick the cheapest one that produces a curve you trust.

  • Spreadsheet from CSV -- if you have <50K events and no analyst. Setup time: 30-60 min.
  • Mixpanel / Amplitude / PostHog -- if you have events instrumented and a product team. Setup time: 15-30 min.
  • Raw SQL on a warehouse -- if you have a data warehouse and a custom event schema. Setup time: 2-4 hours first time, 5 min after.

The spreadsheet approach is right for founders who want one-time directional truth. Product analytics tools are right when retention is a daily operating metric and non-technical PMs need self-serve access. SQL is right when you need to join cohort behavior with billing, support tickets, or anything else outside the analytics tool.

Do not over-engineer this. A correct cohort table in Google Sheets beats an incorrect one in a $40K/year analytics tool. Get the curve. Read it. Decide what to fix. Then upgrade the pipeline.

  • Raw SQL (Postgres / BigQuery) -- Best for: engineering teams with a warehouse and event tables. Setup time: 2-4 hours first time. Cost: $0 (warehouse cost). Limitations: requires SQL fluency and clean event schema.
  • Mixpanel / Amplitude / PostHog -- Best for: product teams with events already instrumented. Setup time: 15-30 minutes. Cost: $0 free tier to $2K+/mo. Limitations: locked into the tool's cohort logic; export limits.
  • Spreadsheet from CSV export -- Best for: founders, marketers, anyone without a data team. Setup time: 30-60 minutes. Cost: $0. Limitations: manual refresh; breaks past ~50K rows.