A programmatic SEO database needs one head term, 2-4 modifier dimensions, and 8-15 unique-data fields per row. The head term is the page pattern (e.g., "[App A] + [App B] integration"). Modifiers are the variables that produce one URL per combination. Unique-data fields are everything else: prices, screenshots, code snippets, FAQs, ratings. Without unique-data fields, you ship near-duplicates and Google deindexes you. Below are three real schemas you can copy: integration directory, alternative-to, and role/persona pages, each with Airtable columns and Postgres SQL.
## What should a programmatic SEO database include?
A pSEO database needs four layers: an identity layer (head term, modifier columns, slug), a content layer (unique data fields per row), a rendering layer (template flags and conditional blocks), and a freshness layer (timestamps, source URLs, refresh status).
Most pSEO posts say "gather data" and stop there. That advice ships you 5,000 pages with the same three sentences swapped. The reason Zapier's 50,000+ integration pages drive ~2M monthly visits while most pSEO sites get deindexed is the density of unique data per row, not the template.
The four layers in field terms:
| Layer | Example fields | Purpose |
|---|---|---|
| Identity | `head_term`, `modifier_1`, `modifier_2`, `slug`, `canonical_url` | URL generation, dedup |
| Content | `description_long`, `pricing_data`, `code_snippet`, `screenshot_url`, `faq_pairs` | Prevents duplicate content |
| Rendering | `template_id`, `show_pricing_block`, `has_video`, `cta_variant` | Per-row conditional logic |
| Freshness | `created_at`, `updated_at`, `data_source_url`, `last_verified_at` | AI engine recency signals |
If a row only has the identity layer plus a one-line description, the page will not rank, and increasingly will not get indexed. Ahrefs' analysis shows that 95% of keywords have fewer than 10 monthly searches individually, so each page must be unambiguously the best answer for its specific query, or it gets cut from the index entirely.
## What's the difference between a head term, modifier, and derived field?
A head term is the broad category that describes the page pattern. A modifier is a variable that combines with the head term to produce a unique URL. A derived field is computed from head term + modifiers (or from external data) and is never typed by hand.
This distinction matters because it determines what you store versus what you generate. According to UntalkedSEO's modifier framework, modifiers come in two flavors: primary modifiers that create a new category ("onboarding software for SaaS"), and secondary modifiers that further qualify it ("onboarding software for SaaS with Slack integration").
Worked example:
- Head term: `integration` (page pattern: "[App A] + [App B] integration")
- Primary modifier 1: `app_a_slug` (e.g., "slack")
- Primary modifier 2: `app_b_slug` (e.g., "notion")
- Secondary modifier: `use_case` (e.g., "task-sync")
- Derived field: `h1 = "Connect " + app_a_name + " to " + app_b_name`
- Derived field: `slug = app_a_slug + "-" + app_b_slug + "-integration"`
- Derived field: `meta_description` = templated from `app_a_name`, `app_b_name`, `use_case`
Store the modifiers. Derive everything else at build time. If you store derived fields, you double the maintenance cost when the template changes.
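The store-vs-derive split can be sketched as a build-time step. A minimal sketch: the field names follow the worked example above, and the meta-description wording is an assumption.

```python
def derive_fields(row: dict) -> dict:
    """Compute template-driven fields from stored modifiers at build time.

    Only the modifiers and app metadata are stored; h1, slug, and meta
    are regenerated on every build, so a template change never requires
    a database migration.
    """
    return {
        **row,
        "h1": f"Connect {row['app_a_name']} to {row['app_b_name']}",
        "slug": f"{row['app_a_slug']}-{row['app_b_slug']}-integration",
        "meta_description": (  # copy template is an assumption
            f"Integrate {row['app_a_name']} with {row['app_b_name']} "
            f"for {row['use_case']}."
        ),
    }

row = {
    "app_a_slug": "slack", "app_a_name": "Slack",
    "app_b_slug": "notion", "app_b_name": "Notion",
    "use_case": "task-sync",
}
page = derive_fields(row)
print(page["slug"])  # slack-notion-integration
print(page["h1"])    # Connect Slack to Notion
```

If the h1 template changes, only this function changes; the database rows stay untouched.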
## How do I structure data for an integration directory?
An integration directory needs two modifier dimensions (App A x App B), an apps lookup table, and an integrations join table with 8-15 unique-data fields. This is the schema Zapier, Make, and n8n use.
The key insight: you do not store one row per page. You store one row per app, then a join table with one row per app pair. This keeps app metadata DRY and lets you generate App A, App B, and App A + App B pages from the same source.
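A quick sketch of what the join-table layout buys you: from one apps list and one pair list you can emit both single-app pages and pair pages without repeating metadata. The URL patterns and field names here are illustrative assumptions, not Zapier's actual paths.

```python
# One row per app, one row per pair -- the two source lists.
apps = [
    {"app_slug": "notion", "app_name": "Notion"},
    {"app_slug": "slack", "app_name": "Slack"},
]
integrations = [{"app_a": "notion", "app_b": "slack"}]

# "App A" pages: generated straight from the apps table.
app_pages = [f"/integrations/{a['app_slug']}" for a in apps]

# "App A + App B" pages: generated from the join, with app metadata
# looked up rather than duplicated into the pair row.
by_slug = {a["app_slug"]: a for a in apps}
pair_pages = [f"/integrations/{i['app_a']}/{i['app_b']}" for i in integrations]
pair_h1s = [
    f"Connect {by_slug[i['app_a']]['app_name']} to {by_slug[i['app_b']]['app_name']}"
    for i in integrations
]

print(app_pages)   # ['/integrations/notion', '/integrations/slack']
print(pair_pages)  # ['/integrations/notion/slack']
print(pair_h1s)    # ['Connect Notion to Slack']
```

Renaming an app touches one row in `apps` and every page that references it picks up the change on the next build.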
Airtable export -- apps table:
| Column | Type | Required | Feeds |
|---|---|---|---|
| `app_slug` | Single-line text (PK) | Yes | URL |
| `app_name` | Single-line text | Yes | H1, meta |
| `category` | Single-select | Yes | Faceted nav |
| `logo_url` | Attachment | Yes | Hero image |
| `description_short` | Long text (160 char) | Yes | Meta description |
| `description_long` | Long text (300+ words) | Yes | Body content |
| `pricing_tiers` | Long text (JSON) | No | Pricing block |
| `oauth_supported` | Checkbox | Yes | Setup steps |
| `api_doc_url` | URL | No | Reference link |
| `popular_use_cases` | Multi-select | Yes | FAQ generation |
Airtable export -- integrations table (the join):
| Column | Type | Required | Feeds |
|---|---|---|---|
| `integration_id` | Autonumber (PK) | Yes | Internal |
| `app_a` | Link to apps | Yes | URL slug 1 |
| `app_b` | Link to apps | Yes | URL slug 2 |
| `triggers` | Long text (JSON) | Yes | Unique block |
| `actions` | Long text (JSON) | Yes | Unique block |
| `template_count` | Number | No | Social proof |
| `setup_minutes` | Number | Yes | Hero badge |
| `top_use_case` | Single-select | Yes | H2 narrative |
| `code_snippet` | Long text | No | Dev block |
| `screenshot_url` | URL | No | Visual unique data |
| `monthly_volume` | Number | No | "Popular" sort |
| `verified_at` | Date | Yes | Freshness signal |
Postgres CREATE TABLE:
```sql
CREATE TABLE apps (
  app_slug TEXT PRIMARY KEY,
  app_name TEXT NOT NULL,
  category TEXT NOT NULL,
  logo_url TEXT NOT NULL,
  description_short VARCHAR(160) NOT NULL,
  description_long TEXT NOT NULL,
  pricing_tiers JSONB,
  oauth_supported BOOLEAN NOT NULL DEFAULT false,
  api_doc_url TEXT,
  popular_use_cases TEXT[] NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE integrations (
  integration_id BIGSERIAL PRIMARY KEY,
  app_a TEXT NOT NULL REFERENCES apps(app_slug),
  app_b TEXT NOT NULL REFERENCES apps(app_slug),
  triggers JSONB NOT NULL,
  actions JSONB NOT NULL,
  template_count INTEGER DEFAULT 0,
  setup_minutes INTEGER NOT NULL,
  top_use_case TEXT NOT NULL,
  code_snippet TEXT,
  screenshot_url TEXT,
  monthly_volume INTEGER DEFAULT 0,
  verified_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT app_pair_unique UNIQUE (app_a, app_b),
  CONSTRAINT no_self_join CHECK (app_a <> app_b)
);

CREATE INDEX idx_integrations_app_a ON integrations(app_a);
CREATE INDEX idx_integrations_app_b ON integrations(app_b);
```
The app_pair_unique constraint is the single most important line, with one caveat: UNIQUE (app_a, app_b) treats slack-notion and notion-slack as distinct rows, so it only works alongside a canonical ordering convention (for example, always store the alphabetically smaller slug in app_a). Together they stop you from generating two pages for the same integration, which is how most pSEO directories pick up duplicate content problems.
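Most teams normalize the pair order before insert so the reversed pair can never be written. A minimal sketch, assuming an alphabetical-order convention:

```python
def canonical_pair(app_a: str, app_b: str) -> tuple:
    """Normalize an app pair to alphabetical order so slack-notion and
    notion-slack resolve to the same integrations row."""
    if app_a == app_b:
        raise ValueError("an app cannot integrate with itself")
    return (app_a, app_b) if app_a < app_b else (app_b, app_a)

print(canonical_pair("slack", "notion"))  # ('notion', 'slack')
print(canonical_pair("notion", "slack"))  # ('notion', 'slack')
```

With this convention in place, a database-side `CHECK (app_a < app_b)` makes the ordering impossible to violate even if some writer skips the normalization step.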
## How do I structure data for alternative-to pages?
Alternative-to pages need one modifier dimension (the competitor), a comparison join, and roughly 18 columns of unique data per row. The head term is "[Competitor] alternatives" or "[Your product] vs [Competitor]".
The schema mistake most teams make: they store one giant row per competitor with all comparison fields flattened. That works until you want "top 10 [competitor] alternatives" pages and individual "[your product] vs [competitor]" pages from the same data. Split it.
Airtable export -- competitors table:
| Column | Type | Required | Feeds |
|---|---|---|---|
| `competitor_slug` | Single-line (PK) | Yes | URL |
| `competitor_name` | Single-line | Yes | H1 |
| `category` | Single-select | Yes | List page |
| `pricing_min_usd` | Number | Yes | Price column |
| `pricing_model` | Single-select | Yes | Comparison row |
| `target_segment` | Multi-select | Yes | Audience match |
| `g2_rating` | Number | No | Social proof |
| `g2_review_count` | Number | No | Social proof |
| `founded_year` | Number | No | Maturity signal |
| `headquarters` | Single-line | No | Trust block |
| `key_features` | Long text (JSON) | Yes | Feature matrix |
| `weaknesses` | Long text | Yes | Differentiator hook |
| `best_for` | Long text | Yes | Quick verdict |
Airtable export -- comparisons table (your-product vs competitor):
| Column | Type | Required | Feeds |
|---|---|---|---|
| `competitor` | Link to competitors | Yes | URL |
| `feature_diff` | Long text (JSON) | Yes | Comparison table |
| `pricing_diff_summary` | Long text | Yes | Pricing block |
| `migration_steps` | Long text | No | How-to block |
| `customer_quote` | Long text | No | Switcher proof |
| `verdict` | Long text (50-80 words) | Yes | TL;DR |
Postgres CREATE TABLE:
```sql
CREATE TABLE competitors (
  competitor_slug TEXT PRIMARY KEY,
  competitor_name TEXT NOT NULL,
  category TEXT NOT NULL,
  pricing_min_usd NUMERIC(10,2) NOT NULL,
  pricing_model TEXT NOT NULL CHECK (
    pricing_model IN ('per-seat','flat','usage','freemium','enterprise')
  ),
  target_segment TEXT[] NOT NULL,
  g2_rating NUMERIC(2,1),
  g2_review_count INTEGER,
  founded_year INTEGER,
  headquarters TEXT,
  key_features JSONB NOT NULL,
  weaknesses TEXT NOT NULL,
  best_for TEXT NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE comparisons (
  comparison_id BIGSERIAL PRIMARY KEY,
  competitor_slug TEXT NOT NULL REFERENCES competitors(competitor_slug),
  feature_diff JSONB NOT NULL,
  pricing_diff_summary TEXT NOT NULL,
  migration_steps TEXT,
  customer_quote TEXT,
  verdict TEXT NOT NULL,
  verified_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT one_comparison_per_competitor UNIQUE (competitor_slug)
);
```
The weaknesses and verdict fields are the unique-content engine. Without per-competitor opinion, every alternatives page reads identically and Google merges them into one canonical, dropping the rest from the index.
## How do I structure data for role/persona pages?
Role/persona pages need two modifier dimensions (role + use case or industry) and ~15 columns. The head term is "[Product] for [Role]" or "[Tool] for [Industry]". Examples: "Asana for product managers," "HubSpot for healthcare."
This schema is the smallest of the three because each page leans on testimonials, role-specific workflows, and pain-point copy rather than dense comparison data. The risk is the opposite: too few unique fields means every page sounds the same.
Airtable export -- personas table:
| Column | Type | Required | Feeds |
|---|---|---|---|
| `persona_slug` | Single-line (PK) | Yes | URL |
| `role_name` | Single-line | Yes | H1 |
| `industry` | Single-select | No | Secondary modifier |
| `top_pain_points` | Long text (JSON, 3-5) | Yes | Hero copy |
| `daily_workflow` | Long text | Yes | Use-case block |
| `relevant_features` | Multi-select (link to features) | Yes | Feature highlight |
| `kpi_examples` | Long text | Yes | ROI block |
| `template_recipe` | Long text (JSON) | Yes | In-product CTA |
| `quote_customer` | Long text | No | Social proof |
| `quote_author_role` | Single-line | No | Quote attribution |
| `case_study_url` | URL | No | Internal link |
| `tools_replaced` | Multi-select | No | Switcher hook |
| `avg_team_size` | Number | No | Targeting |
| `recommended_plan` | Single-select | Yes | Pricing CTA |
Postgres CREATE TABLE:
```sql
CREATE TABLE personas (
  persona_slug TEXT PRIMARY KEY,
  role_name TEXT NOT NULL,
  industry TEXT,
  top_pain_points JSONB NOT NULL,
  daily_workflow TEXT NOT NULL,
  relevant_features TEXT[] NOT NULL,
  kpi_examples TEXT NOT NULL,
  template_recipe JSONB NOT NULL,
  quote_customer TEXT,
  quote_author_role TEXT,
  case_study_url TEXT,
  tools_replaced TEXT[],
  avg_team_size INTEGER,
  recommended_plan TEXT NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT pain_points_min CHECK (
    jsonb_array_length(top_pain_points) >= 3
  )
);
```
The pain_points_min check is deliberate. It enforces the rule that no row ships to production without 3+ unique pain points, because a persona page with generic copy is the failure mode that gets entire pSEO directories deindexed.
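The same gate can run in the sync pipeline before a row ever reaches the database. A minimal sketch, assuming Airtable-style rows where `top_pain_points` arrives as a JSON string; the function name and error format are assumptions.

```python
import json

def validate_persona(row: dict) -> list:
    """Return the reasons a persona row must not ship; empty list means OK."""
    errors = []
    pains = json.loads(row.get("top_pain_points") or "[]")
    if len(pains) < 3:
        errors.append(f"needs 3+ pain points, got {len(pains)}")
    # Mirror the NOT NULL columns from the Postgres schema.
    for field in ("role_name", "daily_workflow", "kpi_examples", "recommended_plan"):
        if not row.get(field):
            errors.append(f"missing required field: {field}")
    return errors

row = {
    "role_name": "Product Manager",
    "top_pain_points": '["scattered feedback", "no roadmap buy-in"]',
    "daily_workflow": "Triage feedback, prioritize, ship.",
    "kpi_examples": "Cycle time, feature adoption",
    "recommended_plan": "Business",
}
print(validate_persona(row))  # ['needs 3+ pain points, got 2']
```

Failing rows stay in the source base for editors to fix instead of surfacing as a database error mid-sync.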
## How many columns do I need per page type?
Use this rule: the more competitive the SERP, the more unique-data columns you need. Integration directories sit in highly templated SERPs and need ~22 columns per row. Alternative-to pages compete with G2 and Capterra and need ~18. Role/persona pages have softer SERPs and survive on ~15.
| Page type | Modifier dimensions | Total columns | Unique-data columns | Example sites |
|---|---|---|---|---|
| Integration directory | 2 (App A x App B) | 22 | 8-12 | Zapier, Make, n8n |
| Alternative-to | 1 (competitor) | 18 | 6-10 | G2, Capterra |
| Role/persona | 2 (role + industry) | 15 | 5-8 | Asana, Notion, HubSpot |
| Location-based | 1-2 (city + service) | 20 | 8-11 | Yelp, TripAdvisor |
| Glossary/term | 1 (term) | 12 | 4-7 | Investopedia, HubSpot |
The unique-data column count is the metric that matters, not the total. Slug, meta, H1, and template flags do not count. Every column that produces a sentence or block on the page that no other row produces -- that is what counts.
Why these numbers? Omnius reports growing organic traffic 850% (102 to 8,500 monthly visits) and signups from 67 to 2,100/month after rebuilding their pSEO data model with denser per-row data. The shift was from a 6-column source-of-truth to a 19-column one. The template did not change.
## How do I derive modifier columns from a head term?
Derive modifiers by faceting the head term: pick the head term, list every dimension a buyer would qualify it with, then validate each dimension against real search demand. The five most useful facets are category, audience, use case, geography, and integration partner.
The procedure:
- State the head term as a question. "Best CRM" becomes "Best CRM for [who] in [where] doing [what]."
- Replace each bracket with a candidate facet. For "Best CRM": audience (for solopreneurs, for real estate), price tier (free, under $50/mo), feature (with email automation, with mobile app), integration (with Gmail, with QuickBooks).
- Pull search volume for each facet. Use Ahrefs or DataForSEO to confirm there is demand. Skip facets where every modifier has under 10 monthly searches.
- Cap at 2-4 modifier dimensions. More than 4 dimensions explodes your matrix into pages with no demand and triggers thin-content flags.
- Make modifiers orthogonal. "For startups" and "under 50 employees" are the same facet. Pick one.
Worked example for "CRM for [audience] with [integration]":
- Modifier 1 (`audience`): solopreneurs, real-estate, healthcare, agencies, ecommerce (5 values)
- Modifier 2 (`integration`): Gmail, Outlook, HubSpot, Slack, QuickBooks (5 values)
- Total combinations: 25 pages, each with non-zero search volume
Compare to the bad version with 6 dimensions: 5 x 5 x 4 x 3 x 3 x 3 = 2,700 pages, 90% of which have zero demand. Google's John Mueller has been explicit that low-quality auto-generated pages get demoted, and the volume threshold is the cleanest filter.
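The arithmetic is worth making concrete. A quick sketch reusing the worked example's two dimensions; the extra four dimension sizes in the bad version are hypothetical.

```python
from itertools import product
from math import prod

# Two orthogonal dimensions with validated demand: 25 pages.
audiences = ["solopreneurs", "real-estate", "healthcare", "agencies", "ecommerce"]
integrations = ["gmail", "outlook", "hubspot", "slack", "quickbooks"]
pages = [f"crm-for-{a}-with-{i}" for a, i in product(audiences, integrations)]
print(len(pages))  # 25

# Six dimensions: the matrix explodes multiplicatively.
dimension_sizes = [5, 5, 4, 3, 3, 3]
print(prod(dimension_sizes))  # 2700
```

Each added dimension multiplies the page count, so demand per page collapses long before the combinatorics run out.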
## How do I avoid duplicate content across thousands of generated pages?
You avoid duplicate content by enforcing per-row uniqueness at the data layer, not the template layer. Add database constraints that block rows from shipping unless they have minimum unique content. Three constraints handle 90% of cases.
Google's own guidance is that 25-30% of the web is naturally duplicate, and it does not penalize duplication unless intent is manipulative. The real risk is not penalty -- it is deindexing. When pages are too similar, Google picks one canonical and drops the rest. For pSEO sites this means thousands of pages disappearing from the index.
The three constraints that prevent it:
1. Minimum unique content length. Every row needs at least one long-form field with 200+ words of row-specific content. Enforce in Postgres:
```sql
ALTER TABLE integrations
  ADD CONSTRAINT min_unique_content
  CHECK (
    length(triggers::text) + length(actions::text) >= 800
  );
```
2. Required differentiator field. A field that, if missing, makes the page look templated. For comparisons, this is verdict and weaknesses. For personas, top_pain_points and daily_workflow. Mark them NOT NULL.
3. Self-similarity check before publish. Run a content-hash diff between every new row and the 10 most similar existing rows. If cosine similarity is above 0.85, block publish. SEOmatic's guide recommends this as the final gate.
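A minimal sketch of that publish gate, using a stdlib bag-of-words cosine similarity. The 0.85 threshold matches the text; a production pipeline would more likely use TF-IDF vectors or embeddings, and the function names are assumptions.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts, in [0, 1]."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def may_publish(new_body: str, existing_bodies: list, threshold: float = 0.85) -> bool:
    """Block publish when the new row is a near-duplicate of any existing row."""
    return all(cosine_similarity(new_body, body) < threshold for body in existing_bodies)

print(may_publish("sync tasks from slack to notion automatically",
                  ["export invoices from stripe to quickbooks"]))  # True
print(may_publish("sync tasks from slack to notion",
                  ["sync tasks from slack to notion"]))            # False
```

In practice you would compare only against the 10 most similar existing rows (via an index or embedding search) rather than the whole table.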
Four additional moves that work in practice:
- Vary the template per category, not per row. 4-5 templates beats 1 template with 100 conditional blocks.
- Pull external unique data (G2 reviews, Reddit threads, GitHub stats, pricing pages). External data is by definition non-duplicated.
- Generate FAQs from row data, not from a fixed list. If two rows would produce identical FAQs, one FAQ pair must change.
- Stagger the index. Practical Programmatic notes Zapier shipped its 50k+ pages over years, not weeks. Crawl budget rewards rows that earn engagement before the next batch ships.
## How do I sync the database to my CMS or static site?
Pick a sync engine based on volume: Whalesync or Airtable Automations for under 5,000 rows, a custom build pipeline (Next.js + Postgres + ISR) for 5,000-100,000 rows, and incremental static regeneration with a CDN for above 100,000 rows. The wrong tool at the wrong scale is the most common pSEO failure mode.
The sync stack maps to the database choice:
| Database | Sync engine | CMS/Site | Best for |
|---|---|---|---|
| Airtable | Whalesync | Webflow | <5k rows, no-code |
| Airtable | Make / Zapier | WordPress | <2k rows, blog-first |
| Google Sheets | Sheet2Site, Page Generator Pro | WordPress | <1k rows, MVP |
| Postgres | Custom build (Next.js ISR) | Vercel/Netlify | 5k-1M rows, dev team |
| Postgres | dbt + static export | Cloudflare Pages | 1M+ rows, edge-first |
Two rules that hold across all of these:
- Database is source of truth. The CMS or site is read-only. Never edit content in Webflow or WordPress directly -- the next sync overwrites it.
- Slug column is immutable after publish. Once a row ships, the URL is frozen. AI engines cache citations against URLs, so moving a page costs you every citation it earned. Add `slug_locked_at TIMESTAMPTZ` and reject updates after that timestamp.
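An application-layer version of that slug freeze might look like this. A sketch: `slug_locked_at` matches the column suggested above, while the guard logic and error message are assumptions.

```python
from datetime import datetime, timezone

def apply_update(row: dict, updates: dict) -> dict:
    """Apply field updates, rejecting slug changes on published rows."""
    if "slug" in updates and row.get("slug_locked_at") is not None:
        raise ValueError(
            f"slug frozen since {row['slug_locked_at']:%Y-%m-%d}; "
            "add a redirect instead of moving the page"
        )
    return {**row, **updates}

row = {
    "slug": "slack-notion-integration",
    "setup_minutes": 3,
    "slug_locked_at": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
row = apply_update(row, {"setup_minutes": 4})  # content fields stay editable
print(row["setup_minutes"])  # 4
# apply_update(row, {"slug": "notion-slack-integration"})  # raises ValueError
```

A database-side equivalent would be a trigger that raises when `NEW.slug <> OLD.slug` and `OLD.slug_locked_at` is set.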
Quick-reference summary across all five page types:

| Schema | Modifier dimensions | Total columns | Unique-data columns | Required schema markup |
|---|---|---|---|---|
| Integration directory | 2 (App A x App B) | 22 | 8-12 | Article + HowTo + SoftwareApplication |
| Alternative-to / vs | 1 (competitor) | 18 | 6-10 | Article + ItemList + Product |
| Role / persona | 2 (role + industry) | 15 | 5-8 | Article + FAQPage |
| Location-based | 1-2 (city + service) | 20 | 8-11 | Article + LocalBusiness |
| Glossary / term | 1 (term) | 12 | 4-7 | Article + DefinedTerm |