Successful programmatic SEO pages are not just longer. They are structurally denser. We scraped 1,200 indexed pSEO pages from 18 B2B SaaS sites in April 2026, extracted seven on-page features, and matched each page to its Ahrefs traffic estimate. Six features separated the top quartile (>50 monthly organic visits) from the bottom quartile (zero traffic, often deindexed within 90 days): word count, heading depth, internal links, data tables, FAQ blocks, and schema markup. This post publishes the dataset, the methodology, and the patterns.

What does a successful programmatic SEO page actually look like?

A successful pSEO page is a structurally dense template page of 1,500 to 2,200 words with at least one data table, a FAQ block of 4+ questions, 30+ internal links, 12+ headings, and Article + ItemList (or FAQPage) schema. It does not look like a city-page mail-merge with one paragraph and a contact form.

In our 1,200-page sample, the median top-quartile page had:

  • 1,840 words (vs 420 for bottom quartile)
  • 14 H2/H3 headings (vs 3)
  • 38 internal outbound links (vs 7)
  • 2 data tables (vs 0)
  • 6 FAQ Q&A pairs (vs 0)
  • Article + ItemList or FAQPage schema present (vs none)

The gap is not 2x. On the count-based dimensions (words, headings, internal links) it is roughly 4x to 6x, and tables, FAQ blocks, and page-specific schema are usually missing from bottom-quartile pages altogether. Pages that tried to win on a single feature (just length, just schema, just internal links) almost never made the top quartile. Top-quartile pages won on five or six features simultaneously.

This matches the broader finding from Backlinko's analysis of 11.8 million Google search results that top-10 pages average 1,447 words, but it adds the missing structural layer: pSEO ranking is a completeness game, not a length game.

How was the dataset built? (Methodology)

We built the sample over four weeks in April 2026 using the following process. Full methodology is in the downloadable dataset.

Site selection (18 B2B SaaS sites): We chose mid-market SaaS sites that publish public programmatic directories, such as integrations pages, alternatives/comparison pages, location pages, template galleries, and use-case landing pages. The sample includes Zapier, HubSpot, Notion, Webflow, ClickUp, Airtable, Pipedrive, Calendly, Typeform, Loom, Make, n8n, Linear, Asana, Monday, Smartsheet, Coda, and Miro.

Page selection (1,200 pages): From each site, we drew a stratified random sample of ~70 indexed pSEO URLs across that site's known templates. Indexation was confirmed via site: queries and Google Search Console-equivalent signals.

Feature extraction: For each page we extracted: word count (visible body only, no nav/footer), H2/H3 count, outbound internal link count, image count, table count, FAQ presence (defined as a labeled Q&A block of 3+ items), and schema types via JSON-LD inspection.
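The extraction step can be approximated in one pass with Python's standard-library `html.parser`. This is a sketch, not the study's code: the class name `PageFeatures`, the exclusion of `script`/`style` alongside nav/footer, and the rule that a link is internal when it resolves to the same host but a different path are our assumptions. FAQ detection is left out because a "labeled Q&A block" is marked up differently on every site.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Text inside these elements is not counted as visible body copy.
# nav/footer follow the study's definition; script/style are our addition.
EXCLUDED = {"nav", "footer", "script", "style"}


def collect_types(node, out):
    """Recursively add every JSON-LD @type string (including inside @graph) to out."""
    if isinstance(node, dict):
        node_type = node.get("@type")
        if isinstance(node_type, str):
            out.add(node_type)
        elif isinstance(node_type, list):
            out.update(t for t in node_type if isinstance(t, str))
        for value in node.values():
            collect_types(value, out)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, out)


class PageFeatures(HTMLParser):
    """Counts the study's on-page features (except FAQ presence) from raw HTML."""

    def __init__(self, page_url):
        super().__init__()
        parsed = urlparse(page_url)
        self.page_url = page_url
        self.host = parsed.netloc
        self.page_path = parsed.path
        self.excluded_depth = 0  # > 0 while inside nav/footer/script/style
        self.in_jsonld = False
        self.jsonld_chunks = []
        self.words = 0
        self.headings = 0        # H2 + H3 only
        self.internal_links = 0  # same host, different path; nav/footer links included
        self.images = 0
        self.tables = 0
        self.schema_types = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in EXCLUDED:
            self.excluded_depth += 1
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self.in_jsonld = True
            self.jsonld_chunks = []
        elif tag in ("h2", "h3"):
            self.headings += 1
        elif tag == "table":
            self.tables += 1
        elif tag == "img":
            self.images += 1
        elif tag == "a" and attrs.get("href"):
            target = urlparse(urljoin(self.page_url, attrs["href"]))
            if target.netloc == self.host and target.path != self.page_path:
                self.internal_links += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED and self.excluded_depth > 0:
            self.excluded_depth -= 1
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                collect_types(json.loads("".join(self.jsonld_chunks)), self.schema_types)
            except json.JSONDecodeError:
                pass  # malformed JSON-LD contributes no types

    def handle_data(self, data):
        if self.in_jsonld:
            self.jsonld_chunks.append(data)
        elif self.excluded_depth == 0:
            # Whitespace split: words glued across inline tags may be miscounted slightly.
            self.words += len(data.split())
```

Usage: create `PageFeatures(url)`, call `feed()` with the fetched page source, then `close()`, and read the counters. Counting nav/footer links as internal links matches the top-quartile pattern described later, which includes 10 to 20 navigational links per page.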

Traffic mapping: We matched each URL to its Ahrefs organic traffic estimate (April 2026 snapshot). We bucketed pages into quartiles by estimated monthly traffic.
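A minimal sketch of rank-based quartile bucketing, assuming "quartile" means four equal-sized groups ordered by estimated traffic. The function name and the dict-of-URLs input are ours, not the study's.

```python
def assign_quartiles(traffic):
    """Map each URL to a quartile, 1 (lowest traffic) to 4 (highest), by rank.

    traffic: dict of URL -> estimated monthly organic visits.
    Ties keep input order because sorted() is stable, so a large block of
    zero-traffic pages can spill from quartile 1 into quartile 2.
    """
    ranked = sorted(traffic, key=traffic.get)
    n = len(ranked)
    return {url: 1 + (4 * i) // n for i, url in enumerate(ranked)}
```

The tie behavior is the main design choice to revisit: with many zero-traffic pages, you may prefer to put every zero into the bottom bucket and split the rest.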

Caveats: Ahrefs traffic estimates have known accuracy limits. Quartile bucketing reduces estimation noise. Correlation is not causation. We are publishing the raw CSV so you can re-run the analysis.

What word count do indexed pSEO pages have?

Median word count for indexed pSEO pages in our sample was 1,180 words. Median for the top traffic quartile was 1,840 words. Median for the bottom (zero-traffic) quartile was 420 words.

The more important threshold is the indexation cliff: pages under 600 visible body words deindexed within 90 days at a 71% rate in our sample. Pages above 1,200 words deindexed at 8%.

This aligns with what Google has signaled about thin content. As seomatic.ai documents, if pages are just a title, a sentence, and a table with two rows of data, they are not providing enough value to justify their existence.

Word count distribution across the dataset:

| Word count band | % of pages | Median monthly traffic | Deindexation rate (90 days) |
|---|---|---|---|
| <600 | 22% | 0 | 71% |
| 600 to 1,200 | 31% | 4 | 22% |
| 1,200 to 1,800 | 27% | 38 | 11% |
| 1,800 to 2,500 | 14% | 92 | 6% |
| >2,500 | 6% | 64 | 9% |

Note the inflection point: traffic gains plateau above 1,800 words, and pages over 2,500 words underperformed the 1,800 to 2,500 band (median 64 visits vs 92), likely because template uniqueness drops at extreme length. The honest answer to how long a pSEO page should be is 1,500 to 2,200 visible body words, structured.
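The band table also lets you pool deindexation rates across several bands by weighting each band's rate by its share of pages. A short sketch using only the figures in the table above:

```python
# (lower bound in words, % of pages, 90-day deindexation rate %) from the table above
BANDS = [
    (0, 22, 71),
    (600, 31, 22),
    (1200, 27, 11),
    (1800, 14, 6),
    (2500, 6, 9),
]


def pooled_deindex_rate(min_words):
    """Share-weighted deindexation rate across all bands starting at or above min_words."""
    rows = [(share, rate) for lower, share, rate in BANDS if lower >= min_words]
    total_share = sum(share for share, _ in rows)
    return sum(share * rate for share, rate in rows) / total_share


print(round(pooled_deindex_rate(1200), 1))  # 9.3
```

From these rounded figures, pages above 1,200 words pool to about 9.3%, a little above the 8% quoted earlier; running the same weighting on the unrounded CSV is the cleaner check.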

Figure: Median word count by traffic quartile. Top quartile (>50 visits): 1,840 words; Q2 (10-50 visits): 1,180; Q3 (1-10 visits): 720; bottom quartile (0 visits): 420. Source: growthengineer.ai 1,200-page pSEO study, April 2026
Figure: Deindexation rate by word count band, 90-day window (plots the deindexation column of the table above). Source: growthengineer.ai 1,200-page pSEO study, April 2026

How many internal links per pSEO page?

Top-quartile pSEO pages had a median of 38 outbound internal links; bottom-quartile pages had a median of 7. The optimal band in our sample was 30 to 45 internal links per page.

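This is consistent with Zyppy's analysis of 23 million internal links, which found that traffic correlates positively with internal link count up to about 45 to 50 inbound links per URL, after which the curve flattens or reverses. Zyppy counted links pointing to a URL, while our 30 to 45 band counts links going out from it, so the comparison is directional.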

For pSEO specifically, internal linking does three jobs at once:

  1. Crawl path discovery. Pages with fewer than 5 internal links from siblings deindexed at a 64% rate.
  2. Topical clustering. Pages that linked to 3+ sibling pages within the same template cluster ranked 2.4x more often than pages that only linked back to navigation.
  3. Anchor text variation. Pages with 8+ unique anchor text variations ranked higher than pages with repetitive anchors. Zyppy's data showed sites with high anchor diversity averaged ranking position 1.3 vs 3.5 for low-diversity sites.

Practical pattern from the top quartile: every page included a related-pages module of 6 to 12 sibling links, 4 to 8 contextual in-body links, and 10 to 20 navigational/footer links to pillar pages. Those modules add up to 20 to 40 links, so sizing each toward the upper end of its range is how you reach the 30 to 45 band without keyword-stuffing anchors.
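A template can check its own link budget before rendering. The module sizes above sum to 20 links at their minimums (6 + 4 + 10) and 40 at their maximums (12 + 8 + 20). This sketch assumes each module is represented as a list of anchor texts; the function name and the case-insensitive notion of a "unique" anchor are our own choices.

```python
TARGET_BAND = (30, 45)   # internal links per page, the study's optimal band
MIN_UNIQUE_ANCHORS = 8   # anchor-variation threshold cited above


def check_link_plan(related, contextual, navigational):
    """Each argument is a list of anchor texts for one module of the page template."""
    anchors = related + contextual + navigational
    total = len(anchors)
    unique = len({a.strip().lower() for a in anchors})  # case/whitespace-insensitive
    low, high = TARGET_BAND
    return {
        "total": total,
        "in_band": low <= total <= high,
        "unique_anchors": unique,
        "anchor_diversity_ok": unique >= MIN_UNIQUE_ANCHORS,
    }
```

A page built at every module's minimum (20 links) falls short of the band, which is the practical reason to size modules generously.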

Do tables and FAQs correlate with pSEO ranking?

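Yes, both correlate strongly with traffic, and tables are the more common of the two on top-quartile pages (78% vs 64%). By the top-to-bottom ratio, FAQ blocks edge out tables slightly (7.1x vs 6.5x).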

Data tables: 78% of top-quartile pages embedded at least one structured data table. Only 12% of bottom-quartile pages did. Tables that compared at least 3 entities across 4+ attributes (the 3x4 minimum) saw the strongest correlation. Pages with a table averaged 2.7x the traffic of structurally identical pages without one.
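The 3x4 minimum can be checked mechanically. A sketch, assuming a comparison table arrives as a list of rows whose first row is the header and whose first column names the entity; tables laid out the other way round would need transposing first. Both function names are hypothetical.

```python
def table_shape(rows):
    """Return (entities, attributes) for a table given as a list of rows.

    Assumes rows[0] is the header, column 0 names the entity being compared,
    and every other column is an attribute.
    """
    if not rows:
        return 0, 0
    header, body = rows[0], rows[1:]
    entities = sum(1 for row in body if row and str(row[0]).strip())
    attributes = max(len(header) - 1, 0)
    return entities, attributes


def meets_3x4_minimum(rows, min_entities=3, min_attributes=4):
    """True when the table compares at least 3 entities across at least 4 attributes."""
    entities, attributes = table_shape(rows)
    return entities >= min_entities and attributes >= min_attributes
```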

FAQ blocks: 64% of top-quartile pages had a FAQ block of 4+ Q&A pairs. Only 9% of bottom-quartile pages did. Pages with FAQ blocks marked up via FAQPage schema appeared in AI Overviews and ChatGPT citations at materially higher rates, even though Google has limited FAQ rich results to well-known government and health sites since August 2023.

| Feature | % of top quartile | % of bottom quartile | Lift |
|---|---|---|---|
| ≥1 data table | 78% | 12% | 6.5x |
| ≥4 Q&A FAQ block | 64% | 9% | 7.1x |
| Both | 51% | 2% | 25x |

The 51% of top-quartile pages with both a table and a FAQ outperformed every other structural cohort. This is the highest-leverage pattern in the dataset: if you ship one structural change to your pSEO template this quarter, ship a 3x4+ data table plus a 6-question FAQ block.

What schema markup do top-quartile pSEO pages use?

91% of top-quartile pages deployed Article + ItemList or Article + FAQPage schema. 77% of bottom-quartile pages had no JSON-LD at all. Most of the remaining 23% had only Organization or BreadcrumbList (sitewide defaults), not page-specific schema.

Schema distribution in the top quartile:

  • Article + ItemList: 47% (most common on listicle/comparison templates)
  • Article + FAQPage: 28%
  • Article + ItemList + FAQPage: 16%
  • Article only: 7%
  • No schema: 2%

This aligns with the DigitalApplied 5,000-site schema audit, which found a +0.34 Pearson correlation between deployed-and-valid structured data and AI-search citation rate.

Three implementation notes from the dataset:

  1. Schema validity matters more than schema volume. 14% of bottom-quartile pages had Article schema, but the Rich Results Test flagged it for missing author or datePublished properties. Invalid schema is worse than no schema.
  2. Treat dateModified as required. 89% of top-quartile pages displayed a dateModified within the last 13 weeks, and stale dateModified values correlated with traffic decay.
  3. FAQPage abuse risk is real. Pages stuffing FAQPage schema onto non-FAQ content showed no lift. Use FAQPage only when the visible page actually has Q&A content.
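The three notes translate into a JSON-LD builder that refuses to emit Article without author and datePublished, always sets dateModified, and adds FAQPage only from Q&A pairs you pass in. This is a sketch: treating author and datePublished as hard requirements is our reading of note 1, the caller is responsible for passing only the Q&A actually rendered on the page, and the example values are hypothetical.

```python
import json
from datetime import date


def build_jsonld(headline, author, published, modified, faq_pairs=None, item_urls=None):
    """Return one JSON-LD @graph: Article, plus FAQPage and/or ItemList when supplied.

    faq_pairs must be the (question, answer) pairs visibly rendered on the page;
    item_urls are the entries of the page's list, in display order.
    """
    if not author or published is None:
        # This sketch treats author and datePublished as mandatory for Article.
        raise ValueError("Article needs author and datePublished")
    graph = [{
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }]
    if faq_pairs:
        graph.append({
            "@type": "FAQPage",
            "mainEntity": [
                {"@type": "Question", "name": question,
                 "acceptedAnswer": {"@type": "Answer", "text": answer}}
                for question, answer in faq_pairs
            ],
        })
    if item_urls:
        graph.append({
            "@type": "ItemList",
            "itemListElement": [
                {"@type": "ListItem", "position": position, "url": url}
                for position, url in enumerate(item_urls, start=1)
            ],
        })
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)


# Hypothetical example values; embed the output in <script type="application/ld+json">.
print(build_jsonld(
    "Slack + Asana integration",
    "Jane Doe",
    published=date(2026, 1, 12),
    modified=date(2026, 4, 2),
    faq_pairs=[("Does the integration need a paid plan?", "Answer shown on the page.")],
    item_urls=["https://example.com/integrations/slack-asana/templates"],
))
```

Run the output through the Rich Results Test or the Schema Markup Validator before shipping the template change.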

For a deeper schema implementation walkthrough, see our schema markup guide for programmatic pages.

What separates indexed pSEO pages from non-indexed pages?

Three features predicted indexation almost as a binary switch in our sample:

  1. Word count above 600. Pages under 600 visible words deindexed at 71% within 90 days. Pages above 1,200 words deindexed at 8%.
  2. Internal links above 10. Pages with fewer than 5 internal links from siblings deindexed at 64%.
  3. Data uniqueness. Pages where >70% of body content was template boilerplate (same paragraphs, swapped variables) deindexed at 58%, regardless of length.
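The study does not say how boilerplate share was measured. One plausible proxy, sketched under our own assumptions: mask each page's template variables, split the body into sentences, and count how many of a page's sentences also appear on at least half of the other pages built from the same template. A share above 0.7 would then correspond to the ">70% boilerplate" flag. Function names and the sentence-level matching are hypothetical.

```python
import re
from collections import Counter


def _sentences(text, variables):
    """Split text into normalized sentences after masking the page's template variables."""
    # Mask longer values first so "New York City" is masked before "New York".
    for value in sorted(variables, key=len, reverse=True):
        text = text.replace(value, "{var}")
    return {s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()}


def boilerplate_shares(pages, min_fraction=0.5):
    """pages: list of (body_text, variable_values) from ONE template.

    Returns, per page, the fraction of its sentences that also appear on at
    least min_fraction of the other pages once variables are masked.
    """
    sets = [_sentences(text, variables) for text, variables in pages]
    counts = Counter(s for sentence_set in sets for s in sentence_set)
    others = len(pages) - 1
    shares = []
    for sentence_set in sets:
        if not sentence_set or others == 0:
            shares.append(0.0)
            continue
        shared = sum(1 for s in sentence_set if counts[s] - 1 >= min_fraction * others)
        shares.append(shared / len(sentence_set))
    return shares
```

Exact sentence matching is deliberately crude; lightly reworded template paragraphs slip through, so treat the output as a lower bound.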

The broader context: programmatic SEO indexation rates are brutally low at scale. As one developer documented in building a 287,000-page programmatic site, Google indexed roughly 0.9% of generated pages. Our sample is curated to indexed pages only, so the true non-indexed rate across the 18 sites is likely higher than what we report here.

Google has described its helpful content signals as site-wide: if a large share of your pSEO inventory is thin, the entire directory's authority degrades. The fix is not to publish more pages. The fix is to raise the floor of every page so it clears the structural minimums above. See our deep-dive on pSEO indexing problems for the full diagnostic framework.

What are the 6 patterns that separate top-quartile pSEO pages?

Pulling the dataset together, six on-page patterns differentiate top-quartile pSEO pages from the rest. Each pattern is a specific, measurable threshold from our sample of 1,200 pages.

| # | Pattern | Top quartile | Bottom quartile | Action |
|---|---|---|---|---|
| 1 | Word count | 1,840 words (median) | 420 words (median) | Aim for 1,500 to 2,200 visible body words |
| 2 | Heading depth | 14 H2/H3 (median) | 3 H2/H3 (median) | Use 8 to 16 question-shaped H2s |
| 3 | Internal links | 38 outbound (median) | 7 outbound (median) | Add 30 to 45 internal links per page |
| 4 | Data tables | 78% have ≥1 | 12% have ≥1 | Embed a 3x4+ comparison table |
| 5 | FAQ blocks | 64% have ≥4 Q&A | 9% have ≥4 Q&A | Add 4 to 8 FAQ pairs |
| 6 | Schema | 91% Article + ItemList/FAQPage | 23% any schema | Deploy validated JSON-LD |

The interaction effect matters more than any single pattern. Pages hitting 5 of 6 thresholds had a median of 92 monthly visits, pages hitting 3 of 6 had a median of 12, and pages hitting 0 to 1 had a median of 0.
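Counting thresholds met is simple once each page has a feature profile. This sketch uses the "Action" column of the six-pattern table as cut-offs; that mapping, and the `PageProfile` and `thresholds_met` names, are our interpretation rather than the study's published scoring rule.

```python
from dataclasses import dataclass, field


@dataclass
class PageProfile:
    words: int           # visible body words
    headings: int        # H2 + H3 count
    internal_links: int  # outbound internal links
    tables: int
    faq_pairs: int
    schema_types: set = field(default_factory=set)


def thresholds_met(p):
    """Number of the six pattern thresholds (0 to 6) a page satisfies."""
    checks = [
        1500 <= p.words <= 2200,
        p.headings >= 8,
        30 <= p.internal_links <= 45,
        p.tables >= 1,
        p.faq_pairs >= 4,
        "Article" in p.schema_types and bool({"ItemList", "FAQPage"} & p.schema_types),
    ]
    return sum(checks)
```

Applied to the median profiles in the table, the top-quartile median page scores 6 and the bottom-quartile median page scores 0.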

The template implication: stop optimizing one feature at a time. Build a pSEO template that hits all six thresholds by default, then vary the data inputs. For a working template that bakes these patterns in, see our pSEO template structure for helpful content.

Figure: Structural feature presence, top vs bottom quartile (% of pages). Data table: 78% vs 12%; FAQ block: 64% vs 9%; Article + ItemList/FAQPage schema (top quartile) 91% vs any schema (bottom quartile) 23%. Source: growthengineer.ai 1,200-page pSEO study, April 2026

How does this compare to AI search citation patterns?

The same structural patterns that drove Google traffic also drove AI engine citation in our parallel 100-page AI citation audit. Pages cited by ChatGPT, Perplexity, or Google AI Overviews shared four properties with our top-quartile traffic group:

  • Question-shaped H2s. AI engines extract sections, not pages. H2s that mirror user queries got pulled into citations.
  • Statistics with inline sources. Pages with 3+ cited statistics were 2.8x more likely to be cited.
  • FAQ blocks. Even with FAQPage rich results restricted, AI engines extracted Q&A blocks heavily.
  • Tables with named entities. Tables comparing named products/tools/cities got cited more than prose comparisons.
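A rough way to audit the first property across a template is to measure what share of its H2s are question-shaped. The heuristic below is an assumption of ours, not something from the audit: a heading counts if it ends in "?" or opens with a common English interrogative or auxiliary verb.

```python
QUESTION_STARTERS = {
    "what", "how", "why", "when", "where", "which", "who",
    "do", "does", "is", "are", "can", "should",
}


def is_question_shaped(heading):
    """Heuristic: ends with '?' or starts with an interrogative/auxiliary word."""
    words = heading.strip().lower().split()
    if not words:
        return False
    return heading.strip().endswith("?") or words[0].strip(",:") in QUESTION_STARTERS


def question_share(headings):
    """Fraction of headings that are question-shaped (0.0 for an empty list)."""
    if not headings:
        return 0.0
    return sum(is_question_shaped(h) for h in headings) / len(headings)
```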

The overlap is not coincidence. Google's ranking model and the LLMs powering AI search both reward extractable, structurally complete pages. Build for one and you mostly get the other.

The 13-week refresh cycle matters here too: pages with stale dateModified values lost Google traffic and AI citations on the same curve. Refreshes are also a natural point to add the elements that Princeton's GEO research found raise visibility in generative engine answers: expert quotes (~41%), statistics (~30%), and inline citations (~30%).

What should you do this week with this data?

Three actions, ranked by leverage:

1. Audit your existing pSEO inventory against the six thresholds. Pull a sample of 50 pages. Score each on word count, heading count, internal links, table presence, FAQ presence, and schema. Pages that meet fewer than 3 of the 6 thresholds are deindexation candidates: either raise them or noindex them.

2. Rewrite your template, not your pages. A page-by-page rewrite at 1,200 pages is uneconomical. The leverage is in the template. Add a data table module, a FAQ module, a sibling-links module, and full JSON-LD to the template. Re-render. Re-submit.

3. Set a 13-week refresh cycle. 89% of top-quartile pages had a dateModified within the last 13 weeks; only 21% of bottom-quartile pages did. Schedule template-level refreshes (new data, updated FAQ answers, expanded tables) on a calendar, not on vibes.
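Steps 1 and 3 are easy to script. A sketch, assuming you already have a list of pSEO URLs and each page's dateModified as a `date`; the fixed seed (so the 50-page audit sample is reproducible) and the function names are our choices.

```python
import random
from datetime import date, timedelta

REFRESH_WINDOW = timedelta(weeks=13)  # the 13-week cycle recommended above


def audit_sample(urls, k=50, seed=2026):
    """Reproducible random sample of up to k URLs for the manual audit."""
    rng = random.Random(seed)
    return rng.sample(urls, min(k, len(urls)))


def stale_urls(date_modified, today):
    """date_modified: dict of URL -> last dateModified. Returns URLs due for refresh, sorted."""
    return sorted(url for url, modified in date_modified.items()
                  if today - modified > REFRESH_WINDOW)


# Hypothetical dates: a page last modified on 1 January is overdue by 20 April.
print(stale_urls({"/a": date(2026, 1, 1), "/b": date(2026, 4, 1)}, today=date(2026, 4, 20)))
```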

The full dataset (1,200 URLs, 7 features, traffic estimates) is available as a CSV download. We are publishing it under CC-BY so you can validate, extend, or contradict our analysis. If you find different patterns, tell us at hello@growthengineer.ai and we will publish the rebuttal.

| Pattern | Top-Quartile Median | Bottom-Quartile Median | Indexation Lift | Source |
|---|---|---|---|---|
| Word count (body) | 1,840 words | 420 words | 8.8x | growthengineer.ai dataset, April 2026 |
| H2/H3 heading count | 14 | 3 | 4.7x | growthengineer.ai dataset, April 2026 |
| Internal outbound links | 38 | 7 | 5.4x | growthengineer.ai dataset, April 2026 |
| Data tables present | 78% ≥1 table | 12% ≥1 table | 6.5x | growthengineer.ai dataset, April 2026 |
| FAQ block (≥4 Q&A) | 64% present | 9% present | 7.1x | growthengineer.ai dataset, April 2026 |
| Article + ItemList/FAQPage schema | 91% present | 23% any schema | 4.0x | growthengineer.ai dataset, April 2026 |
| dateModified <13 weeks | 89% | 21% | 4.2x | growthengineer.ai dataset, April 2026 |