If 60% of your programmatic pages are missing from Google's index, the cause is almost never a bug. It's one of five specific failure modes: discovered-not-crawled (low crawl priority), crawled-not-indexed (low content quality), soft 404s (empty templates returning 200 OK), duplicate cluster suppression (templates too similar), or crawl traps from pagination and facets. Each one has a distinct signal in Google Search Console and a specific fix. This article maps each status to its diagnosis and remedy, with the triage order to work through them.

Why aren't my programmatic SEO pages getting indexed?

Programmatic pages get rejected from Google's index for one of five reasons: low crawl priority, thin or duplicate content, soft 404s on empty templates, duplicate cluster suppression, or crawl traps that exhaust crawl budget. Each reason maps to a specific Search Console status, and each one has a different fix.

Google confirms this in the Page Indexing report documentation: the report exists precisely because indexing is not binary. A page can be discovered, crawled, deduplicated, soft-404'd, or excluded by a canonical, and the response is different in each case.

For large pSEO programs, the math is brutal. According to Indexing Insight's 2026 benchmarks, marketplace and listing sites routinely show indexation coverage below 70%. That isn't a Google malfunction. It's Google being selective at scale, and pSEO sites are selected against more aggressively because their templates look the same to a near-duplicate detector.

The rest of this article walks each GSC status one at a time, with the diagnostic check to run first and the specific fix that matters.

What does 'Discovered -- currently not indexed' mean for pSEO pages?

'Discovered -- currently not indexed' means Google knows the URL exists, usually from your sitemap or an internal link, but has not yet allocated crawl resources to fetch it. The page is sitting in a low-priority crawl queue. Content quality is not the problem yet, because Google has not even seen the content.

Diagnostic check in GSC: Open Pages > Why pages aren't indexed > Discovered -- currently not indexed. Click into the URL list. If you see thousands of URLs from a single template (e.g., /locations/[city]/[service]), that template has a crawl-priority problem.

The fixes that actually work:

  1. Flatten crawl depth. Every pSEO URL should be reachable in 3 clicks or fewer from the homepage. Build hub pages that link to every leaf page in the cluster (one way to chunk them is sketched after this list).
  2. Add internal links from already-indexed pages. Google uses internal PageRank to decide what to crawl next. A pSEO page with 0 internal links has 0 priority.
  3. Trim sitemap bloat. Sitemaps should only contain 200-OK canonical URLs. If your sitemap has 50,000 URLs and 30,000 are low-value, Google will sample, find junk, and deprioritize the cluster.
  4. Stop using 'Request Indexing' for bulk URLs. It has hard quotas and signals nothing about quality.
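As a sketch of fix #1, here's one way to chunk a large cluster into hub pages in TypeScript so every leaf sits at depth 3 (homepage > hub index > hub > leaf). The URL pattern and links-per-hub figure are illustrative, not prescriptive:

  // Sketch: split a pSEO cluster into hub pages so the path
  // homepage -> hub index -> hub -> leaf is exactly 3 clicks.

  const LINKS_PER_HUB = 100; // tune to what your hub template can carry

  interface Hub {
    url: string;        // e.g. /locations/hubs/1 (illustrative pattern)
    leafUrls: string[]; // leaf pages this hub links to
  }

  function buildHubs(leafUrls: string[]): Hub[] {
    const hubs: Hub[] = [];
    for (let i = 0; i < leafUrls.length; i += LINKS_PER_HUB) {
      hubs.push({
        url: `/locations/hubs/${hubs.length + 1}`,
        leafUrls: leafUrls.slice(i, i + LINKS_PER_HUB),
      });
    }
    return hubs;
  }

  // 10,000 leaves -> 100 hubs. Link all 100 hubs from a single hub
  // index page that the homepage links to, and no leaf is deeper
  // than 3 clicks.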

For sites with 5,000+ programmatic URLs, the single biggest lever is internal linking. See internal linking at scale for programmatic sites for the link-graph patterns that move pages out of Discovered.

Why are my programmatic pages 'Crawled -- currently not indexed'?

'Crawled -- currently not indexed' means Google fetched the page, evaluated the content, and decided it wasn't worth keeping. This is a quality verdict, not a crawl problem. On pSEO sites, the cause is almost always template-stamped pages with insufficient unique content per URL.

Diagnostic check in GSC: In Pages > Crawled -- currently not indexed, sample 20 URLs and run them through the URL Inspection tool. Compare each page's body content side by side. If 90% of the words are identical across templates, you have your answer.
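To make that overlap comparison concrete, here's a rough TypeScript sketch: a word-set Jaccard similarity. It's a crude stand-in for the shingling real near-duplicate systems use, and the 0.9 threshold is an assumption, but it's enough to flag template-stamped siblings:

  // Sketch: estimate how much of two template pages' visible text overlaps.
  // Crude word-set Jaccard similarity -- real dedup uses shingling, but
  // this is enough to flag 90%-identical templates.

  function words(text: string): Set<string> {
    return new Set(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
  }

  function overlap(a: string, b: string): number {
    const wordsA = words(a);
    const wordsB = words(b);
    let shared = 0;
    for (const w of wordsA) if (wordsB.has(w)) shared++;
    const union = wordsA.size + wordsB.size - shared;
    return union === 0 ? 0 : shared / union;
  }

  // Usage: strip markup from two sibling pages, then compare body text.
  // overlap(tulsaBody, okcBody) > 0.9 means they're one page to Google.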

Why this happens at scale. Per Google's Helpful Content guidance, Google's quality systems prioritize content that demonstrably helps a person doing a specific task. A template that swaps in a city name and reuses 800 identical words doesn't pass that bar.

Fixes, in order of impact:

  • Inject page-unique data. Real numbers, real reviews, real inventory, real local references. Not synonym-spinning.
  • Cut the boilerplate. If your template's intro paragraph and FAQ are identical on 10,000 pages, those pages look like one page to Google.
  • Prune zero-demand URLs. Pages targeting queries with no search volume are noise. Noindex them.
  • Verify intent alignment. A pSEO URL should answer the query in its title. If /best-crm-for-dentists-in-tulsa returns generic CRM advice, Google will refuse to index it.

This is the same quality bar covered in pSEO template structure that passes the helpful-content review.

How do I fix soft 404s on programmatic pages?

A soft 404 happens when a page returns HTTP 200 OK but the content is effectively empty -- no listings, no products, no answers. Google detects this and excludes the URL. Soft 404s are extremely common on pSEO sites because empty database queries still render the template.

Diagnostic check in GSC: Open Pages > Soft 404 in the indexing report. Per the original Google announcement on soft 404 reporting, Google flags these because they waste crawl coverage. Inspect 5-10 URLs and check whether the page is genuinely empty or just thin.

The fix depends on whether the underlying data is empty:

| Situation | Correct response |
| --- | --- |
| Empty dataset (zero listings, zero products) | Return HTTP 404 or 410 |
| Page moved permanently | 301 redirect to the new URL |
| Page is intentionally thin (sparse city, dead inventory) | noindex and remove from sitemap |
| Page has real content but Google misjudged it | Add unique data, then resubmit |

The pSEO-specific gotcha: templates often render a default 'No results found' page with HTTP 200. That is the textbook soft 404. Configure your application to return a real 404 status when the dataset for that URL is empty, not just visually display a 'no results' state.

Do this once at the framework level (Next.js notFound(), Rails render status: :not_found, etc.) and you'll wipe out 80% of pSEO soft 404s in a single deploy.
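A minimal Next.js (App Router) sketch of that pattern -- the route and the getListings helper are hypothetical stand-ins for your own data layer:

  // app/locations/[city]/page.tsx -- a sketch, not a drop-in implementation.
  // If the dataset behind a pSEO URL is empty, serve a real 404 instead of
  // rendering a "No results found" template with HTTP 200.

  import { notFound } from "next/navigation";
  import { getListings } from "@/lib/data"; // hypothetical data helper

  export default async function CityPage({
    params,
  }: {
    params: { city: string };
  }) {
    const listings = await getListings(params.city);

    // Empty dataset: notFound() makes Next.js return an actual
    // HTTP 404 status with its not-found page -- no more soft 404.
    if (listings.length === 0) notFound();

    return (
      <main>
        <h1>Listings in {params.city}</h1>
        {/* render the real listings here */}
      </main>
    );
  }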

How does Google's duplicate cluster suppression hurt pSEO?

Google clusters near-duplicate pages and picks one canonical to display, suppressing the rest. On pSEO sites with template-stamped content, this happens silently. Pages don't show up as 'errors' -- they just stop ranking because they got merged into another URL's cluster.

Diagnostic check in GSC: Look for two statuses in Pages > Why pages aren't indexed:

  • 'Duplicate, Google chose a different canonical than user' -- you declared a canonical, Google ignored it.
  • 'Alternate page with proper canonical tag' -- working as intended; ignore unless the canonical is wrong.

Use the URL Inspection tool's 'Google-selected canonical' field to see which URL Google clustered the page with.

How clustering actually works. Per Google's canonical URL documentation, Google evaluates ~40 signals to pick a cluster's canonical. Sitemap inclusion is a weak signal. Internal links and content overlap are stronger. So if your /best-crm-tulsa and /best-crm-oklahoma-city pages share 95% of their text, Google will pick one and bury the other regardless of what your rel=canonical says.

The fixes:

  1. Differentiate body content. Same template, different data. Different reviews. Different local statistics. Different FAQs.
  2. Audit canonical signals. Make sure your rel=canonical, internal links, and sitemap all point at the same URL.
  3. Consolidate clusters that should never have existed. If five pages are competing for the same intent, merge them into one strong page and 301 the rest.

This is also why pSEO sites get flagged by quality reviewers. See will programmatic SEO get penalized in 2026 for the broader risk picture.

Are pagination and faceted navigation killing your crawl budget?

Yes, more often than any other technical cause. Google's faceted navigation guide calls it 'by far the most common source of overcrawl issues site owners report to Google.' On pSEO sites, filter parameters and pagination create infinite URL spaces that consume crawl budget meant for your real landing pages.

Diagnostic check in GSC: In Settings > Crawl stats, sort the 'By URL' breakdown. If thousands of crawled URLs contain query parameters like ?sort=, ?filter=, ?page=4, your facets are eating Googlebot's time.

The math is ugly. A category page with 8 filter dimensions can generate hundreds of thousands of URL combinations: with just four values per facet (plus an 'unset' state), that's 5^8 ≈ 390,000 distinct URLs from a single category page. If Googlebot's daily crawl budget for your site is 50,000 fetches and 40,000 of those go to filter combinations, your real pSEO pages get 10,000 -- across millions of URLs.

Fixes that work in 2026:

  • Use URL fragments (#filter=...) for filters that don't need indexing. Google ignores fragment URLs entirely.
  • Disallow parameterized URLs in robots.txt. Disallow: /*?filter= and similar; a robots.txt sketch follows this list.
  • Return 404 for filter combinations with no results. Per Google's December 2024 crawling guidance, this is the recommended pattern.
  • For pagination, link ?page=2 through ?page=N from ?page=1 only. Don't expose deep pagination from every page.
  • Noindex thin pagination pages that just list 10 items with no unique value.
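A robots.txt sketch of the disallow pattern. The parameter names are examples; substitute your site's actual facet parameters, and remember Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule:

  # Block faceted and sort parameter URLs from crawling.
  # Parameter names are illustrative -- use your own.
  User-agent: *
  Disallow: /*?filter=
  Disallow: /*&filter=
  Disallow: /*?sort=
  Disallow: /*&sort=

  # Keep plain pagination crawlable if those pages carry unique items:
  Allow: /*?page=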

Fix this before you fix anything else. There's no point optimizing your templates if Googlebot never reaches them.

How long should you wait for pSEO pages to index?

For an established domain, expect 3-7 days for the average pSEO page to be indexed once Google has crawled it. Brand-new domains average 18 days. If a page is still 'Discovered' or 'Crawled, not indexed' after 30 days, stop waiting and start fixing.

2026 indexing benchmarks (per Search Engine Journal and CrawlWP):

  • Established sites: 3.2 days average to first index
  • Brand-new domain (first page): ~18 days
  • Ecommerce product pages: 5.7 days
  • Service pages: 4.1 days

Why pSEO pages skew slower. Google batches large URL pushes through a quality sampling process. If the first 50 URLs from your sitemap underwhelm the quality system, the remaining 49,950 get demoted in priority. So a slow initial roll-out (3-5K pages, monitor, expand) tends to outperform a 50K-page launch on day one.

The 30-day rule. Per Search Engine Land's analysis on why 100% indexing isn't possible, Google explicitly chooses not to index every URL it sees. After 30 days in 'Discovered' or 'Crawled, not indexed', further waiting won't help. Treat the URL as a quality or crawl-priority diagnostic, not a patience problem.

[Chart] Average days until Google indexes a new page (2026): brand-new domain (first page) 18 days; ecommerce product pages 5.7 days; service pages 4.1 days; established site average 3.2 days. Source: Search Engine Journal / CrawlWP indexing benchmarks, 2026.

Should you submit programmatic pages via the Indexing API?

No, unless your pages contain JobPosting or BroadcastEvent structured data. For any other pSEO content, using the Indexing API violates Google's policy and signals your content is ephemeral, which actively hurts evergreen pages.

Google's Indexing API documentation is unambiguous: 'The Indexing API allows site owners to directly notify Google when their job posting or livestreaming video pages are added or removed.' That's the entire approved use case.

What goes wrong when you misuse it:

  • Submitting a blog post or pSEO landing page tells Google's systems the page is time-sensitive (job, livestream).
  • Google later notices the page hasn't changed in months. The 'ephemeral' signal contradicts reality.
  • The page can lose indexing priority entirely, the opposite of what you wanted.
  • Repeated misuse can result in API access revocation.

What to use instead:

  • XML sitemaps -- update the <lastmod> field when content meaningfully changes and resubmit through GSC (Google retired the sitemap ping endpoint in 2023).
  • URL Inspection API for monitoring -- 2,000 queries/day per property, useful for auditing index status across thousands of URLs (sketched after this list).
  • IndexNow (for Bing/Yandex; Google ignores it) -- legitimate for non-Google engines.
  • Internal linking -- still the highest-leverage way to push Google to crawl new URLs.
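A minimal Node/TypeScript sketch of the URL Inspection API route. You'd supply a real OAuth token scoped to your verified property; the field names follow the public API, but treat this as a sketch, not production code:

  // Sketch: fetch the index status string for one URL via the
  // Search Console URL Inspection API.

  const ENDPOINT =
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect";

  async function inspect(url: string, siteUrl: string, token: string) {
    const res = await fetch(ENDPOINT, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inspectionUrl: url, siteUrl }),
    });
    if (!res.ok) throw new Error(`Inspection failed: ${res.status}`);
    const data = await res.json();
    // coverageState is the human-readable status, e.g.
    // "Crawled - currently not indexed".
    return data.inspectionResult?.indexStatusResult?.coverageState;
  }

  // Loop this over sampled URLs (2,000 queries/day per property) to
  // see which templates are stuck in which status.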

There is no shortcut around quality.

What's a healthy indexation rate for a large pSEO site?

A healthy indexation rate ranges from 60% (large marketplace) to 90% (well-differentiated ecommerce or editorial pSEO). 100% is not the target and not achievable above ~10K URLs.

Google's John Mueller has stated repeatedly that Google deliberately chooses not to index everything, and Search Engine Land's analysis on why 100% indexing isn't possible explains the structural reason: Google ranks crawl priority by predicted value, and at scale, predicted value falls off a cliff for templates with overlapping intent.

Benchmark by site type (Indexing Insight, 2026):

  • Marketplaces and listings: 60-70%
  • Ecommerce with strong PDPs: 80-90%
  • Editorial / programmatic content with unique data per URL: 85-95%
  • Pure boilerplate templates: often <40%

What to do with the unindexed tail.

  1. Sample 50 unindexed URLs. Are they targeting real queries with real intent?
  2. If yes, fix the template (add unique data, fix internal linking).
  3. If no, noindex and remove from sitemap. Concentrate Google's quality budget on the URLs that earn it.

Indexation rate is a vanity metric in isolation. Indexation rate of URLs that target real demand is the metric that matters.

[Chart] Programmatic SEO indexation rates by site type: marketplaces / listing sites ~70%; ecommerce sites ~90%; established editorial sites ~95%. Source: Indexing Insight, 2026 Index Coverage benchmarks.

How should you structure XML sitemaps for programmatic pages?

Group programmatic URLs into multiple sitemaps of 10,000-50,000 URLs each, segmented by template. Include only 200-OK canonical URLs. Update <lastmod> only when content meaningfully changes. Submit via a sitemap index file.

The pSEO sitemap pattern:

/sitemap_index.xml
  /sitemap-locations.xml       (10K city pages)
  /sitemap-comparisons.xml     (5K vs pages)
  /sitemap-integrations.xml    (2K integration pages)
  /sitemap-blog.xml            (editorial)
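The index file itself is plain XML. A sketch of what /sitemap_index.xml contains for that layout (domain and dates are placeholders):

  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://example.com/sitemap-locations.xml</loc>
      <lastmod>2026-01-15</lastmod>
    </sitemap>
    <sitemap>
      <loc>https://example.com/sitemap-comparisons.xml</loc>
      <lastmod>2026-01-10</lastmod>
    </sitemap>
    <!-- one <sitemap> entry per template segment -->
  </sitemapindex>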

Why segmentation matters. When indexation tanks, you need to know which template is failing. A single 50K-URL sitemap hides that signal. Segmented sitemaps let GSC's Indexing > Sitemaps report show you indexation rate per template.

Rules that prevent sitemap rot (a code sketch enforcing them follows the list):

  • Only include URLs that return HTTP 200 and are self-canonical. No redirected URLs. No noindex URLs. No URLs blocked in robots.txt.
  • Update <lastmod> when the body content actually changes, not on every nightly rebuild. False <lastmod> updates burn Google's trust in your sitemap.
  • Remove URLs you've noindexed. Don't submit pages you don't want indexed.
  • Keep sitemaps under 50MB and 50,000 URLs (Google's hard limit).
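A TypeScript sketch of those rules as a build-time filter. The PageRecord shape (including the content-hash fields) is an assumption about your own pipeline, not a known API:

  // Sketch: emit only sitemap entries that satisfy the rules above.

  interface PageRecord {
    url: string;
    status: number;          // last known HTTP status
    canonical: string;       // rel=canonical target
    noindex: boolean;
    contentHash: string;     // hash of body content, not the rendered shell
    prevContentHash: string; // hash from the previous build
    lastmod: string;         // ISO date of the last real content change
  }

  const MAX_URLS_PER_SITEMAP = 50_000; // Google's hard limit per file

  function sitemapFiles(pages: PageRecord[]): PageRecord[][] {
    const eligible = pages
      .filter((p) => p.status === 200)      // no redirects, no errors
      .filter((p) => p.canonical === p.url) // self-canonical only
      .filter((p) => !p.noindex);           // never submit noindexed URLs

    // Chunk into files of <= 50,000 URLs rather than truncating.
    const files: PageRecord[][] = [];
    for (let i = 0; i < eligible.length; i += MAX_URLS_PER_SITEMAP) {
      files.push(eligible.slice(i, i + MAX_URLS_PER_SITEMAP));
    }
    return files;
  }

  // Bump <lastmod> only when the body actually changed:
  function nextLastmod(p: PageRecord): string {
    return p.contentHash !== p.prevContentHash
      ? new Date().toISOString().slice(0, 10)
      : p.lastmod;
  }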

For sites with active content rotation, see the 13-week pSEO refresh cycle for 10,000+ pages for how to handle <lastmod> honestly at scale.

When should you noindex programmatic pages instead of fixing them?

Noindex pSEO pages when the underlying dataset is genuinely thin, when search demand is zero, or when the page can't be meaningfully differentiated from a sibling. Don't try to fix every page. Pruning is faster and lifts the rest of the site.

The clear-cut noindex cases:

  • Cities with fewer than 5 listings/businesses behind them
  • Products or services with no inventory
  • Integration pages for deprecated apps
  • Long-tail pages targeting queries with literally zero monthly searches
  • Pagination pages 5+ deep with no unique content
  • Filter and sort variations

Why noindexing helps the rest of the site. Google's crawl and quality systems sample your URL space. Pages that fail quality drag down the predicted value for the entire template. Removing them concentrates signal on the URLs that do convert demand into rankings.

The pruning workflow (steps 2-3 are sketched in code after the list):

  1. Pull 'Crawled -- currently not indexed' and 'Discovered -- currently not indexed' lists from GSC.
  2. Cross-reference with backend data (listing count, review count, search volume).
  3. Apply noindex, follow to anything in the bottom quartile.
  4. Remove those URLs from the sitemap.
  5. Wait 4-6 weeks and recheck the indexation rate of the remaining URLs.
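Steps 2-3 sketched in TypeScript. The GSC export format and the backend field names are assumptions -- adapt them to your own schema:

  // Sketch: cross-reference GSC's unindexed-URL export with backend
  // data and pick the bottom quartile for noindexing. Field names
  // are illustrative, not a real schema.

  interface UrlMetrics {
    url: string;
    listingCount: number;
    reviewCount: number;
    monthlySearchVolume: number;
  }

  function pruneCandidates(
    unindexedUrls: string[],          // from the GSC CSV export
    metrics: Map<string, UrlMetrics>, // from your backend
  ): string[] {
    const scored = unindexedUrls
      .map((url) => metrics.get(url))
      .filter((m): m is UrlMetrics => m !== undefined)
      .map((m) => ({
        url: m.url,
        // crude value score; weight to taste
        score: m.listingCount + m.reviewCount + m.monthlySearchVolume,
      }))
      .sort((a, b) => a.score - b.score);

    // Bottom quartile -> apply noindex,follow and drop from the sitemap.
    return scored.slice(0, Math.floor(scored.length / 4)).map((s) => s.url);
  }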

Most pSEO sites we audit see indexation lift on retained pages within one crawl cycle of pruning the bottom 20%.

What's the triage flowchart for pSEO indexing failures?

Run these steps in order. Don't skip ahead. Each step's fix is wasted if the previous step's problem is still active.

Step 1. Confirm the URL is reachable.

  • HTTP 200? Not blocked in robots.txt? No noindex tag? Self-canonical? (All four are scriptable; see the preflight sketch after this step.)
  • If no: fix the technical block first. Stop here until clean.
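Step 1 is scriptable. A rough TypeScript preflight (Node 18+ fetch); the regex extraction is a simplification that ignores robots.txt and X-Robots-Tag headers, which a real check would also cover:

  // Sketch: preflight a URL for the four step-1 checks.
  // Regex extraction is a simplification; a real HTML parser would
  // catch edge cases this misses.

  async function preflight(url: string) {
    const res = await fetch(url, { redirect: "manual" });
    const html = res.status === 200 ? await res.text() : "";

    const noindex =
      /<meta[^>]+name=["']robots["'][^>]*noindex/i.test(html);
    const canonical =
      html.match(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)/i)?.[1];

    return {
      status: res.status,               // want 200
      noindex,                          // want false
      selfCanonical: canonical === url, // want true
    };
  }

  // If any check fails, fix it before reading GSC statuses --
  // everything downstream is noise until this is clean.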

Step 2. Check the GSC indexing status.

  • Use URL Inspection. Note the exact status string.
  • If 'Alternate page with proper canonical tag' or 'Page with redirect' -- working as intended. Stop.

Step 3. If 'Discovered -- currently not indexed':

  • Check internal links pointing to the URL. <3 internal links = crawl-priority problem.
  • Check crawl depth from homepage. >3 clicks = flatten architecture.
  • Check sitemap composition. Bloated sitemap = trim to 200-OK canonicals only.
  • Don't proceed to step 4 yet. Wait 14 days for Google to recrawl.

Step 4. If 'Crawled -- currently not indexed':

  • Compare body content to 5 sibling pages in the same template. >80% overlap = quality problem.
  • Inject unique data. Real numbers. Real reviews. Real local context.
  • Cut boilerplate intros and FAQs that repeat across the cluster.

Step 5. If 'Soft 404':

  • Is the underlying dataset empty? Return HTTP 404 or 410.
  • Is the page genuinely thin but should exist? noindex it.
  • Is the page misjudged? Add unique content, resubmit.

Step 6. If 'Duplicate, Google chose different canonical':

  • Check Google's chosen canonical in URL Inspection.
  • Differentiate body content or consolidate the cluster.
  • Audit rel=canonical, internal links, and sitemap for conflicting signals.

Step 7. If crawl budget is the bottleneck:

  • Audit Settings > Crawl stats for parameter URL bloat.
  • Disallow facets, fragment-ize sort/filter, 404 empty filter combos.

Step 8. After all fixes, wait 14-30 days and re-audit.

  • Don't request indexing. Don't ping the Indexing API. Let crawl normalize.
  • Re-pull the GSC reports. The status mix should shift toward 'Indexed.'

This is the order. Skipping steps wastes the next fix.

| GSC Status | What Google Did | Root Cause on pSEO | First Fix to Try |
| --- | --- | --- | --- |
| Discovered -- currently not indexed | Saw the URL, didn't crawl it | Low PageRank to the URL, deep crawl path, sitemap bloat | Flatten architecture, add internal links from indexed pages |
| Crawled -- currently not indexed | Crawled the page, refused to index | Thin or template-stamped content; near-duplicate of existing index entry | Differentiate the body content with unique data per page |
| Soft 404 | Crawled, judged the page empty | Empty data slots (zero results, dead inventory) returning 200 OK | Return 404/410 for empty datasets, or noindex |
| Duplicate, Google chose different canonical | Clustered the page with another URL | 90%+ template overlap with the chosen canonical | Add unique content blocks; check rel=canonical isn't pointing wrong |
| Alternate page with proper canonical | Honored your canonical | Working as intended (not an error) | Ignore unless canonical target is wrong |
| Page with redirect | Followed the redirect | Working as intended | Ignore |