If 60% of your programmatic pages are missing from Google's index, the cause is almost never a bug. It's one of five specific failure modes: discovered-not-crawled (low crawl priority), crawled-not-indexed (low content quality), soft 404s (empty templates returning 200 OK), duplicate cluster suppression (templates too similar), or crawl traps from pagination and facets. Each one has a distinct signal in Google Search Console and a specific fix. This article maps each status to its diagnosis and remedy, with the triage order to work through them.

Why aren't my programmatic SEO pages getting indexed?

Programmatic pages get rejected from Google's index for one of five reasons: low crawl priority, thin or duplicate content, soft 404s on empty templates, duplicate cluster suppression, or crawl traps that exhaust crawl budget. Each reason maps to a specific Search Console status, and each one has a different fix.

Google confirms this in the Page Indexing report documentation: the report exists precisely because indexing is not binary. A page can be discovered, crawled, deduplicated, soft-404'd, or excluded by a canonical, and the response is different in each case.

For large pSEO programs, the math is brutal. According to Indexing Insight's 2026 benchmarks, marketplace and listing sites routinely show indexation coverage below 70%. That isn't a Google malfunction. It's Google being selective at scale, and pSEO sites are selected against more aggressively because their templates look the same to a near-duplicate detector.

The rest of this article walks each GSC status one at a time, with the diagnostic check to run first and the specific fix that matters.

What does 'Discovered -- currently not indexed' mean for pSEO pages?

'Discovered -- currently not indexed' means Google knows the URL exists, usually from your sitemap or an internal link, but has not yet allocated crawl resources to fetch it. The page is sitting in a low-priority crawl queue. Content quality is not the problem yet, because Google has not even seen the content.

Diagnostic check in GSC: Open Pages > Why pages aren't indexed > Discovered -- currently not indexed. Click into the URL list. If you see thousands of URLs from a single template (e.g., /locations/[city]/[service]), that template has a crawl-priority problem.

The fixes that actually work:

  1. Flatten crawl depth. Every pSEO URL should be reachable in 3 clicks or fewer from the homepage. Build hub pages that link to every leaf page in the cluster (one way to chunk them is sketched after this list).
  2. Add internal links from already-indexed pages. Google uses internal PageRank to decide what to crawl next. A pSEO page with 0 internal links has 0 priority.
  3. Trim sitemap bloat. Sitemaps should only contain 200-OK canonical URLs. If your sitemap has 50,000 URLs and 30,000 are low-value, Google will sample, find junk, and deprioritize the cluster.
  4. Stop using 'Request Indexing' for bulk URLs. It has hard quotas and signals nothing about quality.
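As a sketch of fix #1, here's one way to chunk a large cluster into hub pages in TypeScript so every leaf sits at depth 3 (homepage > hub index > hub > leaf). The URL pattern and links-per-hub figure are illustrative, not prescriptive:

  // Sketch: split a pSEO cluster into hub pages so the path
  // homepage -> hub index -> hub -> leaf is exactly 3 clicks.

  const LINKS_PER_HUB = 100; // tune to what your hub template can carry

  interface Hub {
    url: string;        // e.g. /locations/hubs/1 (illustrative pattern)
    leafUrls: string[]; // leaf pages this hub links to
  }

  function buildHubs(leafUrls: string[]): Hub[] {
    const hubs: Hub[] = [];
    for (let i = 0; i < leafUrls.length; i += LINKS_PER_HUB) {
      hubs.push({
        url: `/locations/hubs/${hubs.length + 1}`,
        leafUrls: leafUrls.slice(i, i + LINKS_PER_HUB),
      });
    }
    return hubs;
  }

  // 10,000 leaves -> 100 hubs. Link all 100 hubs from a single hub
  // index page that the homepage links to, and no leaf is deeper
  // than 3 clicks.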

For sites with 5,000+ programmatic URLs, the single biggest lever is internal linking. See internal linking at scale for programmatic sites for the link-graph patterns that move pages out of Discovered.

Why are my programmatic pages 'Crawled -- currently not indexed'?

'Crawled -- currently not indexed' means Google fetched the page, evaluated the content, and decided it wasn't worth keeping. This is a quality verdict, not a crawl problem. On pSEO sites, the cause is almost always template-stamped pages with insufficient unique content per URL.

Diagnostic check in GSC: In Pages > Crawled -- currently not indexed, sample 20 URLs and run them through the URL Inspection tool. Compare each page's body content side by side. If 90% of the words are identical across templates, you have your answer.
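To make that overlap comparison concrete, here's a rough TypeScript sketch: a word-set Jaccard similarity. It's a crude stand-in for the shingling real near-duplicate systems use, and the 0.9 threshold is an assumption, but it's enough to flag template-stamped siblings:

  // Sketch: estimate how much of two template pages' visible text overlaps.
  // Crude word-set Jaccard similarity -- real dedup uses shingling, but
  // this is enough to flag 90%-identical templates.

  function words(text: string): Set<string> {
    return new Set(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
  }

  function overlap(a: string, b: string): number {
    const wordsA = words(a);
    const wordsB = words(b);
    let shared = 0;
    for (const w of wordsA) if (wordsB.has(w)) shared++;
    const union = wordsA.size + wordsB.size - shared;
    return union === 0 ? 0 : shared / union;
  }

  // Usage: strip markup from two sibling pages, then compare body text.
  // overlap(tulsaBody, okcBody) > 0.9 means they're one page to Google.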

Why this happens at scale. Per Google's Helpful Content guidance, Google's quality systems prioritize content that demonstrably helps a person doing a specific task. A template that swaps in a city name and reuses 800 identical words doesn't pass that bar.

Fixes, in order of impact:

  • Inject page-unique data. Real numbers, real reviews, real inventory, real local references. Not synonym-spinning.
  • Cut the boilerplate. If your template's intro paragraph and FAQ are identical on 10,000 pages, those pages look like one page to Google.
  • Prune zero-demand URLs. Pages targeting queries with no search volume are noise. Noindex them.
  • Verify intent alignment. A pSEO URL should answer the query in its title. If /best-crm-for-dentists-in-tulsa returns generic CRM advice, Google will refuse to index it.

This is the same quality bar covered in pSEO template structure that passes the helpful-content review.

How do I fix soft 404s on programmatic pages?

A soft 404 happens when a page returns HTTP 200 OK but the content is effectively empty -- no listings, no products, no answers. Google detects this and excludes the URL. Soft 404s are extremely common on pSEO sites because empty database queries still render the template.

Diagnostic check in GSC: Open Pages > Soft 404 in the indexing report. Per the original Google announcement on soft 404 reporting, Google flags these because they waste crawl coverage. Inspect 5-10 URLs and check whether the page is genuinely empty or just thin.

The fix depends on whether the underlying data is empty:

| Situation | Correct response |
| --- | --- |
| Empty dataset (zero listings, zero products) | Return HTTP 404 or 410 |
| Page moved permanently | 301 redirect to the new URL |
| Page is intentionally thin (sparse city, dead inventory) | noindex and remove from sitemap |
| Page has real content but Google misjudged it | Add unique data, then resubmit |

The pSEO-specific gotcha: templates often render a default 'No results found' page with HTTP 200. That is the textbook soft 404. Configure your application to return a real 404 status when the dataset for that URL is empty, not just visually display a 'no results' state.

Do this once at the framework level (Next.js notFound(), Rails render status: :not_found, etc.) and you'll wipe out 80% of pSEO soft 404s in a single deploy.
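A minimal Next.js (App Router) sketch of that pattern -- the route and the getListings helper are hypothetical stand-ins for your own data layer:

  // app/locations/[city]/page.tsx -- a sketch, not a drop-in implementation.
  // If the dataset behind a pSEO URL is empty, serve a real 404 instead of
  // rendering a "No results found" template with HTTP 200.

  import { notFound } from "next/navigation";
  import { getListings } from "@/lib/data"; // hypothetical data helper

  export default async function CityPage({
    params,
  }: {
    params: { city: string };
  }) {
    const listings = await getListings(params.city);

    // Empty dataset: notFound() makes Next.js return an actual
    // HTTP 404 status with its not-found page -- no more soft 404.
    if (listings.length === 0) notFound();

    return (
      <main>
        <h1>Listings in {params.city}</h1>
        {/* render the real listings here */}
      </main>
    );
  }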

How does Google's duplicate cluster suppression hurt pSEO?

Google clusters near-duplicate pages and picks one canonical to display, suppressing the rest. On pSEO sites with template-stamped content, this happens silently. Pages don't show up as 'errors' -- they just stop ranking because they got merged into another URL's cluster.

Diagnostic check in GSC: Look for two statuses in Pages > Why pages aren't indexed:

  • 'Duplicate, Google chose a different canonical than user' -- you declared a canonical, Google ignored it.
  • 'Alternate page with proper canonical tag' -- working as intended; ignore unless the canonical is wrong.

Use the URL Inspection tool's 'Google-selected canonical' field to see which URL Google clustered the page with.

How clustering actually works. Per Google's canonical URL documentation, Google evaluates ~40 signals to pick a cluster's canonical. Sitemap inclusion is a weak signal. Internal links and content overlap are stronger. So if your /best-crm-tulsa and /best-crm-oklahoma-city pages share 95% of their text, Google will pick one and bury the other regardless of what your rel=canonical says.

The fixes:

  1. Differentiate body content. Same template, different data. Different reviews. Different local statistics. Different FAQs.
  2. Audit canonical signals. Make sure your rel=canonical, internal links, and sitemap all point at the same URL.
  3. Consolidate clusters that should never have existed. If five pages are competing for the same intent, merge them into one strong page and 301 the rest.

This is also why pSEO sites get flagged by quality reviewers. See will programmatic SEO get penalized in 2026 for the broader risk picture.

Are pagination and faceted navigation killing your crawl budget?

Yes, more often than any other technical cause. Google's faceted navigation guide calls it 'by far the most common source of overcrawl issues site owners report to Google.' On pSEO sites, filter parameters and pagination create infinite URL spaces that consume crawl budget meant for your real landing pages.

Diagnostic check in GSC: In Settings > Crawl stats, sort the 'By URL' breakdown. If thousands of crawled URLs contain query parameters like ?sort=, ?filter=, ?page=4, your facets are eating Googlebot's time.

The math is ugly. A category page with 8 filter dimensions can generate hundreds of thousands of URL combinations: with just four values per facet (plus an 'unset' state), that's 5^8 ≈ 390,000 distinct URLs from a single category page. If Googlebot's daily crawl budget for your site is 50,000 fetches and 40,000 of those go to filter combinations, your real pSEO pages get 10,000 -- across millions of URLs.

Fixes that work in 2026:

  • Use URL fragments (#filter=...) for filters that don't need indexing. Google ignores fragment URLs entirely.
  • Disallow parameterized URLs in robots.txt. Disallow: /*?filter= and similar; a robots.txt sketch follows this list.
  • Return 404 for filter combinations with no results. Per Google's December 2024 crawling guidance, this is the recommended pattern.
  • For pagination, link ?page=2 through ?page=N from ?page=1 only. Don't expose deep pagination from every page.
  • Noindex thin pagination pages that just list 10 items with no unique value.
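A robots.txt sketch of the disallow pattern. The parameter names are examples; substitute your site's actual facet parameters, and remember Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule:

  # Block faceted and sort parameter URLs from crawling.
  # Parameter names are illustrative -- use your own.
  User-agent: *
  Disallow: /*?filter=
  Disallow: /*&filter=
  Disallow: /*?sort=
  Disallow: /*&sort=

  # Keep plain pagination crawlable if those pages carry unique items:
  Allow: /*?page=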

Fix this before you fix anything else. There's no point optimizing your templates if Googlebot never reaches them.

How long should you wait for pSEO pages to index?

For an established domain, expect 3-7 days for the average pSEO page to be indexed once Google has crawled it. Brand-new domains average 18 days. If a page is still 'Discovered' or 'Crawled, not indexed' after 30 days, stop waiting and start fixing.

2026 indexing benchmarks (per Search Engine Journal and CrawlWP):

  • Established sites: 3.2 days average to first index
  • Brand-new domain (first page): ~18 days
  • Ecommerce product pages: 5.7 days
  • Service pages: 4.1 days

Why pSEO pages skew slower. Google batches large URL pushes through a quality sampling process. If the first 50 URLs from your sitemap underwhelm the quality system, the remaining 49,950 get demoted in priority. So a slow initial roll-out (3-5K pages, monitor, expand) tends to outperform a 50K-page launch on day one.

The 30-day rule. Per Search Engine Land's analysis on why 100% indexing isn't possible, Google explicitly chooses not to index every URL it sees. After 30 days in 'Discovered' or 'Crawled, not indexed', further waiting won't help. Treat the URL as a quality or crawl-priority diagnostic, not a patience problem.

[Chart] Average days until Google indexes a new page (2026): brand-new domain (first page) 18 days; ecommerce product pages 5.7 days; service pages 4.1 days; established site average 3.2 days. Source: Search Engine Journal / CrawlWP indexing benchmarks, 2026.

Should you submit programmatic pages via the Indexing API?

No, unless your pages contain JobPosting or BroadcastEvent structured data. For any other pSEO content, using the Indexing API violates Google's policy and signals your content is ephemeral, which actively hurts evergreen pages.

Google's Indexing API documentation is unambiguous: 'The Indexing API allows site owners to directly notify Google when their job posting or livestreaming video pages are added or removed.' That's the entire approved use case.

What goes wrong when you misuse it:

  • Submitting a blog post or pSEO landing page tells Google's systems the page is time-sensitive (job, livestream).
  • Google later notices the page hasn't changed in months. The 'ephemeral' signal contradicts reality.
  • The page can lose indexing priority entirely, the opposite of what you wanted.
  • Repeated misuse can result in API access revocation.

What to use instead:

  • XML sitemaps -- update the <lastmod> field when content meaningfully changes and resubmit through GSC (Google retired the sitemap ping endpoint in 2023).
  • URL Inspection API for monitoring -- 2,000 queries/day per property, useful for auditing index status across thousands of URLs (sketched after this list).
  • IndexNow (for Bing/Yandex; Google ignores it) -- legitimate for non-Google engines.
  • Internal linking -- still the highest-leverage way to push Google to crawl new URLs.
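A minimal Node/TypeScript sketch of the URL Inspection API route. You'd supply a real OAuth token scoped to your verified property; the field names follow the public API, but treat this as a sketch, not production code:

  // Sketch: fetch the index status string for one URL via the
  // Search Console URL Inspection API.

  const ENDPOINT =
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect";

  async function inspect(url: string, siteUrl: string, token: string) {
    const res = await fetch(ENDPOINT, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inspectionUrl: url, siteUrl }),
    });
    if (!res.ok) throw new Error(`Inspection failed: ${res.status}`);
    const data = await res.json();
    // coverageState is the human-readable status, e.g.
    // "Crawled - currently not indexed".
    return data.inspectionResult?.indexStatusResult?.coverageState;
  }

  // Loop this over sampled URLs (2,000 queries/day per property) to
  // see which templates are stuck in which status.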

There is no shortcut around quality.

What's a healthy indexation rate for a large pSEO site?

A healthy indexation rate ranges from 60% (large marketplace) to 90% (well-differentiated ecommerce or editorial pSEO). 100% is not the target and not achievable above ~10K URLs.

Google's John Mueller has stated repeatedly that Google deliberately chooses not to index everything, and Search Engine Land's analysis on why 100% indexing isn't possible explains the structural reason: Google ranks crawl priority by predicted value, and at scale, predicted value falls off a cliff for templates with overlapping intent.

Benchmark by site type (Indexing Insight, 2026):

  • Marketplaces and listings: 60-70%
  • Ecommerce with strong PDPs: 80-90%
  • Editorial / programmatic content with unique data per URL: 85-95%
  • Pure boilerplate templates: often <40%

What to do with the unindexed tail.

  1. Sample 50 unindexed URLs. Are they targeting real queries with real intent?
  2. If yes, fix the template (add unique data, fix internal linking).
  3. If no, noindex and remove from sitemap. Concentrate Google's quality budget on the URLs that earn it.

Indexation rate is a vanity metric in isolation. Indexation rate of URLs that target real demand is the metric that matters.

[Chart] Programmatic SEO indexation rates by site type: marketplaces / listing sites ~70%; ecommerce sites ~90%; established editorial sites ~95%. Source: Indexing Insight, 2026 Index Coverage benchmarks.

How should you structure XML sitemaps for programmatic pages?

Group programmatic URLs into multiple sitemaps of 10,000-50,000 URLs each, segmented by template. Include only 200-OK canonical URLs. Update <lastmod> only when content meaningfully changes. Submit via a sitemap index file.

The pSEO sitemap pattern:

/sitemap_index.xml
  /sitemap-locations.xml       (10K city pages)
  /sitemap-comparisons.xml     (5K vs pages)
  /sitemap-integrations.xml    (2K integration pages)
  /sitemap-blog.xml            (editorial)
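The index file itself is plain XML. A sketch of what /sitemap_index.xml contains for that layout (domain and dates are placeholders):

  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://example.com/sitemap-locations.xml</loc>
      <lastmod>2026-01-15</lastmod>
    </sitemap>
    <sitemap>
      <loc>https://example.com/sitemap-comparisons.xml</loc>
      <lastmod>2026-01-10</lastmod>
    </sitemap>
    <!-- one <sitemap> entry per template segment -->
  </sitemapindex>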

Why segmentation matters. When indexation tanks, you need to know which template is failing. A single 50K-URL sitemap hides that signal. Segmented sitemaps let GSC's Indexing > Sitemaps report show you indexation rate per template.

Rules that prevent sitemap rot (a code sketch enforcing them follows the list):

  • Only include URLs that return HTTP 200 and are self-canonical. No redirected URLs. No noindex URLs. No URLs blocked in robots.txt.
  • Update <lastmod> when the body content actually changes, not on every nightly rebuild. False <lastmod> updates burn Google's trust in your sitemap.
  • Remove URLs you've noindexed. Don't submit pages you don't want indexed.
  • Keep sitemaps under 50MB and 50,000 URLs (Google's hard limit).
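A TypeScript sketch of those rules as a build-time filter. The PageRecord shape (including the content-hash fields) is an assumption about your own pipeline, not a known API:

  // Sketch: emit only sitemap entries that satisfy the rules above.

  interface PageRecord {
    url: string;
    status: number;          // last known HTTP status
    canonical: string;       // rel=canonical target
    noindex: boolean;
    contentHash: string;     // hash of body content, not the rendered shell
    prevContentHash: string; // hash from the previous build
    lastmod: string;         // ISO date of the last real content change
  }

  const MAX_URLS_PER_SITEMAP = 50_000; // Google's hard limit per file

  function sitemapFiles(pages: PageRecord[]): PageRecord[][] {
    const eligible = pages
      .filter((p) => p.status === 200)      // no redirects, no errors
      .filter((p) => p.canonical === p.url) // self-canonical only
      .filter((p) => !p.noindex);           // never submit noindexed URLs

    // Chunk into files of <= 50,000 URLs rather than truncating.
    const files: PageRecord[][] = [];
    for (let i = 0; i < eligible.length; i += MAX_URLS_PER_SITEMAP) {
      files.push(eligible.slice(i, i + MAX_URLS_PER_SITEMAP));
    }
    return files;
  }

  // Bump <lastmod> only when the body actually changed:
  function nextLastmod(p: PageRecord): string {
    return p.contentHash !== p.prevContentHash
      ? new Date().toISOString().slice(0, 10)
      : p.lastmod;
  }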

For sites with active content rotation, see the 13-week pSEO refresh cycle for 10,000+ pages for how to handle <lastmod> honestly at scale.

When should you noindex programmatic pages instead of fixing them?

Noindex pSEO pages when the underlying dataset is genuinely thin, when search demand is zero, or when the page can't be meaningfully differentiated from a sibling. Don't try to fix every page. Pruning is faster and lifts the rest of the site.

The clear-cut noindex cases:

  • Cities with fewer than 5 listings/businesses behind them
  • Products or services with no inventory
  • Integration pages for deprecated apps
  • Long-tail pages targeting queries with literally zero monthly searches
  • Pagination pages 5+ deep with no unique content
  • Filter and sort variations

Why noindexing helps the rest of the site. Google's crawl and quality systems sample your URL space. Pages that fail quality drag down the predicted value for the entire template. Removing them concentrates signal on the URLs that do convert demand into rankings.

The pruning workflow (steps 2-3 are sketched in code after the list):

  1. Pull 'Crawled -- currently not indexed' and 'Discovered -- currently not indexed' lists from GSC.
  2. Cross-reference with backend data (listing count, review count, search volume).
  3. Apply noindex, follow to anything in the bottom quartile.
  4. Remove those URLs from the sitemap.
  5. Wait 4-6 weeks and recheck the indexation rate of the remaining URLs.
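Steps 2-3 sketched in TypeScript. The GSC export format and the backend field names are assumptions -- adapt them to your own schema:

  // Sketch: cross-reference GSC's unindexed-URL export with backend
  // data and pick the bottom quartile for noindexing. Field names
  // are illustrative, not a real schema.

  interface UrlMetrics {
    url: string;
    listingCount: number;
    reviewCount: number;
    monthlySearchVolume: number;
  }

  function pruneCandidates(
    unindexedUrls: string[],          // from the GSC CSV export
    metrics: Map<string, UrlMetrics>, // from your backend
  ): string[] {
    const scored = unindexedUrls
      .map((url) => metrics.get(url))
      .filter((m): m is UrlMetrics => m !== undefined)
      .map((m) => ({
        url: m.url,
        // crude value score; weight to taste
        score: m.listingCount + m.reviewCount + m.monthlySearchVolume,
      }))
      .sort((a, b) => a.score - b.score);

    // Bottom quartile -> apply noindex,follow and drop from the sitemap.
    return scored.slice(0, Math.floor(scored.length / 4)).map((s) => s.url);
  }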

Most pSEO sites we audit see indexation lift on retained pages within one crawl cycle of pruning the bottom 20%.

What's the triage flowchart for pSEO indexing failures?

Run these steps in order. Don't skip ahead. Each step's fix is wasted if the previous step's problem is still active.

Step 1. Confirm the URL is reachable.

  • HTTP 200? Not blocked in robots.txt? No noindex tag? Self-canonical? (All four are scriptable; see the preflight sketch after this step.)
  • If no: fix the technical block first. Stop here until clean.
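Step 1 is scriptable. A rough TypeScript preflight (Node 18+ fetch); the regex extraction is a simplification that ignores robots.txt and X-Robots-Tag headers, which a real check would also cover:

  // Sketch: preflight a URL for the four step-1 checks.
  // Regex extraction is a simplification; a real HTML parser would
  // catch edge cases this misses.

  async function preflight(url: string) {
    const res = await fetch(url, { redirect: "manual" });
    const html = res.status === 200 ? await res.text() : "";

    const noindex =
      /<meta[^>]+name=["']robots["'][^>]*noindex/i.test(html);
    const canonical =
      html.match(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)/i)?.[1];

    return {
      status: res.status,               // want 200
      noindex,                          // want false
      selfCanonical: canonical === url, // want true
    };
  }

  // If any check fails, fix it before reading GSC statuses --
  // everything downstream is noise until this is clean.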

Step 2. Check the GSC indexing status.

  • Use URL Inspection. Note the exact status string.
  • If 'Alternate page with proper canonical tag' or 'Page with redirect' -- working as intended. Stop.

Step 3. If 'Discovered -- currently not indexed':

  • Check internal links pointing to the URL. <3 internal links = crawl-priority problem.
  • Check crawl depth from homepage. >3 clicks = flatten architecture.
  • Check sitemap composition. Bloated sitemap = trim to 200-OK canonicals only.
  • Don't proceed to step 4 yet. Wait 14 days for Google to recrawl.

Step 4. If 'Crawled -- currently not indexed':

  • Compare body content to 5 sibling pages in the same template. >80% overlap = quality problem.
  • Inject unique data. Real numbers. Real reviews. Real local context.
  • Cut boilerplate intros and FAQs that repeat across the cluster.

Step 5. If 'Soft 404':

  • Is the underlying dataset empty? Return HTTP 404 or 410.
  • Is the page genuinely thin but should exist? noindex it.
  • Is the page misjudged? Add unique content, resubmit.

Step 6. If 'Duplicate, Google chose different canonical':

  • Check Google's chosen canonical in URL Inspection.
  • Differentiate body content or consolidate the cluster.
  • Audit rel=canonical, internal links, and sitemap for conflicting signals.

Step 7. If crawl budget is the bottleneck:

  • Audit Settings > Crawl stats for parameter URL bloat.
  • Disallow facets, fragment-ize sort/filter, 404 empty filter combos.

Step 8. After all fixes, wait 14-30 days and re-audit.

  • Don't request indexing. Don't ping the Indexing API. Let crawl normalize.
  • Re-pull the GSC reports. The status mix should shift toward 'Indexed.'

This is the order. Skipping steps wastes the next fix.

| GSC Status | What Google Did | Root Cause on pSEO | First Fix to Try |
| --- | --- | --- | --- |
| Discovered -- currently not indexed | Saw the URL, didn't crawl it | Low PageRank to the URL, deep crawl path, sitemap bloat | Flatten architecture, add internal links from indexed pages |
| Crawled -- currently not indexed | Crawled the page, refused to index | Thin or template-stamped content; near-duplicate of existing index entry | Differentiate the body content with unique data per page |
| Soft 404 | Crawled, judged the page empty | Empty data slots (zero results, dead inventory) returning 200 OK | Return 404/410 for empty datasets, or noindex |
| Duplicate, Google chose different canonical | Clustered the page with another URL | 90%+ template overlap with the chosen canonical | Add unique content blocks; check rel=canonical isn't pointing wrong |
| Alternate page with proper canonical | Honored your canonical | Working as intended (not an error) | Ignore unless canonical target is wrong |
| Page with redirect | Followed the redirect | Working as intended | Ignore |