pseo-page-quality-at-scale
pSEO Page Quality at Scale
Quality at scale is the central challenge of programmatic SEO. Any single pSEO page can be reviewed in 5 minutes. But when you have 200 or 2,000 pages, manual review of every page is impractical. You need a quality system — automated checks, sampling strategies, and feedback loops — that catches problems before they compound across your entire page set.
The quality bar: every pSEO page must be useful enough that a real person searching for that specific query would find it valuable. Pages that exist only to capture a keyword, with no unique value, get deindexed and damage your site.
The pSEO Quality Framework
Layer 1: Automated checks (100% of pages)
Run these on every page before publishing:
| Check | What it catches | How to automate | Pass criteria |
|---|---|---|---|
| Word count | Thin pages | Script counts words per page | 500+ total words, 300+ unique words |
| Unique content ratio | Template-heavy pages | Script strips template, counts remaining | > 50% unique content |
| Pairwise similarity | Near-duplicate pages | Cosine similarity between all page pairs | < 70% similarity between any two pages |
| Required elements present | Missing sections | Script checks for H2s, tables, FAQ, schema | All required elements present |
| Data completeness | Pages with missing data | Script checks for null/empty fields | 95%+ of fields populated |
| Broken links | Dead internal links | Link checker | Zero broken links |
| Schema validation | Invalid structured data | JSON-LD validator | No schema errors |
| Image/media presence | Pages without visuals | Script checks for img tags | At least 1 image/visual per page |
| Title tag uniqueness | Duplicate title tags | Script compares all titles | 100% unique titles |
| Meta description uniqueness | Duplicate meta descriptions | Script compares all metas | 100% unique descriptions |
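Several of these checks can share one small script. Below is a minimal sketch, assuming pages arrive as plain rendered text; the function names and thresholds are illustrative, not a prescribed implementation, and the similarity measure is a simple bag-of-words cosine rather than anything embedding-based.

```python
import math
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Lowercased word frequencies for a page's rendered text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def passes_word_count(text: str, min_total: int = 500, min_unique: int = 300) -> bool:
    """Layer 1 word-count check: 500+ total words, 300+ unique words."""
    counts = word_counts(text)
    return sum(counts.values()) >= min_total and len(counts) >= min_unique

def unique_content_ratio(page_text: str, template_text: str) -> float:
    """Share of the page's words not accounted for by the shared template."""
    page, template = word_counts(page_text), word_counts(template_text)
    unique = page - template  # multiset difference: template words stripped out
    total = sum(page.values())
    return sum(unique.values()) / total if total else 0.0

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two pages (0.0 to 1.0)."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def max_pairwise_similarity(pages: list[str]) -> float:
    """Highest similarity across all page pairs; flag the batch if > 0.70.
    O(n^2) pairs, which is fine for a few thousand pages."""
    return max(
        (cosine_similarity(pages[i], pages[j])
         for i in range(len(pages)) for j in range(i + 1, len(pages))),
        default=0.0,
    )
```

Running `max_pairwise_similarity` on the whole batch and comparing against the 70% threshold gives a single pass/fail signal before any human review starts.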
Layer 2: Sampled human review (20% of pages)
After automated checks pass, humans review a sample drawn three ways: a random slice, the edge cases, and a per-category spread.
Sampling strategy:
| Sample type | Coverage | Purpose |
|---|---|---|
| Random 20% sample | Picks 1 in 5 pages randomly | Catches quality issues across the set |
| Edge case review | Pages at data extremes (shortest, longest, most sparse) | Catches issues at the boundaries |
| Category sample | 2-3 pages per category/modifier value | Ensures quality across all segments |
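The three sample types above can be combined in one selection pass. This is a sketch under assumed field names (`id`, `word_count`, `empty_fields`, `category` are hypothetical; adapt them to your page records), with a fixed seed so the review sample is reproducible.

```python
import random
from collections import defaultdict

def build_review_sample(pages, sample_rate=0.2, per_category=2, seed=42):
    """pages: list of dicts with 'id', 'word_count', 'empty_fields',
    'category' (assumed fields). Returns the set of page ids to review."""
    rng = random.Random(seed)  # fixed seed: the same batch yields the same sample

    # 1. Random 20% sample across the whole set.
    n = max(1, round(len(pages) * sample_rate))
    sample = {p["id"] for p in rng.sample(pages, n)}

    # 2. Edge cases: the shortest page and the page with the sparsest data.
    sample.add(min(pages, key=lambda p: p["word_count"])["id"])
    sample.add(max(pages, key=lambda p: p["empty_fields"])["id"])

    # 3. Category coverage: up to per_category pages from every category.
    by_category = defaultdict(list)
    for p in pages:
        by_category[p["category"]].append(p)
    for cat_pages in by_category.values():
        for p in rng.sample(cat_pages, min(per_category, len(cat_pages))):
            sample.add(p["id"])
    return sample
```

Because the three picks are unioned into a set, a page that is both an edge case and part of the random slice is only reviewed once.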
Human review scorecard (per page):
| Criterion | 0 (Fail) | 1 (Pass) | 2 (Strong) |
|---|---|---|---|
| Useful to the searcher | No unique value | Adequate answer | Best available answer for this query |
| Factually accurate | Contains errors | Accurate but basic | Accurate with verified data points |
| Readable and natural | Obviously machine-generated | Reads adequately | Reads like a human-written page |
| Differentiated from other pages | Near-duplicate of other pages in set | Moderately unique | Clearly distinct content |
| AEO-ready | No AEO structure | Some AEO elements | Full AEO compliance |
Pass threshold: 6+ of 10 points (five criteria scored 0-2) on every sampled page. If more than 10% of sampled pages fail, fix the template or prompt and regenerate the entire batch.
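The batch-level decision rule can be made explicit in a few lines. A minimal sketch, assuming reviewers record the five criterion scores per page:

```python
def review_batch(scores_per_page, pass_threshold=6, max_fail_rate=0.10):
    """scores_per_page: {page_id: [five 0-2 criterion scores]}.
    Returns (failing page ids, whether to regenerate the whole batch)."""
    failing = [pid for pid, scores in scores_per_page.items()
               if sum(scores) < pass_threshold]
    # > 10% of sampled pages failing means a systemic template/prompt problem.
    regenerate = len(failing) / len(scores_per_page) > max_fail_rate
    return failing, regenerate
```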
Layer 3: Post-publish monitoring (ongoing)
| Metric | Tool | Frequency | Alert threshold |
|---|---|---|---|
| Indexation rate | GSC → Coverage | Weekly | Below 80% indexed |
| "Discovered - not indexed" | GSC → Excluded | Weekly | Growing for pSEO pages |
| "Crawled - not indexed" | GSC → Excluded | Weekly | More than 10% of set |
| Organic traffic per page | Analytics | Monthly | Average below 10 visits/month after 3 months |
| AI citations | AI monitoring tool | Monthly | Below 10% citation rate |
| Thin content warnings | GSC → Manual Actions | Immediately | Any warning |
| User engagement (bounce rate, time on page) | Analytics | Monthly | Bounce > 85% or time < 30 seconds |
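Alert thresholds like these are easy to encode so that the weekly or monthly pull triggers alerts automatically. A sketch covering a subset of the table; the metric field names are assumptions about how you export GSC and analytics data, not real API fields.

```python
def monitoring_alerts(m):
    """m: dict of metrics pulled from GSC/analytics exports (assumed field
    names). Returns the list of alerts triggered per the thresholds above."""
    alerts = []
    if m["indexation_rate"] < 0.80:
        alerts.append("indexation below 80%")
    if m["crawled_not_indexed"] / m["total_pages"] > 0.10:
        alerts.append("'Crawled - not indexed' above 10% of set")
    if m["months_live"] >= 3 and m["avg_visits_per_month"] < 10:
        alerts.append("average traffic below 10 visits/month")
    if m["ai_citation_rate"] < 0.10:
        alerts.append("AI citation rate below 10%")
    if m["bounce_rate"] > 0.85 or m["avg_time_on_page_s"] < 30:
        alerts.append("engagement below threshold")
    return alerts
```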
Quality Improvement Loop
When quality issues are detected, follow this loop:
1. Detect issue (automated check, human review, or post-publish signal)
2. Diagnose root cause (data problem, template problem, prompt problem, or content gap)
3. Fix at the system level (not just the individual page)
4. Regenerate affected pages
5. Re-review sample of regenerated pages
6. Republish
7. Monitor for improvement
Always fix at the system level. If one page has thin content, 50 pages probably have the same issue. Fix the template or prompt, not just the individual page.
Common issues and system-level fixes
| Issue | Root cause | System-level fix |
|---|---|---|
| 30% of pages below 300 unique words | Template has too much boilerplate | Reduce shared template text, add per-entry content elements |
| Pages in the same category are 80% similar | Modifier doesn't create enough differentiation | Add more data fields per entry, generate entry-specific FAQ |
| Factual errors in 5% of reviewed pages | AI hallucinating data not in the prompt | Add explicit instruction: "Only use data provided. Do not invent facts" |
| Title tags are near-identical | Title template is too generic | Include modifier values in title template |
| "Crawled - not indexed" growing | Google considers pages thin | Add unique enrichment: expert takes, comparison context, UGC |
Quality Metrics Dashboard
Track these metrics in a single dashboard:
| Metric | Target | Status indicator |
|---|---|---|
| Automated check pass rate | 100% | Green: 100%. Yellow: 95-99%. Red: < 95% |
| Human review pass rate (sample) | 90%+ | Green: 90%+. Yellow: 80-89%. Red: < 80% |
| Indexation rate | 90%+ | Green: 90%+. Yellow: 70-89%. Red: < 70% |
| Avg organic traffic per page | 50+ visits/month (after 3 months) | Green: 50+. Yellow: 20-49. Red: < 20 |
| Thin content warnings | Zero | Green: 0. Red: any |
| Similarity max (pairwise) | < 70% | Green: < 70%. Red: > 70% |
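The color banding above generalizes to one small helper, sketched here with the table's thresholds as example inputs (the `higher_is_better=False` path handles metrics like max pairwise similarity, where lower is better):

```python
def status(value, green, yellow=None, higher_is_better=True):
    """Map a metric value to a dashboard color given its band edges."""
    if higher_is_better:
        if value >= green:
            return "green"
        if yellow is not None and value >= yellow:
            return "yellow"
        return "red"
    # Lower-is-better metrics (e.g. max similarity) have no yellow band here.
    return "green" if value <= green else "red"
```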
Pre-Publish Checklist
- [ ] All automated checks pass at 100% (word count, uniqueness, similarity, data completeness)
- [ ] 20% human review sample completed with 90%+ pass rate
- [ ] Edge case pages reviewed (shortest, most sparse data)
- [ ] Category sample reviewed (2-3 per category/modifier)
- [ ] Schema validated on all pages
- [ ] Title tags and meta descriptions are 100% unique across the set
- [ ] Post-publish monitoring dashboard configured
- [ ] Alert thresholds set (indexation, thin content, quality signals)
- [ ] Quality improvement loop documented (who diagnoses, who fixes, when to regenerate)
- [ ] First batch size determined for staggered publication
Anti-Pattern Check
- No automated quality checks → At 200+ pages, you can't manually review every one. Build automated checks that run on 100% of pages before any human review begins
- Reviewing only 5% of pages → 5% sample misses too many issues. Review 20% minimum, with additional edge case and category samples. The investment prevents site-wide quality problems
- Fixing individual pages instead of the system → If page #47 has thin content, the fix isn't rewriting page #47. The fix is improving the template or prompt that generated it, then regenerating all affected pages
- Publishing without post-publish monitoring → Indexation rate, quality signals, and user engagement must be tracked continuously. Without monitoring, problems compound silently across hundreds of pages
- Accepting 60% indexation as "good enough" → 60% indexation means 40% of your pages are invisible. That's 200 wasted pages in a 500-page set. Diagnose the causes and fix until indexation exceeds 90%
- No quality feedback loop → Quality is not a one-time gate. It's an ongoing system. When issues are detected post-publish, the fix must flow back to the template, prompt, and data source — not just the affected pages