Most B2B marketers can't prove AEO is working because 70.6% of AI-driven traffic lands as "Direct" in Google Analytics, per the Loamly 2026 AI Traffic Benchmark. The fix is a three-layer measurement stack: a GA4 referrer regex for clicked AI traffic, a self-reported attribution field for dark AI traffic, and a weekly prompt-tracking sweep for share of voice. This FAQ answers the 18 questions B2B teams keep asking about AEO measurement, with the regex, the dashboard, and the CMO report format inline.

Why doesn't AI traffic show up properly in Google Analytics?

AI tools strip the referrer header before users land on your site. Per the Loamly 2026 AI Traffic Benchmark, 70.6% of AI traffic arrives without a referrer and gets bucketed as 'Direct' in GA4. ChatGPT Atlas opens links in an internal sandbox that strips referrers entirely. Mobile in-app browsers do the same.

Three mechanisms cause the loss:

  1. Copy-paste behavior. Users read an AI answer, then type or paste the URL into a fresh tab. No referrer.
  2. Sandboxed link handlers. ChatGPT Atlas, Perplexity Comet, and most mobile AI apps proxy clicks through internal browsers that drop referrers.
  3. HTTPS-to-HTTP downgrades. Browsers drop the referrer when a secure page links out to a non-secure one, and some legacy sites still trip these referrer-policy edge cases.

The consequence is severe. SparkToro and Datos flagged the same dark-traffic pattern in 2024 for social, and AI has now amplified it. Loamly found that dark AI traffic converts at 10.21% versus 2.46% for non-AI traffic -- 4.1x higher. The traffic that matters most is the traffic GA4 hides.

Fix it with three layers: a custom channel group regex (covered below), self-reported attribution at signup, and a CRM first-touch tag that persists through the deal cycle.

Why You Can't See AI Traffic: Conversion Rate Gap (2026)

Traffic segment | Conversion rate
Dark AI traffic (hidden as 'Direct') | 10.21%
Traditional organic search | 2.46%
Reported AI referral traffic | 3.49%

Source: Loamly State of AI Traffic 2026 Benchmark Report

What is share of voice in AI search and how is it calculated?

Share of voice in AI search is the percentage of all brand mentions across a defined prompt library that belong to your brand. Formula: (your brand mentions / total brand mentions across tracked prompts) x 100. The math is identical to traditional SOV; the data source is different.

Run the calculation across a fixed prompt library (typically 25-50 prompts), capture every brand named in the AI's answer, then aggregate. Per Otterly's methodology docs, the right unit of measurement is mention count weighted by prompt, not raw mention count -- otherwise high-frequency category prompts skew the number.

A worked example. You track 50 B2B CRM prompts. Across them, AI engines name 312 brands total. Your brand is mentioned 47 times. Share of voice = 47 / 312 = 15.1%.
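
A minimal sketch of both calculations in Python, assuming each sweep result is stored as a map from prompt to the brands named in the answer (the prompts and brand names below are illustrative):

```python
from collections import Counter

# Illustrative sweep result: prompt -> brands named in the AI's answer.
sweep = {
    "best crm for early-stage saas": ["Brand A", "Brand B", "You"],
    "top sales engagement tools": ["Brand A", "Brand C"],
    "is You better than Brand A": ["You", "Brand A"],
}

def share_of_voice(sweep: dict, brand: str) -> float:
    """Raw SOV: your mentions / all brand mentions across the prompt library."""
    mentions = Counter(b for brands in sweep.values() for b in brands)
    total = sum(mentions.values())
    return 100 * mentions[brand] / total if total else 0.0

def prompt_weighted_sov(sweep: dict, brand: str) -> float:
    """Weighted SOV: average of your per-prompt share, so one crowded
    category prompt cannot dominate the number."""
    shares = [brands.count(brand) / len(brands) for brands in sweep.values() if brands]
    return 100 * sum(shares) / len(shares) if shares else 0.0

print(share_of_voice(sweep, "You"), prompt_weighted_sov(sweep, "You"))
```

The second function is the Otterly-style refinement: each prompt contributes equally, regardless of how many brands the answer happened to name.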

Two refinements matter for B2B:

  • Segment by prompt intent. Branded SOV (prompts that include your name) is almost always 70%+. Category SOV (prompts that don't include any brand) is the real signal. The Foundation x AirOps study of 57.2M citations found brands held 77.6% of branded-prompt citations but just 2.2% of category-prompt citations.
  • Segment by engine. SOV on ChatGPT looks nothing like SOV on Perplexity. Always report both an aggregate and per-engine breakdown.

How often should I run a prompt-tracking sweep?

Weekly is the industry standard for B2B. Run a daily sweep on 5-10 high-volatility commercial-intent prompts (the ones tied to active opportunities) and a weekly sweep on the rest of your 25-50 prompt library. Monthly cadence misses inflection points; daily-on-everything produces noise without signal.

A practical four-tier setup (a scheduler sketch follows the table):

Tier | Frequency | Volume | Purpose
Anomaly | Daily | 5-10 prompts | Catch sudden citation drops on top commercial queries
Trend | Weekly | Full 25-50 prompt library | Standard reporting input
Refresh | Bi-weekly | Competitor-named prompts | Track competitor moves
Audit | Quarterly | Full library + new variants | Re-baseline and prune dead prompts
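
One way to encode that cadence, assuming your sweep job reads a simple tier config keyed by prompt tags (the tag names and anchor date are illustrative):

```python
from datetime import date

# Illustrative tier config mirroring the table above.
SWEEP_TIERS = {
    "anomaly": {"every_days": 1,  "prompt_tag": "commercial-intent"},
    "trend":   {"every_days": 7,  "prompt_tag": "full-library"},
    "refresh": {"every_days": 14, "prompt_tag": "competitor-named"},
    "audit":   {"every_days": 90, "prompt_tag": "full-library+variants"},
}

def tiers_due(today: date, anchor: date = date(2026, 1, 5)) -> list[str]:
    """Return which tiers should run today, counting days from a fixed anchor date."""
    elapsed = (today - anchor).days
    return [name for name, cfg in SWEEP_TIERS.items() if elapsed % cfg["every_days"] == 0]

print(tiers_due(date(2026, 5, 4)))
```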

Why weekly works: AI engines update their citation pools every 3-5 business days for Perplexity and Gemini, and 7-14 days for ChatGPT. A weekly sweep catches movement; daily creates dashboards full of noise.

Why not monthly: per SE Ranking's 2026 prompt-tracking guide, monthly cadence misses 60-70% of meaningful citation shifts. By the time you see a drop, it's been live for 3-4 weeks.

What's the right reporting cadence for AEO to a CMO or board?

Monthly for the CMO dashboard, quarterly for the board. Different audiences, different metrics, different formats.

Monthly CMO report (4 metrics, one page):

  1. Citation rate (overall + per-engine), trended 6 months
  2. Share of voice vs top 3 competitors, trended 6 months
  3. AI referral traffic from GA4 + dark AI from self-reported attribution
  4. AI-sourced pipeline contribution ($ value)

Quarterly board update (3 numbers, one slide):

  1. AI-sourced pipeline as a percentage of total pipeline
  2. Share of voice vs top 3 competitors (single number, trended)
  3. AI traffic conversion rate vs organic conversion rate

The failure mode is sending the CMO 40 metrics. Per Databox's 2026 CMO dashboard analysis, executive dashboards that exceed 8 KPIs get ignored within two months. Roll the eight AEO metrics into three indices the board can absorb in 30 seconds: Visibility (citation rate + SOV), Influence (sentiment + recommendation rate), Outcomes (referral traffic + dark AI + pipeline).

One non-negotiable: every CMO chart includes competitor share of voice on the same axis. CMOs do not want absolute numbers; they want relative position.

How do you A/B test an AEO change when there's no SERP rank?

Use a citation-rate delta test. Baseline citation rate before the change, ship the change, then resweep on days 5, 14, and 28. The day-14 read is the minimum reliable signal because AI engines need 3-14 days to refresh citation pools depending on the engine.

The protocol, with a worked delta read after the steps:

  1. Pick the prompt set. 20-30 prompts your page targets. Branded + category mix.
  2. Baseline. Three sweeps, two days apart, before the change. Average them.
  3. Ship one variable at a time. Schema markup, FAQ block, statistic addition, expert quote -- not all four at once.
  4. Resweep at days 5, 14, 28. Day 5 catches Perplexity and Gemini. Day 14 catches ChatGPT. Day 28 confirms persistence.
  5. Read the delta. Citation-rate change of less than 5 percentage points on a 30-prompt set is noise. 10+ points is real.
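
A minimal sketch of the baseline-and-delta read under those thresholds (the sweep counts below are illustrative):

```python
# Illustrative delta read for a 30-prompt test set.
PROMPT_SET_SIZE = 30
NOISE_THRESHOLD_PP = 5.0    # < 5 percentage points on a 30-prompt set is noise
SIGNAL_THRESHOLD_PP = 10.0  # >= 10 points is a real effect

def citation_rate(cited_prompts: int, total: int = PROMPT_SET_SIZE) -> float:
    return 100 * cited_prompts / total

# Three pre-change sweeps, two days apart, averaged into the baseline.
baseline = sum(citation_rate(n) for n in (7, 8, 7)) / 3

def read_delta(post_sweep_cited: int) -> str:
    delta = citation_rate(post_sweep_cited) - baseline
    if abs(delta) < NOISE_THRESHOLD_PP:
        return f"{delta:+.1f}pp: noise, keep the test running"
    if abs(delta) < SIGNAL_THRESHOLD_PP:
        return f"{delta:+.1f}pp: weak signal, confirm at the next read"
    return f"{delta:+.1f}pp: real effect"

for day, cited in {5: 9, 14: 12, 28: 13}.items():
    print(f"day {day}: {read_delta(cited)}")
```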

What to test first, ranked by Princeton's GEO research effect sizes: expert quote insertion (+41% citation likelihood), statistic insertion (+30%), inline citation insertion (+30%), then schema markup (boosts Top-3 citation rate from 28% to 47% per Conductor's AEO benchmarks).

Avoid the urge to test five things at once. AEO A/B testing is closer to clinical trial design than CRO -- one variable, longer windows, larger samples.

What is the GA4 referrer regex for ChatGPT, Perplexity, and Gemini?

This regex captures the 12 AI engines that matter for B2B referral tracking in May 2026:

^.*(chatgpt\.com|chat\.openai\.com|gemini\.google\.com|bard\.google\.com|perplexity\.ai|copilot\.microsoft\.com|claude\.ai|you\.com|poe\.com|deepseek\.com|mistral\.ai|phind\.com).*

Drop it into a GA4 custom channel group:

  1. Admin > Data display > Channel groups > Create new
  2. Name: 'AI Referral'
  3. Add channel > Source matches regex > paste the pattern
  4. Drag the new channel above the default 'Referral' channel, otherwise GA4 routes the traffic to generic Referral first
  5. Save. Allow 24-48 hours for backfill

GA4's regex matching is case-sensitive, so chatgpt.com matches but ChatGPT.com does not. The leading ^.* and trailing .* handle subdomain and path variations.
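
Before saving the channel group, it's worth sanity-checking the pattern against a few referrer strings. A quick sketch in Python (the sample referrers are illustrative):

```python
import re

AI_REFERRAL = re.compile(
    r"^.*(chatgpt\.com|chat\.openai\.com|gemini\.google\.com|bard\.google\.com|"
    r"perplexity\.ai|copilot\.microsoft\.com|claude\.ai|you\.com|poe\.com|"
    r"deepseek\.com|mistral\.ai|phind\.com).*"
)

samples = [
    "https://chatgpt.com/",                  # should match
    "https://www.perplexity.ai/search/abc",  # should match
    "https://news.ycombinator.com/",         # should not match
    "",                                      # stripped referrer: the dark-traffic case
]

for referrer in samples:
    hit = bool(AI_REFERRAL.match(referrer))
    print(f"{referrer or '(no referrer)'}: {'AI Referral' if hit else 'not matched'}")
```

The empty string in the sample list is the point of the next paragraph: no regex can recover a referrer that was never sent.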

For a full Looker Studio dashboard pre-wired to this channel group, the Vision Labs free template and the Hack the Algo template both work out of the box.

This regex catches roughly 29.4% of AI traffic. The other 70.6% is dark and needs self-reported attribution to surface.

How many prompts should I track in my prompt library?

Start with 25-30 prompts. Cap at 50 unless you have automation tooling. Metricus tested 182 LLM prompts across B2B SaaS categories and found accuracy plateaus around the 30-prompt mark because most B2B categories have a finite set of truly distinct buyer queries. Past 35, you produce variations, not new intent.

The right composition for a 30-prompt B2B library:

  • 6 branded prompts (your name + comparison: 'is [you] better than [competitor]', 'reviews of [you]')
  • 8 category prompts ('best CRM for early-stage SaaS', 'top sales engagement tools')
  • 8 comparison prompts ('[you] vs [competitor 1]', '[competitor 1] vs [competitor 2]')
  • 8 use-case / problem prompts ('how to reduce sales cycle for B2B SaaS', 'tool to automate outbound sequences')

Avoid prompt bloat. Tools like Profound and Otterly let you upload 250+ prompts but the marginal insight from prompt 51 to prompt 250 is near-zero for most B2B categories. The exception is multi-product companies (HubSpot, Salesforce) where each product unit needs its own 25-30 prompts.

Refresh the library quarterly. Buyer language shifts and your prompt library decays without active pruning.

What is a good citation rate benchmark for B2B SaaS?

Median citation rate across 2,014 tracked B2B SaaS companies is 0.69% per The Digital Bloom's February 2026 report -- a number that is essentially 'cited only on branded prompts.' Top-decile brands hit 25-40% on category prompts. The gap is enormous and almost entirely structural.

What the benchmark distribution actually looks like:

Percentile | Citation rate (category prompts) | What it means
Median (50th) | 0.7% | Cited only when buyers name you directly
75th | 8-12% | Showing up on a handful of category prompts
90th | 25-40% | Mentioned in most relevant category prompts
99th | 60%+ | Wikipedia / Stripe / HubSpot tier

The critical context from the Foundation x AirOps study of 57.2M citations: only 10.15% of citations link to brand-owned domains. When an AI cites you, it's usually citing a Reddit thread, a G2 review, a competitor comparison post, or a third-party listicle that mentions your name -- not your own website.

The implication for measurement: 'citation rate' must capture brand mentions in answer text, not just outbound links to your domain. If you only count link citations, you'll under-measure visibility by roughly 10x.
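
A minimal sketch of the two ways to count, assuming each sweep stores the raw answer text plus the URLs the answer cited (the brand name, domain, and data are illustrative):

```python
import re

answer_text = (
    "For early-stage SaaS teams, Acme CRM and BetaCRM come up most often; "
    "reviewers on G2 rate Acme CRM highly for onboarding speed."
)
cited_urls = [
    "https://www.g2.com/categories/crm",
    "https://www.reddit.com/r/sales/comments/example",
]

BRAND = "Acme CRM"            # your brand (illustrative)
OWNED_DOMAIN = "acmecrm.com"  # your domain (illustrative)

# Mention-based visibility: the brand is named in the answer text.
mentioned = bool(re.search(re.escape(BRAND), answer_text, flags=re.IGNORECASE))

# Link-based visibility: the answer actually links to a domain you own.
linked = any(OWNED_DOMAIN in url for url in cited_urls)

print(f"mentioned in answer: {mentioned}, linked to owned domain: {linked}")
# Counting only `linked` here would report zero visibility for a prompt
# where the brand is in fact named twice.
```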

How do I capture dark AI traffic that shows as 'Direct' in GA4?

Add a 'How did you hear about us?' field to your signup, demo request, and onboarding flows with explicit options for ChatGPT, Perplexity, Gemini, Claude, and Copilot. This is the only reliable way to surface the 70.6% of AI traffic that GA4 hides as Direct.

The spec:

  • Field placement: Step 2 of signup, after email but before payment. Friction at that step is minimal; data quality is high.
  • Field type: Required dropdown with 'Other (please specify)' open text.
  • Options: Search engine (Google), ChatGPT, Perplexity, Gemini, Claude, Copilot, LinkedIn, X/Twitter, Friend or colleague, Podcast, Webinar, Other.
  • Storage: Send to your CRM as a custom field, then to your data warehouse. Stamp the value on the account, not just the contact.

The payoff is substantial. Per Outbrain's self-reported attribution research and Ruler Analytics' 2026 study, companies that added this field saw a 67% increase in AI-attributed signups within 90 days. One vendor reported AI conversions were 15x understated in click-based attribution alone.

Validation: pair self-reported data with your GA4 AI channel group. If self-reported AI signups significantly exceed GA4 AI sessions, you're catching the dark traffic. If they match, your GA4 setup is doing the work.
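
A back-of-the-envelope version of that validation, with illustrative monthly numbers:

```python
# Monthly exports (illustrative): converting GA4 AI-channel sessions vs signups
# whose 'How did you hear about us?' answer was an AI tool.
ga4_ai_attributed_signups = 34
self_reported_ai_signups = 117

dark_share = 1 - ga4_ai_attributed_signups / self_reported_ai_signups
print(f"{dark_share:.0%} of AI-attributed signups were invisible to GA4")
# ~71% here, in line with the 70.6% dark-traffic benchmark. If the two numbers
# roughly match instead, your channel group is already catching most of it.
```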

What's the difference between share of voice and share of model?

Share of voice averages your mention rate across all AI engines combined. Share of model breaks it out per engine. A brand can have 40% share on ChatGPT and 5% on Perplexity, which signals very different optimization work.

Why you need both:

  • Share of voice (SOV) is the executive number. Trend it monthly. Compare to top 3 competitors. Aggregate across ChatGPT, Perplexity, Gemini, Claude, Copilot.
  • Share of model (SOM) is the operator number. Reveals where you're winning and where you're invisible. Drives optimization decisions.

A real pattern: B2B SaaS brands often dominate ChatGPT (which weights Wikipedia and authoritative listicles) but underperform on Perplexity (which weights Reddit at 46.7% per Perplexity's own citation analysis). If your share of model on Perplexity is under 5% and your category buyers use Perplexity, the optimization work is Reddit presence, not blog content.

The useful chart: a small-multiples grid showing SOM trended over 12 weeks for each engine, with you and your top 3 competitors as colored lines. Anyone glancing at the chart can see which engines you're winning, losing, or stable on -- and which competitors are accelerating where.

How do I prove AEO ROI when the metric is 'mentioned in 12 of 50 prompts'?

Pair citation rate with three downstream metrics that close the causal loop. Citation rate alone is a vanity metric to a CMO. Citation rate plus traffic plus pipeline is a story.

The four-step proof chain:

  1. Citation rate climbs (input): from 12/50 to 22/50 over 90 days.
  2. AI referral traffic climbs (early outcome): GA4 AI channel group sessions up 60% over the same window.
  3. Branded search climbs (lagging signal): Google Search Console branded impressions up 25-40% within 7-21 days of citation lift -- this is the cleanest causal proof since AI mentions seed downstream branded searches.
  4. Pipeline climbs (revenue outcome): self-reported AI attribution shows AI-sourced opportunities up 30%+ in the next quarter.

The specific framing for a skeptical CMO: 'We went from 12 of 50 to 22 of 50 prompts. Branded search is up 33% in the 21 days following each citation lift. AI-sourced pipeline is up 41% quarter-over-quarter.' That sentence is defensible. 'Citation rate is up' alone is not.

One caveat. Branded search lift is real but often gets miscredited to 'brand awareness work' or PR. Tag every citation lift event in your analytics so the CMO can see the temporal sequence: AEO ship -> citation rate up -> branded search up -> pipeline up.
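
One way to make that temporal sequence explicit, assuming a daily branded-impressions export from Search Console and a list of dated citation-lift events (the data below is synthetic):

```python
from datetime import date, timedelta

# Synthetic daily branded impressions keyed by date, e.g. from a GSC export.
branded_impressions = {
    date(2026, 3, 1) + timedelta(days=i): 1000 + 8 * i for i in range(60)
}
citation_lift_events = [date(2026, 3, 20)]  # dates you tagged a citation-rate jump

def lift_around(event: date, window_days: int = 21) -> float:
    """Average branded impressions in the 21 days after an event vs the 21 days before."""
    before = [v for d, v in branded_impressions.items()
              if event - timedelta(days=window_days) <= d < event]
    after = [v for d, v in branded_impressions.items()
             if event <= d < event + timedelta(days=window_days)]
    return (sum(after) / len(after)) / (sum(before) / len(before)) - 1

for event in citation_lift_events:
    print(f"{event}: {lift_around(event):+.1%} branded-search lift in the 21 days after")
```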

What KPIs should sit on an AEO dashboard?

Eight metrics, rolled into three executive indices. More than eight and the dashboard gets ignored. Fewer and you can't diagnose what's working.

The eight (full definitions in the comparison table at the end of this article):

  1. Citation rate -- foundational visibility metric
  2. Share of voice -- relative position vs competitors
  3. Share of model -- per-engine breakdown
  4. AI referral traffic -- clicked AI traffic from GA4
  5. Dark AI traffic -- self-reported AI conversions
  6. AI-sourced pipeline -- CRM-tagged revenue
  7. Sentiment score -- positive/neutral/negative tone of mentions
  8. Recommendation rate -- % of category prompts where AI explicitly recommends you

The three executive indices:

  • Visibility Index = citation rate + share of voice (are we showing up?)
  • Influence Index = sentiment + recommendation rate (when we show up, are we positioned well?)
  • Outcomes Index = AI referral + dark AI + AI-sourced pipeline (does it convert?)

This structure mirrors Foundation Inc's three-pillar GEO framework: visibility, citation, sentiment. We add the outcomes layer because B2B CFOs eventually want a revenue line.
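
The framework doesn't prescribe a rollup formula, so treat this as one possible convention: normalize each metric to a 0-100 scale, then take an equal-weighted average within each index (the values below are illustrative):

```python
# Illustrative monthly metric values, each already expressed on a 0-100 scale
# (pipeline rescaled against its quarterly target). The equal-weight average is
# an assumption; the source only says which metrics roll into which index.
metrics = {
    "citation_rate": 44, "share_of_voice": 15,
    "sentiment": 72, "recommendation_rate": 30,
    "ai_referral_traffic": 38, "dark_ai_traffic": 55, "ai_pipeline": 25,
}

INDICES = {
    "Visibility": ["citation_rate", "share_of_voice"],
    "Influence": ["sentiment", "recommendation_rate"],
    "Outcomes": ["ai_referral_traffic", "dark_ai_traffic", "ai_pipeline"],
}

for name, keys in INDICES.items():
    score = sum(metrics[k] for k in keys) / len(keys)
    print(f"{name} Index: {score:.0f}")
```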

The anti-pattern: putting 30 metrics on a dashboard because the AEO tool exposes them. Tools are not dashboards. Pick the eight, ignore the rest.

Do I need a paid AI visibility tool or can I track manually?

Manual tracking works for fewer than 20 prompts on 1-2 engines. Past that, use a paid tool. The break-even is roughly 80 prompt-engine combinations per month. Below that, a smart analyst with a spreadsheet wins on cost. Above that, a person burns out and data quality collapses.

The build-vs-buy decision tree:

Scale | Approach | Cost
<20 prompts, 1-2 engines | Manual + Google Sheets | Analyst time only
20-50 prompts, 4+ engines | Paid tool, single seat | $200-800/month
50-150 prompts, 4+ engines | Paid tool + competitor tracking | $800-3,000/month
Multi-product, multi-region | Enterprise platform + custom integrations | $3,000-15,000/month

The tools worth evaluating in May 2026: Profound, Otterly, Peec AI, Conductor, Lumar, and HubSpot's AEO Grader (free for HubSpot customers). All sweep ChatGPT, Perplexity, Gemini, and Claude on a configurable cadence.

Don't build your own LLM-querying pipeline unless you have specific compliance needs. The combinatorics get ugly fast: 50 prompts x 4 engines x 30 days = 6,000 API calls per month, plus prompt versioning, answer parsing, brand-mention extraction, sentiment scoring, and competitor disambiguation. Buy.

How do I attribute pipeline revenue to AI search?

Three-layer attribution stack. None of the layers work alone.

Layer 1: GA4 channel group (clicked AI traffic). Catches the ~29% of AI traffic that arrives with a referrer. Wire UTMs into your CRM via auto-tagging so the channel persists from session to lead to deal.

Layer 2: Self-reported attribution (dark AI). Catches the 70%+ that GA4 misses. The 'How did you hear?' field at signup is the highest-fidelity AI attribution signal you have. Stamp the answer on the account in the CRM.

Layer 3: First-touch CRM tag at the account level. B2B buying groups average 6-10 stakeholders per Forrester's 2026 B2B AI search analysis. The lead who converts is rarely the lead who first heard about you on ChatGPT. Tag the account, not the lead. Use the earliest AI touch from anyone at the account as the first-touch attribution.
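
A sketch of the account-level rule, assuming touches from every contact at the account are available with a source and timestamp (the field names and data are illustrative):

```python
from datetime import datetime

AI_SOURCES = {"ChatGPT", "Perplexity", "Gemini", "Claude", "Copilot"}

# Illustrative touches from several contacts at the same account.
touches = [
    {"contact": "vp-sales@acme.example", "source": "LinkedIn",   "at": datetime(2026, 2, 3)},
    {"contact": "revops@acme.example",   "source": "Perplexity", "at": datetime(2026, 1, 21)},
    {"contact": "admin@acme.example",    "source": "ChatGPT",    "at": datetime(2026, 3, 9)},
]

def account_first_ai_touch(touches: list[dict]) -> dict | None:
    """The earliest AI touch from anyone at the account wins the first-touch tag."""
    ai_touches = [t for t in touches if t["source"] in AI_SOURCES]
    return min(ai_touches, key=lambda t: t["at"]) if ai_touches else None

print(account_first_ai_touch(touches))
# -> the Perplexity touch from Jan 21, even though a different contact converts later
```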

The reporting: monthly CMO chart shows AI-sourced pipeline as a stacked bar against other sources, trended 12 months. Quarterly board chart shows AI-sourced pipeline as a percentage of total pipeline, with the trend line called out.

What to ignore: multi-touch attribution models that try to weight AI touches across the full journey. The data is too sparse and the inference too noisy. Stick with first-touch at account level until your AI volume is 20%+ of pipeline, then revisit.

How fast do AI engines update their citation pools after I publish a page?

3-5 business days for Perplexity and Gemini, 7-14 days for ChatGPT, longer for Claude. Plan A/B test reads at day 14 minimum. Don't panic-edit at day 3 if a citation hasn't appeared.

The per-engine cadence:

Engine | First-citation window | Why
Perplexity | 1-5 days | Live-web-first architecture, weights freshness aggressively
Gemini | 3-7 days | Real-time Google index integration
ChatGPT (web) | 7-14 days | Live web search layer over a slower-updating model
ChatGPT (no web) | 30+ days, or the next training cycle | Model knowledge cutoff dependent
Claude | 7-21 days | Slower indexing in current Sonnet/Opus deployment

Perplexity weights freshness so aggressively that content under 30 days old gets 3.2x more citations than older content per Perplexity's published citation patterns. This is why a 13-week refresh cycle on core AEO pages dramatically outperforms set-and-forget publishing.

The practical implication: when you ship a major page change, expect Perplexity citations to move first (week 1), Gemini in week 1-2, ChatGPT-with-web in week 2, ChatGPT-without-web only after the next training cycle. Read your A/B test at day 14 to capture the live-web engines, then again at day 60 if you want training-cycle engines included.

How do I monitor competitor AI visibility?

Add 5-7 competitor names to your prompt library and capture mention counts on every sweep. All major AI visibility platforms (Profound, Otterly, Peec, Lumar) handle this natively -- you tag the competitor list once and SOV breaks out automatically.

What to track per competitor (a small scoring sketch follows the list):

  1. Share of voice (their mentions / total mentions) trended weekly
  2. Share of model (their SOV broken out per AI engine)
  3. Co-mention rate (% of your prompts where they appear alongside you)
  4. Recommendation rate (% of prompts where AI explicitly recommends them over you)
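
A small scoring sketch for co-mention rate and recommendation rate, assuming each sweep row records the brands named and the brand the AI explicitly recommends (the data is illustrative):

```python
# Illustrative per-prompt sweep results.
results = [
    {"brands": ["You", "Rival A"],            "recommended": "You"},
    {"brands": ["Rival A", "Rival B"],        "recommended": "Rival A"},
    {"brands": ["You", "Rival A", "Rival B"], "recommended": "Rival A"},
]

def co_mention_rate(results: list[dict], competitor: str) -> float:
    """% of prompts where you appear in which the competitor also appears."""
    yours = [r for r in results if "You" in r["brands"]]
    return 100 * sum(competitor in r["brands"] for r in yours) / len(yours) if yours else 0.0

def recommendation_rate(results: list[dict], brand: str) -> float:
    """% of tracked prompts where the AI explicitly recommends the brand."""
    return 100 * sum(r["recommended"] == brand for r in results) / len(results)

print(co_mention_rate(results, "Rival A"), recommendation_rate(results, "Rival A"))
```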

The single most useful chart: a stacked area chart of share of voice over time across you and your top 3 competitors, segmented by AI engine. Patterns jump out instantly. A competitor accelerating on Perplexity but flat on ChatGPT is investing in Reddit. A competitor accelerating on ChatGPT but flat on Perplexity is investing in Wikipedia and listicle placements.

Use the data offensively. If a competitor's SOV is up 12 points in 90 days, sweep their published content over the same window. Where did they show up? What did they ship? AEO is a copyable game -- competitive intelligence inside AI engines is more legible than inside Google because the answer text shows you exactly which third-party sources the engine trusts.

What signals do AI engines prioritize when choosing citations?

Three content signals and three structural signals carry the most weight. Per Princeton's GEO research paper, the content lifts are well-quantified.

Content signals:

  • Expert quotes: +41% citation likelihood. Named experts with credentials, in-line quoted with attribution.
  • Statistics: +30% citation likelihood. Specific numbers with sources and years (e.g. 'per Loamly's 2026 benchmark, 70.6% of...').
  • Inline citations: +30% citation likelihood. Hyperlinks to primary sources directly in body text.

Structural signals:

  • Schema markup: pages with FAQPage + Article + ItemList schema hit 47% Top-3 citation rates vs 28% without per Conductor's 2026 AEO benchmarks.
  • Recency: 50% of all AI citations come from content published in the last 13 weeks. Refresh dates and re-publish dates matter.
  • Question-shaped H2s: AI engines weight headers heavily for extraction. Match the way users phrase queries.

What doesn't move the needle: cute headlines, branded jargon, intro paragraphs over 100 words, image-only data without text equivalents, and 'comprehensive guides' with no clear extractable answer in the first 50 words.

The operating implication for measurement: when citation rate is flat, audit content signals first (do pages have expert quotes, statistics, inline citations?) before auditing structure (schema, recency, headers). The content lifts compound; the structural lifts act as gates.

Should I report AEO results in absolute citation counts or percentages?

Report both, on the same chart. Absolute counts ground the conversation in reality. Percentages trend cleanly. Reporting one without the other invites misinterpretation.

The canonical CMO chart for citation rate:

  • Bar component: absolute citation count this month ('Mentioned in 22 of 50 prompts')
  • Line component: citation rate percentage trended 12 months ('44% citation rate, up from 24% in May 2025')
  • Reference lines: top 3 competitors' citation rates trended on the same axis

The failure mode in either direction:

  • Percentage-only: a CMO sees '24% citation rate' with no idea whether that's 12 of 50 prompts or 240 of 1,000 prompts. The denominator changes the credibility of the number.
  • Count-only: '22 prompts mentioned us' sounds great until the CMO learns the prompt library has 50 entries, of which 30 are branded prompts where you'd expect 100% mention rate.

Always disclose the prompt library size and intent breakdown when reporting. 'Citation rate of 44% across 50 prompts (10 branded, 40 category) means we appear in 22 prompts: all 10 branded plus 12 of 40 category.' That sentence is defensible at every level of the org. Anything less precise gets challenged within one quarter.

Metric | What it measures | How to calculate | Reporting cadence
Citation Rate | % of tracked prompts where you appear in the AI answer | (Prompts citing you / Total tracked prompts) x 100 | Weekly
Share of Voice | Your mention frequency vs all competitors | (Your mentions / Total brand mentions) x 100 | Weekly
Share of Model | Visibility variance across ChatGPT, Perplexity, Gemini, Claude | Citation rate broken out by engine | Bi-weekly
AI Referral Traffic | Sessions GA4 tags as ChatGPT/Perplexity/Gemini referrals | Custom channel group regex match in GA4 | Weekly
Dark AI Traffic | Conversions where 'How did you hear?' = AI tool | Self-reported attribution form on signup | Monthly
AI-Sourced Pipeline | MQLs/SQLs/revenue tied to AI-attributed accounts | CRM tag + first-touch reporting | Monthly
Sentiment Score | Positive/neutral/negative tone of your AI mentions | NLP scoring across captured answer text | Monthly
Recommendation Rate | % of category prompts where AI explicitly recommends you | (Prompts with explicit reco / Category prompts) x 100 | Monthly