CPS® Research Foundation
The Evidence Behind AI Citation Visibility
What this page is
The CPS® (Citation Probability Score) framework measures how likely a page is to be cited by AI systems such as ChatGPT, Perplexity, and Google AI Overviews. This page explains where that model comes from.
It's built from three evidence types:
- Research in retrieval and generative systems
- Large-scale analyses of AI citation behaviour
- Consistent findings across real-world datasets
Where findings are correlational rather than causal, we state that clearly.
Why this matters
Search is shifting from:
- Ranking pages toward selecting passages
- Keywords toward extractable answers
- Authority alone toward authority plus structure plus clarity
If your content can't be cleanly extracted, it's unlikely to be cited, regardless of rankings.
The Five CPS® Pillars
Each CPS® pillar reflects a pattern observed across both academic retrieval research and real-world AI citation data.
Content Structure
How clearly your content is organised for extraction
AI systems don't "read" pages like humans. They parse structure: headings, sections, lists, and semantic layout. Well-structured content is easier to retrieve, segment, and reuse.
The GEO: Generative Engine Optimization study demonstrated that structural optimisation methods significantly improved visibility in AI-generated responses, with uplifts of up to ~40% in controlled benchmarks.
Research from Wix analysing 75,000+ AI answers found that structured formats (listicles, articles, product pages) account for over half of all citations, suggesting format alignment influences citation likelihood.
RAG (Retrieval-Augmented Generation) architectures retrieve passages, not full pages. Content that's clearly segmented is more reliably retrieved.
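To make passage-level retrieval concrete, here is a minimal Python sketch of splitting a page into heading-delimited passages before retrieval. It assumes markdown-style `#` headings, and `split_into_passages` is a hypothetical helper. Production RAG pipelines apply their own chunking rules, which this sketch does not claim to reproduce.

```python
import re

def split_into_passages(text):
    """Split markdown-style text into one passage per heading (illustrative only)."""
    passages = []
    current = {"heading": "(intro)", "body": []}
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # A new heading closes the previous passage, if it has any text.
            if current["body"]:
                passages.append(current)
            current = {"heading": match.group(1).strip(), "body": []}
        elif line.strip():
            current["body"].append(line.strip())
    if current["body"]:
        passages.append(current)
    return [{"heading": p["heading"], "text": " ".join(p["body"])} for p in passages]

# Hypothetical page content, used only to exercise the function.
page = """# Lawn care
Weeds compete with grass for light and water.

## Removing weeds
Pull weeds by hand after rain, when the soil is soft.

## Preventing weeds
Mow high and overseed bare patches each autumn.
"""

passages = split_into_passages(page)
for p in passages:
    print(p["heading"], "->", p["text"])
```

Each passage carries its own heading and text, which is the unit a retriever would score. A page with no headings collapses into a single large passage in this sketch.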
A controlled study by Ahrefs tracked 1,885 pages adding JSON-LD schema between August 2025 and March 2026, against 4,000 matched control pages. Using difference-in-differences analysis, adding schema produced no statistically significant citation uplift on Google AI Mode (+2.4%) or ChatGPT (+2.2%), and a small but statistically significant decline on Google AI Overviews (−4.6%). The study population was pages already heavily cited by AI. Ahrefs notes schema may still help pages "get crawled, parsed, or indexed in the first place." Schema's primary value appears to be upstream (supporting crawlability, entity recognition, and knowledge graph inclusion) rather than directly driving citation decisions on already-visible pages. Content Structure as a CPS® pillar reflects the broader extractability picture (block size, declarative openings, semantic segmentation) of which schema is one signal among many. See: Ahrefs, "We Tracked 1,885 Pages Adding Schema" (Linehan & Guan, 11 May 2026).
Content that's modular, clearly sectioned, and formatted for scanning is more likely to be cited than dense, unstructured pages.
Fact Density
The concentration of verifiable, attributed information
AI systems prioritise content that's specific, attributable, and evidence-backed.
The GEO study (KDD 2024) found that adding statistics increased AI visibility by ~41%, and adding expert quotations increased visibility by ~28%.
Large-scale analysis by Ahrefs shows that AI-cited content tends to be more structured, factual, and recently updated than traditionally ranked content.
Across LLM and retrieval literature, structured factual information is consistently extracted more reliably than unstructured prose.
Pages that include named data, cite sources, and present concrete claims are more likely to be selected as citation sources.
Answer Architecture
Whether your content answers the query immediately
AI systems favour content that answers first and elaborates second.
Google Search Central's official AI optimisation guide (15 May 2026) explicitly names query fan-out as the mechanism behind generative AI features in Google Search, defining it as "a set of concurrent, related queries generated by the model to request more information and fetch additional relevant search results to address the user's query." Google's own example: the query "how to fix a lawn that's full of weeds" triggers fan-out queries including "best herbicides for lawns", "remove weeds without chemicals", and "how to prevent weeds in lawn". This is first-party confirmation that a single user query generates multiple concurrent sub-queries. Content that answers only the primary query is invisible to most of the fan-out. The Answer Architecture pillar exists precisely because each retrieved passage must directly answer the implied query it's retrieved against. See: Google Search Central, Optimising for Generative AI Features.
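As a rough illustration of fan-out coverage, the Python sketch below checks which of Google's example fan-out queries are addressed by at least one passage on a page. The word-overlap score, the `THRESHOLD` value, and the sample passages are all assumptions made for illustration. Real retrieval systems use learned ranking, not simple word matching.

```python
import re

def tokens(text):
    """Lowercase word set, ignoring a few very common words."""
    stop = {"a", "the", "to", "for", "in", "of", "how", "that", "s", "and", "with"}
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop}

def coverage(query, passage):
    """Fraction of the query's content words that appear in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

# Fan-out queries quoted from Google's example in the text above.
fan_out = [
    "best herbicides for lawns",
    "remove weeds without chemicals",
    "how to prevent weeds in lawn",
]

# Hypothetical passages (heading -> body) standing in for a real page.
passages = {
    "Removing weeds by hand": "You can remove weeds without chemicals by pulling them after rain.",
    "Mowing height": "Mow high so grass shades the soil.",
}

THRESHOLD = 0.75  # assumed cut-off, not an empirically derived value

for query in fan_out:
    best_heading, best_score = max(
        ((h, coverage(query, h + " " + body)) for h, body in passages.items()),
        key=lambda pair: pair[1],
    )
    status = "covered" if best_score >= THRESHOLD else "gap"
    print(f"{query!r}: {status} ({best_heading}, {best_score:.2f})")
```

In this toy page only the second fan-out query is covered. The other two are gaps, which is the situation the paragraph above describes as being invisible to most of the fan-out.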
The GEO study (KDD 2024) shows that content structured to directly address queries performs better in AI-generated outputs.
Research on the "lost in the middle" effect shows that language models use information at the start or end of their input context more reliably than information in the middle, which favours passages that state the answer up front.
Analyses referencing Ahrefs data suggest that early-page content accounts for a disproportionate share of citations, reinforcing the importance of opening clarity.
If the answer is buried mid-paragraph, dependent on surrounding context, or delayed, it's less likely to be retrieved or cited.
Self-Containment
Whether each section stands on its own
AI systems retrieve and evaluate content in chunks. Each section must make sense independently, contain a complete idea, and avoid reliance on prior context.
Google Search Central's official AI optimisation guide (15 May 2026) explicitly confirms retrieval-augmented generation (RAG) as the mechanism behind Google's generative AI features, defining it as "a technique (also known as grounding) used to improve the quality, accuracy, and freshness of AI responses by relying on our core Search ranking systems to retrieve relevant, up-to-date web pages from our Search index. Our systems then review the specific information from those retrieved pages to generate a more reliable and helpful response." This is first-party confirmation from Google itself that generation happens by retrieving and synthesising specific information from individual pages, not by ranking whole pages as in traditional search. The Self-Containment pillar exists because each retrieved chunk must make sense in isolation; if it depends on context from elsewhere on the page, the retrieval system can't use it. See: Google Search Central, Optimising for Generative AI Features.
RAG research shows that retrieval occurs at the chunk level, with each passage evaluated independently.
Work on retrieval behaviour (including "lost in the middle" research) suggests that clarity and completeness within a passage affect whether it is used.
Across GEO implementations, sections that can be read in isolation are more frequently extracted and cited.
A section that starts with "As mentioned above..." is structurally weaker than one that states the idea directly.
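A simple way to spot context-dependent openings is a pattern check on each section's first words. The Python sketch below is a heuristic under stated assumptions: the `DEPENDENT_OPENERS` list is hypothetical and incomplete, and it will flag some sentences that are in fact self-contained.

```python
import re

# Hypothetical list of openers that lean on earlier context.
DEPENDENT_OPENERS = [
    r"as (mentioned|noted|discussed|described) (above|earlier|previously)",
    r"as we saw",
    r"(this|that|these|those|it|they)\b",
    r"(however|also|additionally|furthermore)\b",
]
PATTERN = re.compile(r"^\s*(?:" + "|".join(DEPENDENT_OPENERS) + ")", re.IGNORECASE)

def depends_on_context(section):
    """Return True if the section opens with a phrase that points elsewhere."""
    return PATTERN.search(section) is not None

sections = [
    "As mentioned above, schema helps with entity recognition.",
    "Schema markup helps crawlers identify entities on a page.",
    "It also improves recall.",
]
for s in sections:
    label = "depends on context" if depends_on_context(s) else "self-contained"
    print(f"{label}: {s}")
```

A flagged section is a candidate for rewriting so that its subject is named in the first sentence. The check is not a verdict on the section's quality.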
Freshness
How clearly your content signals recency
AI systems show a measurable preference for newer, updated content.
Research from Ahrefs found that AI-cited content is, on average, significantly newer than traditionally ranked content.
Seer Interactive observed that the majority of AI crawler activity targets content published within the past one to two years.
Practitioner analyses consistently show that visible dates, updated statistics, and schema timestamps act as freshness signals.
Freshness isn't just about dates. It requires substantive updates: new data, updated references, and visible recency cues.
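One of the recency cues above, the schema timestamp, can be read programmatically. The Python sketch below computes the age in days of a page's `dateModified` value (a schema.org property), falling back to `datePublished`. It assumes the JSON-LD block is a single object, and the example article is hypothetical. It says nothing about how any AI system weighs that date.

```python
import json
from datetime import date
from typing import Optional

def days_since_modified(json_ld: str, today: date) -> Optional[int]:
    """Return the age in days of a JSON-LD block's dateModified, if present."""
    data = json.loads(json_ld)
    stamp = data.get("dateModified") or data.get("datePublished")
    if stamp is None:
        return None
    # Keep only the YYYY-MM-DD part so values with a time component also parse.
    modified = date.fromisoformat(stamp[:10])
    return (today - modified).days

# Hypothetical article markup, used only for illustration.
example = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Hypothetical article",
  "datePublished": "2025-01-10",
  "dateModified": "2026-03-01"
}
"""

print(days_since_modified(example, today=date(2026, 5, 15)))  # 75
```

A recent timestamp is only a cue. As the paragraph above notes, it needs to be backed by substantive updates to the content itself.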
Cross-Pillar Evidence: Authority Beyond the Page
AI citation isn't purely page-level. Brand and entity signals matter.
Large-scale correlation study
Ahrefs found that brand mentions across the web correlate more strongly with AI visibility than traditional backlinks.
Platform confirmation
Microsoft has publicly confirmed that structured data (schema markup) helps its systems interpret web content, supporting the role of structured signals in upstream AI processing (crawlability, entity recognition, knowledge graph inclusion). See also: What the evidence says doesn't directly drive citations.
Third-party content corpus drives AI citation (CryptoContent.dev, May 2026)
An independent audit by Wood (CryptoContent.dev, May 2026) scored 50 crypto protocols across ChatGPT, Perplexity, and Google AI Overviews on a 100-point framework covering AI presence, citation quality, website readiness, schema quality, and documentation. Across 1,016 citation records, official protocol pages accounted for just 1% of Perplexity citations and 4% of Google AI Overview citations. The top cited domain on Google AIO was YouTube (111 citations), followed by Reddit (94). On Perplexity, CoinGecko led (9% of citations). The protocols ranking highest for AI visibility weren't those with the strongest technical infrastructure but those with the deepest accumulated third-party coverage. This is direct supporting evidence that AI citation is shaped by the distributed content record surrounding an entity rather than by what the entity publishes about itself. See: Wood, "The 2026 AI Citation Visibility Study for Crypto Protocols" (DOI: 10.5281/zenodo.19253709).
Pages don't exist in isolation. They're evaluated within a broader entity and authority context, and the third-party content corpus surrounding a brand carries disproportionate weight in AI citation decisions compared to what the brand publishes about itself.
What the Evidence Says Doesn't Directly Drive Citations
The most trusted research pages present disconfirming evidence alongside confirming evidence. Here's what current studies suggest does not directly drive AI citation behaviour at the point of retrieval, despite frequent industry claims to the contrary. Four independent sources from different methodologies and different verticals now point the same direction. We include these findings because credibility requires honesty about what doesn't work as advertised, not just what does.
Schema markup on already-cited pages (Ahrefs, May 2026)
A controlled study by Ahrefs (Linehan & Guan, 11 May 2026) tracked 1,885 pages adding JSON-LD schema between August 2025 and March 2026 against 4,000 matched control pages. Using difference-in-differences analysis across four separate tests, adding schema produced no statistically significant citation uplift on Google AI Mode (+2.4%) or ChatGPT (+2.2%), and a small but statistically significant decline on Google AI Overviews (−4.6%). Both treated and control pages were already trending downward on AIO before schema was added, so the small decline can't be cleanly attributed to schema. The headline finding: on pages already in AI systems' consideration set, adding JSON-LD schema doesn't measurably increase citations.
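For readers unfamiliar with difference-in-differences, the arithmetic can be sketched in a few lines of Python. The numbers below are hypothetical and are not taken from the Ahrefs study, which would also have used a statistical model with significance tests, not a bare subtraction of means.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical mean citations per page, before and after schema was added.
treated_before, treated_after = 10.0, 9.0   # pages that added schema
control_before, control_after = 10.0, 9.5   # matched pages that did not

naive = treated_after - treated_before
estimate = diff_in_diff(treated_before, treated_after, control_before, control_after)

print(f"Naive before/after change: {naive:+.1f}")               # -1.0
print(f"Difference-in-differences estimate: {estimate:+.1f}")   # -0.5
```

The naive comparison attributes the whole decline to schema. Subtracting the control group's change removes the part of the trend that both groups share, which is why a shared downward trend matters when interpreting the result.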
Schema at retrieval time (searchVIU, 2025)
An experiment by searchVIU tested whether five major AI systems (ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode) use schema markup when fetching a page in real-time. None of them did. During direct retrieval, every system extracted only visible HTML content. JSON-LD, hidden Microdata, and hidden RDFa were all ignored. This is mechanistic evidence consistent with the Ahrefs finding: structured data isn't part of the on-the-fly extraction loop for the AI systems tested.
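To show what extracting only visible content means in practice, the Python sketch below pulls text from a page while skipping `<script>` blocks, which is where JSON-LD lives. It uses the standard-library `html.parser` module. The `VisibleText` class and sample page are hypothetical, it does not handle CSS-hidden elements, and it is not a reconstruction of how any of the tested AI systems fetch pages.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, skipping elements whose content is never rendered."""
    HIDDEN = {"script", "style", "template", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many hidden elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical page: the product name exists only inside the JSON-LD block.
page = """<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "Hidden name"}</script>
</head><body><h1>Visible heading</h1><p>Visible paragraph.</p></body></html>"""

print(visible_text(page))  # Visible heading Visible paragraph.
```

Anything stated only in markup never reaches the extracted text in this sketch, so facts that matter for citation need to appear in the visible copy as well.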
Why correlation studies overstate schema's role
Ahrefs' earlier correlation analysis found AI-cited pages were almost three times more likely to have JSON-LD than non-cited pages. Their controlled follow-up explains why that gap exists without schema causing citations: sites that implement structured data tend to also invest in technical SEO, publish authoritative content, build links, and maintain their pages. Schema rides the same wave as every other quality signal. Strip schema out and the rest of the signal stack likely still carries the page through to citation.
Schema in industry verticals: a 50-protocol crypto audit (Wood, CryptoContent.dev, May 2026)
An independent cross-sectional study by Wood audited 50 crypto protocols across ChatGPT, Perplexity, and Google AI Overviews on a 100-point framework. The findings corroborate the Ahrefs result from a different methodology and a different vertical. Pendle, the only DeFi protocol in the sample with JSON-LD schema, scored zero across all three platforms for AI presence. Starknet had the joint-highest schema quality score (12/15) and ranked 15th overall. Aave had no schema and ranked 2nd. Solana had no schema and ranked 5th. Schema-adopting protocols averaged 7.0 AI mentions versus 4.6 for non-adopters, but the author explicitly attributes that gap to protocol maturity rather than schema effectiveness: the schema-adopting set tends to be the larger, more established protocols with deeper third-party coverage. The author's own honest framing: "The more defensible claim for schema in this context is entity resolution: helping AI systems confirm what a protocol is and how it relates to other entities. It doesn't create the content record that determines whether that page gets cited." See: Wood, "The 2026 AI Citation Visibility Study for Crypto Protocols" (DOI: 10.5281/zenodo.19253709). Cited with attribution by permission.
Google Search Central's own position (15 May 2026)
Google's official AI optimisation guidance, published 15 May 2026, includes a "Mythbusting generative AI search" section that addresses four practices commonly promoted by AEO/GEO tools. On schema: "Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add." On llms.txt and similar files: "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search." On chunking: "There's no requirement to break your content into tiny pieces for AI to better understand it." On rewriting for AI: "You don't need to write in a specific way just for generative AI search." The guidance also confirms that, from Google's perspective, optimising for generative AI search "is optimising for the search experience, and thus still SEO." Google's guidance covers its own systems only (AI Overviews and AI Mode). The other AI platforms CBA tracks (ChatGPT, Perplexity, Claude, Microsoft Copilot) use different retrieval and citation architectures, which is why our audit covers all five platforms rather than treating them as interchangeable. See: Google Search Central, Optimising for Generative AI Features.
Independent third-party synthesis: Perea Research (May 2026)
Perea's paper is, in our assessment, the most rigorous independent synthesis of AI citation research published in 2026. It draws on 100+ primary sources, including the Princeton/KDD GEO benchmarks, the 5W AI Platform Citation Source Index (680 million tracked citations), and dozens of controlled experiments. The paper validates two specific CBA design choices that were already in place before its publication.
Perea documents distinct retrieval architectures, source preferences, and freshness curves across ChatGPT, Perplexity, Google AI Overviews, Gemini, and Claude, with citation-overlap data from ZipTie showing only 11% of domains cited by ChatGPT are also cited by Perplexity for the same query and 71% of cited sources appearing on a single platform. The paper's explicit conclusion: "Optimization is per-engine, not universal." This is the strongest third-party validation of CBA's per-platform audit structure, which has covered all five engines as separate citation graphs since launch rather than collapsing them into a single AEO/GEO score.
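The two overlap figures quoted above are set calculations, sketched below in Python. The engine names and domains are hypothetical placeholders, and the function names are our own. The exact methodology behind the ZipTie figures is not reproduced here.

```python
def overlap_share(cited_by_a, cited_by_b):
    """Share of engine A's cited domains that engine B also cites."""
    return len(cited_by_a & cited_by_b) / len(cited_by_a) if cited_by_a else 0.0

def single_platform_share(citations):
    """Share of all cited domains that appear on exactly one platform."""
    counts = {}
    for domains in citations.values():
        for domain in domains:
            counts[domain] = counts.get(domain, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Hypothetical citation sets for one query across three engines.
citations = {
    "engine_a": {"a.example", "b.example", "c.example", "d.example"},
    "engine_b": {"d.example", "e.example"},
    "engine_c": {"e.example", "f.example"},
}

print(overlap_share(citations["engine_a"], citations["engine_b"]))  # 0.25
print(round(single_platform_share(citations), 2))                   # 0.67
```

Note that `overlap_share` is not symmetric: the share of A's domains also cited by B generally differs from the share of B's domains also cited by A, so the direction of any quoted overlap figure matters.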
Perea synthesises seven measured factors that move citation rates in production AI engines: answer-first structure, citation and quotation density, schema as machine-readable contract, entity and E-E-A-T, freshness curves, per-engine source mix, and brand or entity search volume. The top three interventions from the underlying Princeton/KDD GEO benchmarks (Cite Sources at +40.6%, Quotation Addition at +35.1%, Statistics Addition at +32.9%) map directly onto the five CPS® pillars: Content Structure, Fact Density, Answer Architecture, Self-Containment, and Freshness. This is independent confirmation that the citation drivers CPS® scores against are the ones that empirically move retrieval behaviour, derived from a research base CBA had no influence over.
Source: Dante Perea, "GEO/AEO 2026: The Citation Economy and the Discovery Layer of B2A", Perea Research, 6 May 2026. Licensed under CC BY 4.0. Cited with attribution per licence terms.
CPS® has always scored Content Structure on extractability fundamentals (block size in the 134–167 word RAG range, declarative opening sentences, semantic segmentation) rather than treating schema markup as a primary citation driver. Four independent sources now point the same way: Ahrefs' controlled study, searchVIU's mechanistic retrieval test, Wood's 50-protocol crypto audit, and Google Search Central's own published guidance.
Schema remains valuable as a crawlability and entity-recognition tool for pages not yet in AI systems' consideration set; we recommend it in our technical readiness layer for clients in that tier. For already-visible pages, the lever isn't schema. It's the broader content record: block-level extractability, fact density, answer architecture, self-containment, freshness, and the accumulated third-party coverage Wood's study highlights as the dominant signal in his sample.
Independently, Perea's 2026 synthesis of 100+ primary studies confirms the same picture from a different angle: the citation drivers that move the numbers are content signals, not technical markup, and per-engine variation is structural rather than incidental.
For the strategy-level implications, see Schema vs Content Signals: What Actually Drives AI Citations. For the strategic case on why getting your first citation matters disproportionately and how citation share compounds across model retraining cycles, see Your First AI Citation Is Your Most Valuable. For CBA's synthesis of the most statistically robust AI search study published to date (what it confirms about brand mentions, hallucination risk, and the traffic-conversion gap), see What the Ahrefs AI Benchmark Report Means for Your Brand.
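The 134–167 word block range mentioned in this section can be checked with a simple word count. The Python sketch below is illustrative only: `classify_block` is a hypothetical helper, whitespace splitting is an assumed definition of a word, and it is not the CPS® scoring implementation.

```python
LOW, HIGH = 134, 167  # word range quoted in the text above

def classify_block(text, low=LOW, high=HIGH):
    """Return (word_count, label) for one content block."""
    count = len(text.split())
    if count < low:
        return count, "short"
    if count > high:
        return count, "long"
    return count, "in range"

# Hypothetical blocks built from repeated filler words.
blocks = {
    "intro": "word " * 100,
    "answer": "word " * 150,
    "appendix": "word " * 200,
}
for name, text in blocks.items():
    count, label = classify_block(text)
    print(f"{name}: {count} words, {label}")
```

Word count is only one of the extractability fundamentals listed above. A block inside the range can still fail on its opening sentence or on its segmentation.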
What CPS® Does Differently
Most SEO frameworks optimise for rankings and traffic. CPS® is designed for citation probability, passage-level extractability, and AI system behaviour. It translates research into a measurable model across five dimensions.
| Framework | Optimises for |
|---|---|
| Traditional SEO | Rankings · Traffic |
| GEO / AEO tools | Visibility · Mention rate |
| CPS® Framework | Citation probability · Passage-level extractability · AI system behaviour |
Research Note
AI citation behaviour is an emerging field. The evidence on this page reflects peer-reviewed retrieval research, large-scale industry studies, controlled experimental studies, independent cross-sectional audits, first-party platform documentation, and consistent practitioner observations as of May 2026.
Some findings are correlational rather than causal, and platform behaviour may evolve. Google Search Central's official AI optimisation guidance (15 May 2026) is cited both as primary source confirmation for mechanisms underlying Answer Architecture and Self-Containment (RAG and query fan-out), and as disconfirming evidence for the role of schema and llms.txt as citation drivers on Google AI surfaces specifically. Where controlled studies, cross-sectional audits, or official platform guidance disconfirm common industry claims (the Ahrefs schema study, the searchVIU retrieval experiment, Wood's 50-protocol crypto audit, and Google's mythbusting section), we present those findings alongside confirming evidence. This page is reviewed and updated quarterly as new research emerges.
As of June 2026, Google Search Console has launched dedicated AI performance reports showing impressions in AI Overviews and AI Mode, covering Google's surfaces only. There is no equivalent reporting tool for ChatGPT, Perplexity, Claude, or Microsoft Copilot. The research on this page is one reason CBA's audit covers all five platforms separately: the citation mechanisms differ, and a tool that measures one platform's impressions can't substitute for a five-platform citation audit. See what the new GSC report shows and what it doesn't →
Work with us
If you want to understand how your content performs against these principles, we can analyse it using the CPS® framework.
Start with one of these:
Get the full CPS® audit on your site
Every page scored at block level. Five-pillar breakdown per paragraph. Prioritised rewrite list. Free audit, results in 48 hours.