What research underlies the CPS framework?

The Citation Probability Score (CPS) framework is built from three evidence types: peer-reviewed research in retrieval and generative systems, large-scale industry analyses of AI citation behaviour, and consistent practitioner findings across real-world datasets. Each of the five CPS pillars reflects patterns observed across both academic retrieval research and real-world AI citation data. Where findings are correlational rather than causal, this is stated clearly in the framework documentation.

What does the GEO KDD 2024 study show about content structure and AI citation?

The GEO: Generative Engine Optimization study, accepted at KDD 2024, demonstrated that structural optimisation methods significantly improved visibility in AI-generated responses, with uplifts of up to approximately 40% in controlled benchmarks. The study also found that adding statistics to content increased AI visibility by approximately 41%, and adding expert quotations increased visibility by approximately 28%. Research from Wix analysing over 75,000 AI answers found that structured formats including listicles, articles, and product pages account for over half of all citations, suggesting format alignment influences citation likelihood.

What is the research caveat on the CPS framework?

AI citation behaviour is an emerging field. The evidence presented in the CPS Research Foundation reflects peer-reviewed retrieval research, large-scale industry studies, and consistent practitioner observations as of April 2026. Some findings are correlational rather than causal, and platform behaviour may evolve. The CPS Research Foundation page is reviewed and updated quarterly as new research emerges.

CPS Research Foundation

Why does freshness matter for AI citation?

Research from Ahrefs across 17 million citations found that AI-cited content is, on average, significantly newer than traditionally ranked content. Log file analysis by Seer Interactive observed that the majority of AI crawler activity targets content published within the past one to two years. Platform behaviour analyses consistently show that visible dates, updated statistics, and schema timestamps act as freshness signals. Freshness is not just about dates: it requires substantive updates including new data, updated references, and visible recency cues.

How does brand authority affect AI citation beyond the page level?

AI citation is not purely page-level. Ahrefs found in a large-scale correlation study that brand mentions across the web correlate more strongly with AI visibility than traditional backlinks. Microsoft has publicly confirmed that structured data (schema markup) helps its systems interpret web content, supporting the role of structured signals in AI processing. Pages are evaluated within a broader entity and authority context, not in isolation.

The Evidence Behind AI Citation Visibility

Prepared by: Cited By AI® | Last updated: April 2026

Version 1.3 | Published 23 April 2026 | Last updated: 18 May 2026 (added Google Search Central as primary source for Answer Architecture and Self-Containment pillars, citing query fan-out and RAG as first-party platform confirmation of mechanisms; previous version 1.2 added Wood/CryptoContent.dev study and Google disconfirming-evidence section) | Source: citedbyai.info AI Visibility Intelligence. Reviewed quarterly as new research emerges.

What this page is

The CPS (Citation Probability Score) framework measures how likely a page is to be cited by AI systems such as ChatGPT, Perplexity, and Google AI Overviews. This page explains where that model comes from.

It's built from three evidence types:

Peer-reviewed: research in retrieval and generative systems

Industry-scale: large-scale analyses of AI citation behaviour

Practitioner: consistent findings across real-world datasets

Where findings are correlational rather than causal, we state that clearly.

Why this matters

Search is shifting from:

→ Ranking pages toward selecting passages
→ Keywords toward extractable answers
→ Authority alone toward authority plus structure plus clarity

If your content can't be cleanly extracted, it's unlikely to be cited, regardless of rankings.

The Five CPS Pillars

Each CPS pillar reflects a pattern observed across both academic retrieval research and real-world AI citation data.

01 Content Structure

How clearly your content is organised for extraction

AI systems don't "read" pages like humans. They parse structure: headings, sections, lists, and semantic layout. Well-structured content is easier to retrieve, segment, and reuse.

Peer-reviewed (experimental)

The GEO: Generative Engine Optimization study demonstrated that structural optimisation methods significantly improved visibility in AI-generated responses, with uplifts of up to ~40% in controlled benchmarks.

Industry-scale (observational)

Research from Wix analysing 75,000+ AI answers found that structured formats (listicles, articles, product pages) account for over half of all citations, suggesting format alignment influences citation likelihood.

Retrieval systems (mechanism)

RAG (Retrieval-Augmented Generation) architectures retrieve passages, not full pages. Content that's clearly segmented is more reliably retrieved.
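To illustrate passage-level retrieval, here is a minimal Python sketch that splits a page into one passage per heading. It is an illustration only, not how any particular AI platform segments content. The function name split_into_passages and the assumption that headings are lines starting with "#" are hypothetical choices made for this example.

```python
import re


def split_into_passages(text: str) -> list[dict]:
    """Split text into one passage per heading (assumed: lines starting with '#')."""
    passages = []
    heading, body = "(untitled)", []
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            # Close the previous passage before starting a new one.
            if body:
                passages.append({"heading": heading, "text": " ".join(body)})
            heading, body = match.group(1).strip(), []
        elif line.strip():
            body.append(line.strip())
    # Flush the final passage.
    if body:
        passages.append({"heading": heading, "text": " ".join(body)})
    return passages


# Hypothetical sample page with two clearly segmented sections.
sample = """# Freshness
Visible dates act as recency cues.

# Fact density
Named data and cited sources are easier to attribute.
"""
passages = split_into_passages(sample)
for passage in passages:
    print(passage["heading"], "->", passage["text"])
```

A page with clear headings yields clean, separately retrievable passages under this sketch; a page with no headings collapses into a single undifferentiated block.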

Disconfirming evidence (controlled study, May 2026)

A controlled study by Ahrefs tracked 1,885 pages adding JSON-LD schema between August 2025 and March 2026, against 4,000 matched control pages. Using difference-in-differences analysis, adding schema produced no statistically significant citation uplift on Google AI Mode (+2.4%) or ChatGPT (+2.2%), and a small but statistically significant decline on Google AI Overviews (−4.6%). The study population was pages already heavily cited by AI. Ahrefs notes schema may still help pages "get crawled, parsed, or indexed in the first place." Schema's primary value appears to be upstream (supporting crawlability, entity recognition, and knowledge graph inclusion) rather than directly driving citation decisions on already-visible pages. Content Structure as a CPS pillar reflects the broader extractability picture (block size, declarative openings, semantic segmentation) of which schema is one signal among many. See: Ahrefs, "We Tracked 1,885 Pages Adding Schema" (Linehan & Guan, 11 May 2026).

What this means

Content that's modular, clearly sectioned, and formatted for scanning is more likely to be cited than dense, unstructured pages.

02 Fact Density

The concentration of verifiable, attributed information

AI systems prioritise content that's specific, attributable, and evidence-backed.

Peer-reviewed (experimental)

The GEO study (KDD 2024) found that adding statistics increased AI visibility by ~41%, and adding expert quotations increased visibility by ~28%.

Industry-scale (observational)

Large-scale analysis by Ahrefs shows that AI-cited content tends to be more structured, factual, and recently updated than traditionally ranked content.

Supporting research (directional)

Across LLM and retrieval literature, structured factual information is consistently extracted more reliably than unstructured prose.

What this means

Pages that include named data, cite sources, and present concrete claims are more likely to be selected as citation sources.

03 Answer Architecture

Whether your content answers the query immediately

AI systems favour content that answers first and elaborates second.

Platform confirmation (first-party, May 2026)

Google Search Central's official AI optimisation guide (15 May 2026) explicitly names query fan-out as the mechanism behind generative AI features in Google Search, defining it as "a set of concurrent, related queries generated by the model to request more information and fetch additional relevant search results to address the user's query." Google's own example: the query "how to fix a lawn that's full of weeds" triggers fan-out queries including "best herbicides for lawns", "remove weeds without chemicals", and "how to prevent weeds in lawn". This is first-party confirmation that a single user query generates multiple concurrent sub-queries. Content that answers only the primary query is invisible to most of the fan-out. The Answer Architecture pillar exists precisely because each retrieved passage must directly answer the implied query it's retrieved against. See: Google Search Central, Optimising for Generative AI Features.

Peer-reviewed (experimental)

The GEO study (KDD 2024) shows that content structured to directly address queries performs better in AI-generated outputs.

Retrieval behaviour (mechanism)

Research on passage retrieval and the "lost in the middle" effect shows that information placed early in a passage is more likely to be used.

Industry observation (directional)

Analyses referencing Ahrefs data suggest that early-page content accounts for a disproportionate share of citations, reinforcing the importance of opening clarity.

What this means

If the answer is buried mid-paragraph, dependent on surrounding context, or delayed, it's less likely to be retrieved or cited.

04 Self-Containment

Whether each section stands on its own

AI systems retrieve and evaluate content in chunks. Each section must make sense independently, contain a complete idea, and avoid reliance on prior context.

Platform confirmation (first-party, May 2026)

Google Search Central's official AI optimisation guide (15 May 2026) explicitly confirms retrieval-augmented generation (RAG) as the mechanism behind Google's generative AI features, defining it as "a technique (also known as grounding) used to improve the quality, accuracy, and freshness of AI responses by relying on our core Search ranking systems to retrieve relevant, up-to-date web pages from our Search index. Our systems then review the specific information from those retrieved pages to generate a more reliable and helpful response." This is first-party confirmation from Google itself that generation happens by retrieving and synthesising specific information from individual pages, not by ranking whole pages as in traditional search. The Self-Containment pillar exists because each retrieved chunk must make sense in isolation; if it depends on context from elsewhere on the page, the retrieval system can't use it. See: Google Search Central, Optimising for Generative AI Features.

Peer-reviewed (mechanism)

RAG research shows that retrieval occurs at the chunk level, with each passage evaluated independently.

Academic findings

Work on retrieval behaviour (including "lost in the middle" research) demonstrates that clarity and completeness within a passage directly affect usage.

Consistent practitioner pattern

Across GEO implementations, sections that can be read in isolation are more frequently extracted and cited.

What this means

A section that starts with "As mentioned above..." is structurally weaker than one that states the idea directly.
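One rough way to check for this pattern is to scan the opening of each section for phrases that point back to earlier context. The sketch below is a simple heuristic, not part of the CPS scoring model; the phrase list, the 80-character window, and the function name opens_with_back_reference are all assumptions chosen for illustration.

```python
import re

# Hypothetical phrases that signal dependence on earlier context.
BACK_REFERENCES = [
    r"as mentioned (above|earlier|previously)",
    r"as (noted|discussed|described) (above|earlier|previously)",
    r"see above",
    r"the (above|aforementioned)",
]
PATTERN = re.compile("|".join(BACK_REFERENCES), re.IGNORECASE)


def opens_with_back_reference(section: str, window: int = 80) -> bool:
    """Return True if a back-reference appears in the first `window` characters."""
    return PATTERN.search(section.strip()[:window]) is not None


print(opens_with_back_reference("As mentioned above, freshness matters."))
print(opens_with_back_reference("Freshness requires substantive updates such as new data."))
```

A flagged section is a candidate for rewriting so that it names its subject directly instead of leaning on what came before.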

05 Freshness

How clearly your content signals recency

AI systems show a measurable preference for newer, updated content.

Large-scale (17M citations)

Research from Ahrefs found that AI-cited content is, on average, significantly newer than traditionally ranked content.

Log file analysis

Seer Interactive observed that the majority of AI crawler activity targets content published within the past one to two years.

Platform behaviour (directional)

Practitioner analyses consistently show that visible dates, updated statistics, and schema timestamps act as freshness signals.

What this means

Freshness isn't just about dates. It requires substantive updates: new data, updated references, and visible recency cues.
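As one example of a schema timestamp, the sketch below builds schema.org Article markup carrying datePublished and dateModified. It is a minimal illustration, assuming the Article type suits the page; the helper name article_jsonld is hypothetical, and the dates used are this page's own published and last-updated dates. Markup like this is a recency cue only and does not replace substantive updates.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date) -> str:
    """Build schema.org Article JSON-LD with published and modified dates."""
    if modified < published:
        raise ValueError("modified date cannot precede published date")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


markup = article_jsonld(
    "The Evidence Behind AI Citation Visibility",
    published=date(2026, 4, 23),
    modified=date(2026, 5, 18),
)
print(markup)
```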

Cross-Pillar Evidence: Authority Beyond the Page

AI citation isn't purely page-level. Brand and entity signals matter.

Large-scale correlation study

Ahrefs found that brand mentions across the web correlate more strongly with AI visibility than traditional backlinks.

Platform confirmation

Microsoft has publicly confirmed that structured data (schema markup) helps its systems interpret web content, supporting the role of structured signals in upstream AI processing (crawlability, entity recognition, knowledge graph inclusion). See also: What the evidence says doesn't directly drive citations.

Third-party content corpus drives AI citation (CryptoContent.dev, May 2026)

An independent audit by Wood (CryptoContent.dev, May 2026) scored 50 crypto protocols across ChatGPT, Perplexity, and Google AI Overviews on a 100-point framework covering AI presence, citation quality, website readiness, schema quality, and documentation. Across 1,016 citation records, official protocol pages accounted for just 1% of Perplexity citations and 4% of Google AI Overview citations. The top cited domain on Google AIO was YouTube (111 citations), followed by Reddit (94). On Perplexity, CoinGecko led (9% of citations). The protocols ranking highest for AI visibility weren't those with the strongest technical infrastructure but those with the deepest accumulated third-party coverage. This is direct supporting evidence that AI citation is shaped by the distributed content record surrounding an entity rather than by what the entity publishes about itself. See: Wood, "The 2026 AI Citation Visibility Study for Crypto Protocols" (DOI: 10.5281/zenodo.19253709).

What this means

Pages don't exist in isolation. They're evaluated within a broader entity and authority context, and the third-party content corpus surrounding a brand carries disproportionate weight in AI citation decisions compared to what the brand publishes about itself.

What the Evidence Says Doesn't Directly Drive Citations

The most trusted research pages present disconfirming evidence alongside confirming evidence. Here's what current studies suggest does not directly drive AI citation behaviour at the point of retrieval, despite frequent industry claims to the contrary. Four independent sources, using different methodologies and covering different verticals, now point in the same direction. We include these findings because credibility requires honesty about what doesn't work as advertised, not just what does.

Schema markup on already-cited pages (Ahrefs, May 2026)

A controlled study by Ahrefs (Linehan & Guan, 11 May 2026) tracked 1,885 pages adding JSON-LD schema between August 2025 and March 2026 against 4,000 matched control pages. Using difference-in-differences analysis across four separate tests, adding schema produced no statistically significant citation uplift on Google AI Mode (+2.4%) or ChatGPT (+2.2%), and a small but statistically significant decline on Google AI Overviews (−4.6%). Both treated and control pages were already trending downward on AIO before schema was added, so the small decline can't be cleanly attributed to schema. The headline finding: on pages already in AI systems' consideration set, adding JSON-LD schema doesn't measurably increase citations.
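For readers unfamiliar with difference-in-differences, the sketch below shows the basic arithmetic: the change in the treated group minus the change in the control group, which nets out a trend both groups share. The numbers are hypothetical and are not taken from the Ahrefs study, and the function name diff_in_diff is an assumption for this example. The real analysis involves matching and significance testing that this sketch omits.

```python
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Difference-in-differences: change in treated minus change in control."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean citations per page (NOT figures from the Ahrefs study).
# Both groups decline, so the raw drop in the treated group overstates the effect.
estimate = diff_in_diff(
    treated_before=10.0, treated_after=8.5,
    control_before=10.0, control_after=9.0,
)
print(estimate)  # -0.5
```

In this made-up case the treated pages fell by 1.5 citations, but control pages fell by 1.0 over the same period, so only 0.5 of the decline is attributable to the treatment under the method's assumptions.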

Schema at retrieval time (searchVIU, 2025)

An experiment by searchVIU tested whether five major AI systems (ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode) use schema markup when fetching a page in real-time. None of them did. During direct retrieval, every system extracted only visible HTML content. JSON-LD, hidden Microdata, and hidden RDFa were all ignored. This is mechanistic evidence consistent with the Ahrefs finding: structured data isn't part of the on-the-fly extraction loop for the AI systems tested.

Why correlation studies overstate schema's role

Ahrefs' earlier correlation analysis found AI-cited pages were almost three times more likely to have JSON-LD than non-cited pages. Their controlled follow-up explains why that gap exists without schema causing citations: sites that implement structured data tend to also invest in technical SEO, publish authoritative content, build links, and maintain their pages. Schema rides the same wave as every other quality signal. Strip schema out and the rest of the signal stack likely still carries the page through to citation.

Schema in industry verticals: a 50-protocol crypto audit (Wood, CryptoContent.dev, May 2026)

An independent cross-sectional study by Wood audited 50 crypto protocols across ChatGPT, Perplexity, and Google AI Overviews on a 100-point framework. The findings corroborate the Ahrefs result from a different methodology and a different vertical. Pendle, the only DeFi protocol in the sample with JSON-LD schema, scored zero across all three platforms for AI presence. Starknet had the joint-highest schema quality score (12/15) and ranked 15th overall. Aave had no schema and ranked 2nd. Solana had no schema and ranked 5th. Schema-adopting protocols averaged 7.0 AI mentions versus 4.6 for non-adopters, but the author explicitly attributes that gap to protocol maturity rather than schema effectiveness: the schema-adopting set tends to be the larger, more established protocols with deeper third-party coverage. The author's own honest framing: "The more defensible claim for schema in this context is entity resolution: helping AI systems confirm what a protocol is and how it relates to other entities. It doesn't create the content record that determines whether that page gets cited." See: Wood, "The 2026 AI Citation Visibility Study for Crypto Protocols" (DOI: 10.5281/zenodo.19253709). Cited with attribution by permission.

Google Search Central's own position (15 May 2026)

Google's official AI optimisation guidance, published 15 May 2026, includes a "Mythbusting generative AI search" section that addresses four practices commonly promoted by AEO/GEO tools. On schema: "Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add." On llms.txt and similar files: "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search." On chunking: "There's no requirement to break your content into tiny pieces for AI to better understand it." On rewriting for AI: "You don't need to write in a specific way just for generative AI search." The guidance also confirms that, from Google's perspective, optimising for generative AI search "is optimising for the search experience, and thus still SEO." Google's guidance covers its own systems only (AI Overviews and AI Mode). The other AI platforms CBA tracks (ChatGPT, Perplexity, Claude, Microsoft Copilot) use different retrieval and citation architectures, which is why our audit covers all five platforms rather than treating them as interchangeable. See: Google Search Central, Optimising for Generative AI Features.

Independent third-party synthesis: Perea Research (May 2026)

Perea Research's paper is the most rigorous independent synthesis of AI citation research published in 2026, drawing on 100+ primary sources including the Princeton/KDD GEO benchmarks, the 5W AI Platform Citation Source Index (680 million tracked citations), and dozens of controlled experiments. It validates two specific CBA design choices that were already in place before its publication.

1. Five engines, five citation logics — validates platform-specific audit scope

Perea documents distinct retrieval architectures, source preferences, and freshness curves across ChatGPT, Perplexity, Google AI Overviews, Gemini, and Claude, with citation-overlap data from ZipTie showing only 11% of domains cited by ChatGPT are also cited by Perplexity for the same query and 71% of cited sources appearing on a single platform. The paper's explicit conclusion: "Optimization is per-engine, not universal." This is the strongest third-party validation of CBA's per-platform audit structure, which has covered all five engines as separate citation graphs since launch rather than collapsing them into a single AEO/GEO score.
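A citation-overlap figure of this kind can be computed as the share of domains cited by one engine that a second engine also cites for the same query. The sketch below shows that calculation on hypothetical domains; it is not ZipTie's methodology, and the function name overlap_share is an assumption for illustration.

```python
def overlap_share(cited_by_a: set[str], cited_by_b: set[str]) -> float:
    """Share of domains cited by engine A that engine B also cites."""
    if not cited_by_a:
        return 0.0
    return len(cited_by_a & cited_by_b) / len(cited_by_a)


# Hypothetical cited domains for a single query (placeholder names).
chatgpt = {"example.com", "example.org", "example.net", "example.edu"}
perplexity = {"example.com", "docs.example", "news.example"}

share = overlap_share(chatgpt, perplexity)
print(f"{share:.0%}")  # 25%
```

Note that the measure is directional: the share of engine A's domains found in engine B generally differs from the share of engine B's domains found in engine A.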

2. Citation driver hierarchy — validates CPS pillar structure

Perea synthesises the seven measured citation factors that move citation rates in production AI engines: answer-first structure, citation and quotation density, schema as machine-readable contract, entity and E-E-A-T, freshness curves, per-engine source mix, and brand or entity search volume. The top three interventions from the underlying Princeton/KDD GEO benchmarks (Cite Sources at +40.6%, Quotation Addition at +35.1%, Statistics Addition at +32.9%) map directly onto the five CPS pillars: Content Structure, Fact Density, Answer Architecture, Self-Containment, and Freshness. This is independent confirmation that the citation drivers CPS scores against are the ones that empirically move retrieval behaviour, derived from a research base CBA had no influence over.

Source: Dante Perea, "GEO/AEO 2026: The Citation Economy and the Discovery Layer of B2A", Perea Research, 6 May 2026. Licensed under CC BY 4.0. Cited with attribution per licence terms.

What this means for CPS

CPS has always scored Content Structure on extractability fundamentals (block size in the 134–167 word RAG range, declarative opening sentences, semantic segmentation) rather than treating schema markup as a primary citation driver. Four independent sources now point the same way: Ahrefs' controlled study, searchVIU's mechanistic retrieval test, Wood's 50-protocol crypto audit, and Google Search Central's own published guidance. Schema remains valuable as a crawlability and entity-recognition tool for pages not yet in AI systems' consideration set; we recommend it in our technical readiness layer for clients in that tier. For already-visible pages, the lever isn't schema. It's the broader content record: block-level extractability, fact density, answer architecture, self-containment, freshness, and the accumulated third-party coverage Wood's study highlights as the dominant signal in his sample. Independently, Perea's 2026 synthesis of 100+ primary studies confirms the same picture from a different angle: the citation drivers that move the numbers are content signals, not technical markup, and per-engine variation is structural rather than incidental. For the strategy-level implications, see Schema vs Content Signals: What Actually Drives AI Citations. For the strategic case on why getting your first citation matters disproportionately and how citation share compounds across model retraining cycles, see Your First AI Citation Is Your Most Valuable. For CBA's synthesis of the most statistically robust AI search study published to date (what it confirms about brand mentions, hallucination risk, and the traffic-conversion gap), see What the Ahrefs AI Benchmark Report Means for Your Brand.
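As an illustration of the block-size check mentioned above, the sketch below counts the words in a content block and compares the count with the 134–167 word range quoted on this page. It is a rough approximation using whitespace splitting, not the CPS scoring implementation; the names RAG_BLOCK_RANGE and block_size_status are hypothetical.

```python
# Word range quoted on this page for block size; treated here as an assumption.
RAG_BLOCK_RANGE = (134, 167)


def block_size_status(block: str, word_range: tuple[int, int] = RAG_BLOCK_RANGE) -> str:
    """Classify a block as 'short', 'in range', or 'long' by whitespace word count."""
    words = len(block.split())
    low, high = word_range
    if words < low:
        return "short"
    if words > high:
        return "long"
    return "in range"


# Hypothetical blocks built from a repeated placeholder word.
print(block_size_status("word " * 50))   # short
print(block_size_status("word " * 150))  # in range
print(block_size_status("word " * 200))  # long
```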

What CPS Does Differently

Most SEO frameworks optimise for rankings and traffic. CPS is designed for citation probability, passage-level extractability, and AI system behaviour. It translates research into a measurable model across five dimensions.

Framework	Optimises for
Traditional SEO	Rankings · Traffic
GEO / AEO tools	Visibility · Mention rate
CPS Framework	Citation probability · Passage-level extractability · AI system behaviour

Research Note

Research caveat

AI citation behaviour is an emerging field. The evidence on this page reflects peer-reviewed retrieval research, large-scale industry studies, controlled experimental studies, independent cross-sectional audits, first-party platform documentation, and consistent practitioner observations as of May 2026.

Some findings are correlational rather than causal, and platform behaviour may evolve. Google Search Central's official AI optimisation guidance (15 May 2026) is cited both as primary source confirmation for mechanisms underlying Answer Architecture and Self-Containment (RAG and query fan-out), and as disconfirming evidence for the role of schema and llms.txt as citation drivers on Google AI surfaces specifically. Where controlled studies, cross-sectional audits, or official platform guidance disconfirm common industry claims (the Ahrefs schema study, the searchVIU retrieval experiment, Wood's 50-protocol crypto audit, and Google's mythbusting section), we present those findings alongside confirming evidence. This page is reviewed and updated quarterly as new research emerges.

On measurement scope

As of June 2026, Google Search Console has launched dedicated AI performance reports showing impressions in AI Overviews and AI Mode, covering Google's surfaces only. There is no equivalent reporting tool for ChatGPT, Perplexity, Claude, or Microsoft Copilot. The research on this page is one reason CBA's audit covers all five platforms separately: the citation mechanisms differ, and a tool that measures one platform's impressions can't substitute for a five-platform citation audit. See what the new GSC report shows and what it doesn't →

Work with us

If you want to understand how your content performs against these principles, we can analyse it using the CPS framework.

Start with one of these:

Request a CPS audit → Score a block free → Benchmark against competitors →

Get the full CPS audit on your site

Get your Solo audit across ChatGPT, Perplexity, and Gemini for £49 one-off, or £99-149/month with no minimum commitment. Prefer to try first? Run the free automated CPS Lite Score in seconds.

Start Your £49 Audit →
