CPS® Research Foundation
The Evidence Behind AI Citation Visibility
What this page is
The CPS® (Citation Probability Score) framework measures how likely a page is to be cited by AI systems such as ChatGPT, Perplexity, and Google AI Overviews. This page explains where that model comes from.
It's built from three evidence types:
- Research in retrieval and generative systems
- Large-scale analyses of AI citation behaviour
- Consistent findings across real-world datasets
Where findings are correlational rather than causal, we state that clearly.
Why this matters
Search is shifting from:
- Ranking pages toward selecting passages
- Keywords toward extractable answers
- Authority alone toward authority plus structure plus clarity
If your content can't be cleanly extracted, it's unlikely to be cited, regardless of rankings.
The Five CPS® Pillars
Each CPS® pillar reflects a pattern observed across both academic retrieval research and real-world AI citation data.
How clearly your content is organised for extraction
AI systems don't "read" pages like humans. They parse structure: headings, sections, lists, and semantic layout. Well-structured content is easier to retrieve, segment, and reuse.
The GEO (Generative Engine Optimization) study (KDD 2024) reported that structural optimisation methods improved visibility in AI-generated responses by up to about 40% in controlled benchmarks.
Research from Wix analysing 75,000+ AI answers found that structured formats (listicles, articles, product pages) account for over half of all citations, suggesting format alignment influences citation likelihood.
RAG (Retrieval-Augmented Generation) architectures retrieve passages, not full pages. Content that is clearly segmented is more reliably retrieved.
Content that is modular, clearly sectioned, and formatted for scanning is more likely to be cited than dense, unstructured pages.
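As a rough illustration of passage-level segmentation, the Python sketch below splits a markdown-style document into one chunk per heading. The function name `split_into_chunks`, the `#` heading convention, and the sample text are assumptions made for this example. Real retrieval pipelines apply their own chunking rules, which this page does not describe.

```python
import re


def split_into_chunks(text: str) -> list[dict]:
    """Split a markdown-style document into one chunk per heading.

    Illustrative only: assumes headings are lines starting with 1-6 '#'.
    """
    chunks = []
    current = {"heading": None, "body": []}
    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous chunk, if it holds anything.
            if current["heading"] is not None or current["body"]:
                chunks.append(current)
            current = {"heading": match.group(2).strip(), "body": []}
        elif line.strip():
            current["body"].append(line.strip())
    # Flush the final chunk.
    if current["heading"] is not None or current["body"]:
        chunks.append(current)
    return [{"heading": c["heading"], "text": " ".join(c["body"])} for c in chunks]


sample = """# Pricing
Plans start at 10 USD per month.

# Support
Email support is included in every plan.
"""

chunks = split_into_chunks(sample)
for chunk in chunks:
    print(chunk["heading"], "->", chunk["text"])
```

Under this sketch, a page with clear headings yields small, labelled passages, while a page with no headings collapses into a single undifferentiated chunk.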
The concentration of verifiable, attributed information
AI systems prioritise content that is specific, attributable, and evidence-backed.
The GEO study (KDD 2024) found that adding statistics increased AI visibility by ~41%, and adding expert quotations increased visibility by ~28%.
Large-scale analysis by Ahrefs shows that AI-cited content tends to be more structured, factual, and recently updated than traditionally ranked content.
Across LLM and retrieval literature, structured factual information is consistently extracted more reliably than unstructured prose.
Pages that include named data, cite sources, and present concrete claims are more likely to be selected as citation sources.
Whether your content answers the query immediately
AI systems favour content that answers first and elaborates second.
The GEO study (KDD 2024) shows that content structured to directly address queries performs better in AI-generated outputs.
Research on the "lost in the middle" effect shows that language models use information at the beginning or end of their input more reliably than information in the middle, so an answer buried mid-passage is at a disadvantage.
Analyses referencing Ahrefs data suggest that early-page content accounts for a disproportionate share of citations, reinforcing the importance of opening clarity.
If the answer is buried mid-paragraph, dependent on surrounding context, or delayed, it's less likely to be retrieved or cited.
Whether each section stands on its own
AI systems retrieve and evaluate content in chunks. Each section must make sense independently, contain a complete idea, and avoid reliance on prior context.
RAG research shows that retrieval occurs at the chunk level, with each passage evaluated independently.
Work on retrieval behaviour (including "lost in the middle" research) indicates that where and how clearly information appears within a passage affects whether it is used.
Across GEO implementations, sections that can be read in isolation are more frequently extracted and cited.
A section that starts with "As mentioned above..." is structurally weaker than one that states the idea directly.
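One way to spot this pattern is a simple opener check. The sketch below is a hypothetical heuristic, not part of the CPS® scoring itself: the phrase list and the function name `depends_on_prior_context` are assumptions chosen for illustration, and a real check would need a broader phrase list.

```python
import re

# Hypothetical list of openers that lean on earlier context (illustrative only).
DEPENDENT_OPENER = re.compile(
    r"\s*(?:as (?:mentioned|noted|discussed|described|shown) (?:above|earlier|previously)"
    r"|as we saw"
    r"|building on (?:this|that|the above))",
    re.IGNORECASE,
)


def depends_on_prior_context(section_text: str) -> bool:
    """Return True if the section opens with a back-reference phrase."""
    # match() anchors at the start of the string, so only the opener is checked.
    return DEPENDENT_OPENER.match(section_text) is not None


sections = [
    "As mentioned above, schema markup helps.",
    "Schema markup helps crawlers interpret pages.",
]
for text in sections:
    print(depends_on_prior_context(text), "|", text)
```

Only the opening words are checked, so a section that refers back to earlier material later in its text would not be flagged by this sketch.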
How clearly your content signals recency
AI systems show a measurable preference for newer, updated content.
Research from Ahrefs found that AI-cited content is, on average, significantly newer than traditionally ranked content.
Seer Interactive observed that the majority of AI crawler activity targets content published within the past one to two years.
Practitioner analyses consistently show that visible dates, updated statistics, and schema timestamps act as freshness signals.
Freshness isn't just about dates. It requires substantive updates: new data, updated references, and visible recency cues.
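For the schema timestamp signal specifically, a minimal sketch might generate schema.org `Article` markup with `datePublished` and `dateModified`. The helper name `article_jsonld`, the headline, and the dates are placeholders invented for this example. How individual AI systems weigh these fields is not established here.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date) -> str:
    """Build schema.org Article JSON-LD with publication and update dates."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),  # ISO 8601, e.g. 2025-01-15
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
markup = article_jsonld("Example headline", date(2025, 1, 15), date(2026, 4, 1))
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The modified date should change only when the page content has actually been updated, in line with the point above that freshness requires substantive updates.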
Cross-Pillar Evidence: Authority Beyond the Page
AI citation isn't purely page-level. Brand and entity signals matter.
Large-scale correlation study
Ahrefs found that brand mentions across the web correlate more strongly with AI visibility than traditional backlinks.
Platform confirmation
Microsoft has publicly confirmed that structured data (schema markup) helps its systems interpret web content, supporting the role of structured signals in AI processing.
Pages don't exist in isolation. They're evaluated within a broader entity and authority context.
What CPS® Does Differently
Most SEO frameworks optimise for rankings and traffic. CPS® is designed for citation probability, passage-level extractability, and AI system behaviour. It translates research into a measurable model across five dimensions.
| Framework | Optimises for |
|---|---|
| Traditional SEO | Rankings · Traffic |
| GEO / AEO tools | Visibility · Mention rate |
| CPS® Framework | Citation probability · Passage-level extractability · AI system behaviour |
Research Note
AI citation behaviour is an emerging field. The evidence on this page reflects peer-reviewed retrieval research, large-scale industry studies, and consistent practitioner observations as of April 2026.
Some findings are correlational rather than causal, and platform behaviour may evolve. This page is reviewed and updated quarterly as new research emerges.
Work with us
If you want to understand how your content performs against these principles, we can analyse it using the CPS® framework.
Start with one of these:
Get the full CPS® audit on your site
Every page scored at block level. Five-pillar breakdown per paragraph. Prioritised rewrite list. Free audit, results in 48 hours.