Schema vs Content Signals: What Actually Drives AI Citations
On 11 May 2026, Ahrefs published a controlled study tracking 1,885 pages that added JSON-LD schema between August 2025 and March 2026. Citations across Google AI Overviews, Google AI Mode and ChatGPT barely moved. The schema-first positioning that anchors a category of AEO and GEO tools just lost its strongest claim. Here's what the study means, what it doesn't, and what actually drives AI citation decisions.
If you've been told that adding schema is the way to get cited by ChatGPT or Perplexity, the controlled evidence as of May 2026 says: not on pages already in the consideration set. Schema still has real upstream value. It's just not the citation lever it's been sold as.
What Ahrefs actually found
Ahrefs (Louise Linehan and Xibeijia Guan, reviewed by Ryan Law) ran a two-stage study. Stage one looked at 6 million URLs and found AI-cited pages were almost three times more likely to have JSON-LD than non-cited pages. That's the correlation everyone has been quoting. Stage two was the controlled follow-up: 1,885 pages that added JSON-LD between August 2025 and March 2026, matched against 4,000 control pages, measured with difference-in-differences analysis across four separate statistical tests.
The headline finding is this.
All four statistical tests pointed the same way: no meaningful citation growth from adding schema. On AI Overviews the treated pages dropped 4.6% more than matched controls. Both groups were already on a downward AIO trajectory before schema was added, so the small decline can't be cleanly attributed to schema. But equally, if schema were doing the work people claim, you'd expect treated pages to outperform controls. They didn't.
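For readers unfamiliar with the method, here is a minimal sketch of the difference-in-differences calculation the study relies on. The numbers are illustrative, not Ahrefs' data; the point is only to show how the control group's trend is subtracted out before any effect is attributed to schema.

```python
# Illustrative difference-in-differences estimate. Values are made up,
# not taken from the Ahrefs dataset.
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # Subtracting the control trend isolates the effect of the treatment
    # (adding schema) from the decline both groups share.
    return treated_change - control_change

# Both groups decline; the treated group declines slightly more,
# so the estimate is a small negative number.
print(diff_in_diff(treated_before=100, treated_after=80,
                   control_before=100, control_after=84))  # -4
```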
The mechanistic confirmation
There's a second piece of evidence worth pairing with the Ahrefs controlled study. A searchVIU experiment tested whether five AI systems (ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode) use schema markup when fetching a page in real time. None of them did. Every system extracted only visible HTML content during direct retrieval. JSON-LD, hidden Microdata, and hidden RDFa were all ignored.
That's mechanistic supporting evidence. Ahrefs' study shows adding schema doesn't move citation outcomes on already-cited pages. searchVIU shows why: the AI systems doing the citing aren't reading schema during the retrieval loop. They're reading what users see.
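As a rough illustration of that split, the sketch below fetches a page and separates what renders as visible text from what sits in JSON-LD script tags. It assumes the requests and beautifulsoup4 packages; the actual retrieval pipelines of these systems aren't public, so this mirrors the visible-versus-hidden distinction, not their implementations.

```python
# Sketch: what a retrieval-time fetcher "sees" vs. what lives in JSON-LD.
# Assumes requests and beautifulsoup4; not any AI vendor's actual pipeline.
import requests
from bs4 import BeautifulSoup

def visible_text_and_schema(url: str) -> tuple[str, list[str]]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # JSON-LD lives in script tags that never render for users.
    json_ld = [tag.get_text() for tag in soup.find_all("script", type="application/ld+json")]

    # Remove everything a reader never sees, then collect the visible text.
    for tag in soup(["script", "style", "noscript", "template"]):
        tag.decompose()
    visible = " ".join(soup.get_text(separator=" ").split())
    return visible, json_ld

# If the searchVIU finding holds, only `visible` informs the citation decision
# during direct retrieval; `json_ld` is ignored at that stage.
visible, json_ld = visible_text_and_schema("https://example.com/")
print(len(visible.split()), "visible words,", len(json_ld), "JSON-LD blocks")
```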
What this doesn't mean
It's worth being precise about the scope, because the wrong conclusion is "schema is useless" and that isn't what the study supports. Three things the Ahrefs finding does not say:
- It doesn't say schema is useless. Schema produces rich results in traditional search, supports voice assistants and shopping features, helps knowledge graph entity recognition, and provides semantic context for crawlers. None of that is in dispute.
- It doesn't apply to pages not yet visible to AI. The Ahrefs study population had 100+ AI Overview citations as a February 2025 baseline. These pages were already inside the consideration set. Ahrefs explicitly carves this out: "For pages that aren't being seen by AI systems at all, schema markup might still play a role in helping them get crawled, parsed, or indexed in the first place."
- It doesn't disprove that 53% of AI-cited pages have schema. They do. But the study explains why: sites that add structured data also tend to invest in technical SEO, publish authoritative content, build links and maintain their pages. Schema rides the same quality wave as every other signal. Strip schema out and the rest of the stack likely still carries the page.
What this does mean
The claim that's now hard to defend is the narrower one: that adding schema markup to an already-visible page will measurably increase AI citations. That's the claim a category of AEO and GEO tools has been leading with. The controlled data says it isn't true. Or if it is true, the effect is small enough to be lost in statistical noise across thousands of URLs.
If your ASEO strategy treats schema as the primary citation lever, the strategy needs updating. Schema belongs in the technical readiness layer, alongside crawlability, robots.txt, llms.txt, and entity recognition signals. It's not the lever that moves citation outcomes once you're visible.
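As a quick illustration of that readiness layer, here is a minimal sketch that checks whether common AI crawlers are allowed by robots.txt and whether an llms.txt file is present. It uses Python's standard robotparser plus the requests package; the crawler tokens are the publicly documented user-agent names, and the check is a starting point, not a full readiness audit.

```python
# Minimal technical-readiness check: AI crawler access and llms.txt presence.
# Not a citation-probability signal; it only answers "can the bots get in?"
import requests
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def readiness_check(site: str) -> dict:
    robots = RobotFileParser()
    robots.set_url(urljoin(site, "/robots.txt"))
    robots.read()

    crawler_access = {bot: robots.can_fetch(bot, site) for bot in AI_CRAWLERS}
    llms_txt_present = requests.get(urljoin(site, "/llms.txt"), timeout=10).status_code == 200
    return {"crawler_access": crawler_access, "llms_txt_present": llms_txt_present}

print(readiness_check("https://example.com/"))
```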
Two ways to think about the signals
The cleanest way to internalise this is to split the AI citation problem into two stacks: the technical readiness stack (does the AI know your page exists and what entity it represents?) and the citation decision stack (when the AI is choosing what to cite from its consideration set, what makes it pick you?). Different stacks. Different signals. Different priority.
| Technical readiness stack | Citation decision stack |
| --- | --- |
| AI crawler access (robots.txt, firewall rules) | Block size in the 134–167 word RAG range |
| llms.txt presence and structure | Declarative opening sentences per block |
| JSON-LD schema for entity recognition | Fact density: named entities and statistics |
| Knowledge graph signal coverage | Self-containment in isolation |
| Sitemap quality, indexability | Visible freshness markers and recency cues |
| Brand entity establishment | Semantic segmentation at chunk level |
Schema sits on the left. The Ahrefs study tells us the left column gets pages into the room but doesn't decide who gets quoted. The right column is what decides who gets quoted. That's where attention belongs once a page is visible.
The five content signals that do the work
If schema isn't the citation lever, what is? Cited By AI®'s Citation Probability Score® (CPS®) framework scores five content-level signals at the block level, not the page level. We didn't pick these signals because we liked them. We picked them because peer-reviewed retrieval research and large-scale citation studies kept pointing at them.
The CPS® five-pillar framework
- Content Structure: Block size aligned to the 134–167 word RAG retrieval chunk. Declarative opening sentence. Semantic segmentation that lets a retrieval model extract a passage cleanly. The GEO: Generative Engine Optimization study (KDD 2024) showed structural optimisation produced up to 40% visibility uplift in controlled benchmarks.
- Fact Density: Named entities, statistics and verifiable claims per 100 words. The GEO study found adding statistics increased AI visibility by approximately 41%. AI retrieval models weight fact-rich passages 2–3× higher than descriptive prose.
- Answer Architecture: The declarative pattern AI retrieval systems are designed to surface: "[Topic] is/provides/enables [specific outcome]." Blocks that open this way self-answer the implied query and don't need surrounding context. Most brand content opens with scene-setting and pays the price in retrieval ranking.
- Self-Containment: Does the block make complete sense in isolation, without the paragraph before it or the heading above it? AI systems extract blocks in isolation. Dangling pronouns, "as mentioned above," "see below" all sink retrieval scores. Self-contained blocks get cited; context-dependent blocks get skipped.
- Freshness: Visible date markers, "as of [year]" language, "updated" and "latest" cues within the block itself. Carries the lowest weighting overall but is disproportionately important for Perplexity and Bing-powered AI systems that weight recency heavily.
Each pillar is scored on the unit AI retrieval actually operates on: the content block, typically 134–167 words. Page-level scoring averages strong blocks with weak ones and tells you the page is fine. Block-level scoring tells you which paragraph is being skipped and what specifically to rewrite. The full methodology is published at citedbyai.info/citation-probability-score-framework, and the evidence base is at citedbyai.info/cps-research-foundation, which now includes a "What the evidence says doesn't directly drive citations" section explicitly citing the Ahrefs study.
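To make "the block" concrete, here is a rough sketch that splits plain paragraph text into chunks landing near the 134–167 word window. The greedy splitting heuristic and the file path are illustrative only; this is not the CPS® segmentation method.

```python
# Rough block segmentation: greedily group paragraphs until a chunk reaches
# the 134-167 word window cited above. Illustrative heuristic only.
def segment_into_blocks(paragraphs: list[str], window_floor: int = 134) -> list[str]:
    blocks, current, count = [], [], 0
    for para in paragraphs:
        current.append(para)
        count += len(para.split())
        if count >= window_floor:        # close the block once it enters the window
            blocks.append(" ".join(current))
            current, count = [], 0
    if current:                          # keep a trailing short block rather than drop it
        blocks.append(" ".join(current))
    return blocks

text = open("page.txt", encoding="utf-8").read()  # hypothetical plain-text export of a page
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
for i, block in enumerate(segment_into_blocks(paragraphs), start=1):
    print(f"block {i}: {len(block.split())} words")
```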
What to update in your ASEO strategy
Three concrete moves, in priority order.
One: stop treating schema as the primary citation lever. Keep it for crawlability, voice assistants, rich results, and knowledge graph signal. Drop it from the top of your AI visibility priority list. It belongs in technical readiness, not citation strategy.
Two: audit your content at the block level. Pull a representative sample of your pages. Read the first paragraph of each one. Does it open with a declarative answer or with brand narrative? Is it in the 134–167 word range? Does it contain named statistics and verifiable claims? Does it make sense without the heading above it? If you're failing those four checks, you have a citation problem schema can't fix. A rough heuristic version of the checks is sketched below.
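The sketch applies the four checks to an opening block. The regexes are illustrative stand-ins for a manual read, not the CPS® scoring model, and the example block is hypothetical.

```python
# Rough heuristics for the four first-paragraph checks. Stand-ins for a manual
# read, not the CPS scoring model.
import re

def audit_opening_block(block: str) -> dict[str, bool]:
    words = block.split()
    first_sentence = re.split(r"(?<=[.!?])\s+", block.strip(), maxsplit=1)[0]
    return {
        "declarative_opening": bool(
            re.search(r"\b(is|are|provides|enables|means)\b", first_sentence.lower())
        ),
        "in_rag_word_range": 134 <= len(words) <= 167,
        "has_named_numbers": bool(re.search(r"\d", block)),
        "self_contained": not (
            re.match(r"^(it|this|these|they|those)\b", block.strip(), re.IGNORECASE)
            or re.search(r"as mentioned above|see below", block, re.IGNORECASE)
        ),
    }

opening = "Acme Route is a delivery planning tool that cuts fleet mileage by 18%."  # hypothetical
print(audit_opening_block(opening))
```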
Three: treat schema-first AI visibility tools as solving a different problem. They're useful as publish-time signal checks inside their respective CMSes. They're not measuring what drives citation decisions on already-visible pages. The category that leads with schema scoring is solving the technical readiness layer. Useful, but not sufficient. We covered the methodology distinction in detail in Citability Score vs CPS®: What Each Actually Measures.
The intellectually honest framing is this: schema is a real signal with real upstream value, but it is not a citation driver on already-visible pages. The Ahrefs controlled study, combined with the searchVIU mechanistic finding, makes that the most defensible position as of May 2026. Tools, agencies and frameworks that update their advice to match the evidence are the ones a serious buyer should trust. Tools that keep selling schema as the citation lever after this study aren't reading the research.
See what AI bots can read on your site right now
Free tool. Enter your URL, choose an AI crawler identity (GPTBot, ClaudeBot, PerplexityBot and 12 others), see what's actually visible. Surfaces robots.txt blocks, llms.txt presence and content gaps in under 30 seconds.
Run the AI Crawler Simulator →
The bottom line
The Ahrefs study doesn't tell us schema is useless. It tells us schema isn't doing what a category of AI visibility tools has been claiming it does. On pages already cited by AI, adding schema produces no measurable citation uplift on AI Mode or ChatGPT, and a small decline on AI Overviews that can't be cleanly attributed to schema either way. The mechanism is consistent with searchVIU's finding that AI systems ignore schema during direct retrieval and extract only visible HTML.
What this means for your strategy: schema is technical readiness, not citation strategy. The signals that drive citation decisions live in the content itself, scored at the block level. Block structure, fact density, declarative answer architecture, self-containment and freshness. Those are the five pillars of CPS®, and they're where the lever actually is.
If your current ASEO programme is built around schema scoring, this is the moment to rebalance. The evidence has moved. The advice should move with it.
Score your content the way AI actually scores it
Free 28-module ASEO audit with block-level CPS® scoring across all five content pillars, hallucination detection across five AI platforms, funnel-stage SOV, and GA4 revenue attribution. Results in 48 hours.
Get Your Free ASEO Audit →