Is machine-readability enough to get cited by AI search?

No. Machine-readability (schema markup, llms.txt, structured entity data, and clean crawler access) is necessary for AI citation but not sufficient. It determines whether AI platforms can read and understand your site (eligibility). It does not determine whether AI retrieval systems select your specific content blocks over a competitor's when answering a query (selection). Selection is governed by block-level citability signals: whether each 134-167 word content chunk opens with a declarative answer, contains verifiable facts, is self-contained, and carries freshness signals. A perfectly machine-readable site with low-citability paragraphs still loses at the retrieval stage.

What is the difference between machine-readability and AI citability?

Machine-readability means AI crawlers can access, parse, and understand your site. It covers technical infrastructure: schema markup, llms.txt files, robots.txt permissions, structured entity data, and clean HTML rendering. AI citability is a content-level property: it measures how likely a specific paragraph is to be retrieved and cited when an AI platform processes a query relevant to that content. The Citation Probability Score (CPS®) framework measures citability at block level, scoring each 134-167 word chunk across five pillars: Content Structure, Fact Density, Answer Structure, Self-Containment, and Freshness Signals. Machine-readability is the infrastructure layer. Citability is the content layer above it.

Does schema markup guarantee AI citation?

No. Schema markup improves machine-readability and entity disambiguation: it tells AI systems what your business is and helps them verify your identity. It does not score or improve the probability that a specific page or paragraph will be retrieved and cited in response to a particular query. AI retrieval operates at the content block level, evaluating each chunk of 134-167 words for answer structure, fact density, self-containment, and freshness. Schema helps AI understand your site exists. It does not determine which of your paragraphs AI selects when answering a user's question.

What is block-level citability scoring?

Block-level citability scoring measures how likely each individual content block (typically 134-167 words, the size AI retrieval systems embed and evaluate) is to be retrieved and cited. The Citation Probability Score (CPS®) framework scores each block across five pillars: Content Structure (is the block the right size and does it open correctly?), Fact Density (how many verifiable claims per 100 words?), Answer Structure (does it open with a declarative answer?), Self-Containment (does it make sense without surrounding context?), and Freshness Signals (does it carry date markers and recency language?). A block scoring Grade B or above is significantly more likely to be retrieved and cited than one scoring below.

ASEO Fundamentals

Machine-Readability Is the Floor, Not the Ceiling

Published: 13 April 2026 | Author: Cited By AI® | Reading time: 6 min

Version 1.0 | Published 13 April 2026 | Last verified: 13 April 2026 | Source: citedbyai.info AI Visibility Intelligence

Getting your site machine-readable is necessary. Schema markup, llms.txt, clean crawler access, structured entity data: all of it matters, and any business that hasn't done it yet is starting from a deficit. But machine-readability is eligibility. It's not selection. And the difference between those two things is where most ASEO advice currently stops.

There's a growing category of ASEO advice, and a growing number of products built around it, that positions machine-readability as the core problem to solve. Get the infrastructure right, build the structured signals layer, make yourself legible to AI systems, and you'll start appearing in AI answers. The argument is coherent. It's also incomplete.

A site can be perfectly machine-readable and still get skipped at retrieval. That's not a technical failure. It's a content failure, at the paragraph level. And it's the failure that most infrastructure-focused approaches don't touch.

Two layers. One problem is visible. One isn't.

AI citation involves two distinct decisions, made at two different stages. Most current advice focuses on the first and skips the second.

Layer 1 (Floor): Machine-Readability

Can AI crawlers access, parse, and verify your site? Do they recognise your identity as an entity?

Covers: schema markup, llms.txt, robots.txt, entity data, structured signals, clean rendering

Layer 2 (Above): Block-Level Citability

When AI runs a retrieval query, which specific paragraphs get selected over your competitor's?

Covers (measured by CPS®): answer structure, fact density, self-containment, freshness signals, declarative opening

Layer 1 is the infrastructure problem. It's well-understood, well-documented, and there are now a reasonable number of services that address it. You need it. Skipping it means AI systems can't confidently identify you as an entity, can't access your content, and won't cite you regardless of quality.
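The crawler-access part of Layer 1 is easy to check for yourself. Here's a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content is hypothetical, and the list of crawler user-agent tokens is an assumption for illustration, so check each vendor's current documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Example AI crawler user-agent tokens (an assumption; verify against vendor docs).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def crawler_access(robots_txt: str, page_url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots.txt permits fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


if __name__ == "__main__":
    result = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/services", AI_CRAWLERS)
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A check like this only answers the eligibility question. It says nothing about whether the page's paragraphs are worth selecting.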

One important platform-specific clarification on Layer 1. Google's 15 May 2026 AI optimisation guide explicitly states that schema markup and llms.txt files aren't required for AI Overviews or AI Mode. For Google specifically, machine-readability still matters for crawling and indexing, but the schema-and-llms.txt theatre that AEO/GEO tools sell as eligibility signals isn't doing the work for Google's AI surfaces. ChatGPT, Perplexity, Claude, and Microsoft Copilot each handle Layer 1 differently. The full per-platform breakdown is in Platform-Specific Citation Guidance. The deeper analysis of what Google's guide does and doesn't validate is in Google's AI Optimisation Guide: What It Says, What It Doesn't, and What It Means for ASEO. The principle of the diagram above still holds: Layer 1 is necessary infrastructure, Layer 2 is the citation lever.

Layer 2 is the content problem. It operates at the level of individual 134-167 word content chunks (the size AI retrieval systems embed and evaluate when matching a query to candidate sources). Once a site passes the machine-readability threshold, the retrieval decision comes down to which blocks score highest on a set of citability signals. Infrastructure doesn't affect that score. The words in the block do.
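A rough way to see where your own paragraphs sit against that 134-167 word range is a simple word count per block. The sketch below assumes blocks are separated by blank lines, which is a stand-in for however a real retrieval system chunks a page. The function names are illustrative, not part of CPS®.

```python
import re

# Word range quoted in the article; the blank-line splitting rule is an assumption.
MIN_WORDS, MAX_WORDS = 134, 167


def split_blocks(text: str) -> list[str]:
    """Split text into blocks on blank lines (an assumed stand-in for real chunking)."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def block_size_report(text: str) -> list[tuple[int, str]]:
    """Return (word_count, status) per block; status is 'short', 'ok', or 'long'."""
    report = []
    for block in split_blocks(text):
        count = len(block.split())
        if count < MIN_WORDS:
            status = "short"
        elif count > MAX_WORDS:
            status = "long"
        else:
            status = "ok"
        report.append((count, status))
    return report


if __name__ == "__main__":
    sample = "word " * 150 + "\n\n" + "word " * 40
    print(block_size_report(sample))
```

Size is only one input. A block can land inside the range and still fail on the other signals.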

The Lighthouse complication: Google now has two positions on llms.txt

Less than two weeks before that Search Central guide, a different team inside Google shipped Lighthouse 13.3. The new release added an Agentic Browsing audit category, and one of the audits inside it checks for the presence of an llms.txt file at your domain root. So the position depends on which Google product you ask: Search Central says you don't need it; Lighthouse measures whether you have it. Both are correct because they're answering different questions.

First-party Google confirmation · Updated 20 May 2026

Two Google teams now publish different llms.txt guidance

Google Search Central (15 May 2026): "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search."

Google Lighthouse 13.3 (early May 2026): The new Agentic Browsing category includes an llms.txt audit that "checks for the presence of a machine-readable summary at the domain root." The audit is marked Not Applicable rather than Failed if the file is absent, and the category itself is flagged experimental.

Same file, two different use cases. Search Central measures generative AI in search results. Lighthouse measures readiness for browser automation agents (Claude in Chrome, Operator, Comet). The audit reflects a different mechanism, not a contradiction.

Sources: Google Search Central guide (15 May 2026) · Lighthouse Agentic Browsing scoring docs · SEJ analysis (20 May 2026)

The practical read: llms.txt isn't a citation lever on any of the five AI platforms covered by Cited By AI®'s audit (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot), nor is it one on Google's AI surfaces. It's a useful signal for the next layer of the agentic web, where browser agents perform actions on sites on a user's behalf. Both statements are true. Building an llms.txt because you've been told it lifts your AI citation rate is the wrong reason. Building one because you want your site legible to browser agents is the right reason.
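If you want to confirm whether a site already serves the file, a presence check is a few lines of Python. This is a sketch, not a reproduction of the Lighthouse audit: it assumes "present" means the domain-root /llms.txt URL answers with HTTP 200, and the helper names and User-Agent string are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def llms_txt_url(site_url: str) -> str:
    """Build the domain-root llms.txt URL from any page URL on the site."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def has_llms_txt(site_url: str, timeout: float = 5.0) -> bool:
    """Return True if the domain-root llms.txt answers with HTTP 200 (assumed criterion)."""
    request = Request(llms_txt_url(site_url), headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors, DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print(llms_txt_url("https://example.com/blog/post?x=1"))
```

Whatever the result, treat it as an agentic-readiness data point, not as evidence about citation.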

What each layer actually covers

Machine-Readability covers

Schema markup and structured data
llms.txt and AI crawler access
Structured entity identity signals
Clean HTML rendering (no JS blind spots)
robots.txt permissions for AI bots
Geographic and service area signals

Block-level citability covers

Declarative answer structure per paragraph
Fact density per 100 words
Self-containment without surrounding context
Freshness signals and date markers
RAG chunk size compliance (134-167 words)
Opening pattern that matches retrieval expectations

These are different problems. You can solve Layer 1 completely and still score poorly on Layer 2. A site with perfect schema, a clean llms.txt, and flawless entity data can have every paragraph opening with brand narrative, containing no verifiable facts, and referencing context that AI retrieval can't see. That site is machine-readable. It isn't citable.
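To make the Layer 1 side concrete, here's a minimal sketch of the kind of entity markup the schema item refers to: a schema.org Organization description serialised as JSON-LD. The business name and URLs are placeholders, and the choice of fields is an assumption about what a basic entity block might contain.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Serialise a minimal schema.org Organization description as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,  # profile URLs that help disambiguate the entity
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Placeholder values, not a real business.
    print(organization_jsonld(
        "Example Consultancy",
        "https://example.com",
        ["https://www.linkedin.com/company/example"],
    ))
```

Nothing in that output touches the words in your paragraphs. It describes who you are, which is why it can be perfect while the content layer still underperforms.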

Where the infrastructure argument stops

The clearest version of the machine-readability argument appears in content like Surfaced's recent post, "The End of Search as We Know It: Why Machine-Readability Is the New Competitive Edge." The framing is honest and largely correct at Layer 1: most businesses haven't built the structured signals layer that defines who they are and what they do. That gap is real. The article is worth reading.

Where it stops is exactly where Layer 2 begins. Surfaced's argument is that building the infrastructure is what gets you cited. The infrastructure gets you eligible. Two different outcomes.

Consider what actually happens when someone asks ChatGPT "what's the best ASEO consultancy in the UK?" The model doesn't simply check which sites have clean schema and return those. It runs a retrieval process across the sources it can search at query time, pulls the content blocks that best match the query semantics, and generates an answer from those blocks. The blocks it selects are the ones with the highest citability score at query time. A perfectly structured entity with low-citability paragraphs doesn't appear. A less polished entity with high-citability paragraphs does.

Machine-readability determines whether you're in the room. Block-level citability determines whether you're the answer.

The five signals that determine selection

The Citation Probability Score® (CPS®) framework measures citability at block level across five pillars. Each one governs a distinct aspect of how AI retrieval systems evaluate a 134-167 word chunk:

Content Structure: Is the block the right length for RAG chunking, and does it open with a direct declarative statement rather than brand context or preamble?
Fact Density: How many named entities, statistics, and verifiable claims appear per 100 words? AI retrieval systems weight fact-rich blocks significantly higher than descriptive or narrative text.
Answer Structure: Does the block open with the declarative pattern retrieval systems favour: "[Topic] is/means/provides [specific outcome]"? Blocks that bury the answer after three sentences of context are consistently skipped.
Self-Containment: Does the block make complete sense without the paragraph above it, the section heading, or any surrounding context? Retrieval systems don't have that context. Blocks that depend on it fail silently.
Freshness Signals: Does the block contain date markers, "as of [year]" language, or version-specific references? Perplexity and Bing-powered AI search weight recency heavily. Undated blocks lose ground to dated ones on time-sensitive queries.

None of these are infrastructure signals. Schema markup doesn't affect fact density. llms.txt doesn't change whether a paragraph opens with a declarative answer. Entity data doesn't determine whether a content block is self-contained. These are content decisions, made at the paragraph level, that sit entirely above the machine-readability layer.
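To show how content-level these signals are, here's a rough sketch of heuristic checks that operate on nothing but the paragraph text. This is not the CPS® scoring method. Every rule below (the verb list, the year pattern, counting digits and mid-sentence capitalised words as "facts", flagging an opening pronoun as a context dependency) is a crude assumption chosen for illustration.

```python
import re

YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")
DECLARATIVE_VERBS = re.compile(r"\b(?:is|are|means|provides)\b", re.IGNORECASE)
SENTENCE_END = (".", "?", "!")
CONTEXT_OPENERS = {"this", "that", "these", "those", "it", "they"}


def first_sentence(block: str) -> str:
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.?!])\s+", block.strip(), maxsplit=1)[0]


def opens_declaratively(block: str) -> bool:
    """Assumed proxy: the first sentence is not a question and uses is/are/means/provides."""
    sentence = first_sentence(block)
    return not sentence.endswith("?") and bool(DECLARATIVE_VERBS.search(sentence))


def fact_density(block: str) -> float:
    """Assumed proxy: tokens with digits plus mid-sentence capitalised tokens, per 100 words."""
    tokens = block.split()
    if not tokens:
        return 0.0
    facts = 0
    for i, token in enumerate(tokens):
        if any(ch.isdigit() for ch in token):
            facts += 1
        elif token[0].isupper() and i > 0 and not tokens[i - 1].endswith(SENTENCE_END):
            facts += 1
    return 100 * facts / len(tokens)


def has_freshness_marker(block: str) -> bool:
    """Assumed proxy: the block mentions a four-digit year."""
    return bool(YEAR_PATTERN.search(block))


def leans_on_context(block: str) -> bool:
    """Assumed proxy: the block opens with a pronoun that points outside itself."""
    words = block.split()
    return bool(words) and words[0].strip(".,;:").lower() in CONTEXT_OPENERS


if __name__ == "__main__":
    block = ("Block-level scoring measures how likely a paragraph is to be cited. "
             "As of 2026, the framework uses five pillars.")
    print(opens_declaratively(block), round(fact_density(block), 1),
          has_freshness_marker(block), leans_on_context(block))
```

Every input to these functions is the block's own text. No schema, llms.txt, or entity data appears anywhere in them.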

Three layers, three buyer questions

The original two-layer view (machine-readability as floor, block-level citability as ceiling) covers the citation question. With Lighthouse 13.3, a third layer is now formally on the table: agentic readiness for browser automation agents. Each layer answers a different buyer question.

Layer 1: Machine-Readability
Covers: schema, llms.txt, crawler access, E-E-A-T
Buyer question: Can the bots even read my site?

Layer 2: Block-Level Citability
Covers: CPS® content signals (the five pillars above)
Buyer question: Will ChatGPT, Perplexity, Claude, Gemini, or Copilot cite my brand in their answers?

Layer 3: Agentic Readiness
Covers: llms.txt, WebMCP, accessibility tree, CLS
Buyer question: Can browser agents complete actions on my site?

Notice that llms.txt sits on both Layer 1 and Layer 3 but for different reasons. On Layer 1, it's a discoverability courtesy: it can help some crawlers orient. On Layer 3, it's a documented audit signal: Lighthouse explicitly checks for it. Neither use case is the same as Layer 2 citation work, which is what most ASEO programmes actually measure.

A consultancy that collapses these into one layer (whether by selling llms.txt as a citation lever or by dismissing it as worthless) is misreading the picture. Cited By AI®'s audit measures all three independently because each is a real but distinct point at which your site makes a buyer's journey better or worse.

The sequence that actually works

Infrastructure first. Content second. That's the right order, and both layers are required.

A business that skips Layer 1 and writes highly citable content is building on an unstable foundation. AI systems that can't reliably identify and access a site won't cite it consistently regardless of content quality. The infrastructure work isn't optional; it's prerequisite.

But a business that completes Layer 1 and assumes the job is done will keep wondering why their AI visibility isn't improving. They've solved eligibility. They haven't solved selection. The paragraphs are machine-readable. They're not citable. And no amount of additional schema is going to change that.

The full sequence: Build the machine-readability foundation (schema, llms.txt, entity signals, crawler access), then score every page at block level using CPS® to identify which paragraphs are failing at retrieval and rewrite them to Grade B minimum.

Both layers have a measurable output. Machine-readability has a binary pass/fail: either AI crawlers can access and understand your site, or they can't. Block-level citability has a continuous score: 0-100 per page, with a pillar breakdown showing exactly which signal is causing the underperformance and what to change. The free CPS® Block Scorer at citedbyai.info/cps-scorer gives you an instant score on any paragraph with no signup required. Paste your current best paragraph. If it's below Grade B, you've found the problem.

Check your citability layer, not just your infrastructure

Paste any paragraph and get a 0-100 Citation Probability Score® with a five-pillar breakdown. Free, instant, no signup. Tells you exactly which signal is failing and what to fix.

Want both layers audited in one report?

Free audit. 27 modules. 5 platforms. Machine-readability plus block-level CPS® scoring across your entire site. Results in 48 hours.
