Machine-Readability Is the Floor, Not the Ceiling
Getting your site machine-readable is necessary. Schema markup, llms.txt, clean crawler access, structured entity data: all of it matters, and any business that hasn't done it yet is starting from a deficit. But machine-readability is eligibility. It's not selection. And the difference between those two things is where most ASEO advice currently stops.
There's a growing category of ASEO advice, and a growing number of products built around it, that positions machine-readability as the core problem to solve. Get the infrastructure right, build the structured signals layer, make yourself legible to AI systems, and you'll start appearing in AI answers. The argument is coherent. It's also incomplete.
A site can be perfectly machine-readable and still get skipped at retrieval. That's not a technical failure. It's a content failure, at the paragraph level. And it's the failure that most infrastructure-focused approaches don't touch.
Two layers. One problem is visible. One isn't.
AI citation involves two distinct decisions, made at two different stages. Most current advice focuses on the first and skips the second.
Layer 1 is the infrastructure problem. It's well-understood, well-documented, and there are now a reasonable number of services that address it. You need it. Skipping it means AI systems can't confidently identify you as an entity, can't access your content, and won't cite you regardless of quality.
One important platform-specific clarification on Layer 1. Google's 15 May 2026 AI optimisation guide explicitly states that schema markup and llms.txt files aren't required for AI Overviews or AI Mode. For Google specifically, machine-readability still matters for crawling and indexing, but the schema-and-llms.txt theatre that AEO/GEO tools sell as eligibility signals isn't doing the work for Google's AI surfaces. ChatGPT, Perplexity, Claude, and Microsoft Copilot each handle Layer 1 differently. The full per-platform breakdown is in Platform-Specific Citation Guidance. The deeper analysis of what Google's guide does and doesn't validate is in Google's AI Optimisation Guide: What It Says, What It Doesn't, and What It Means for ASEO. The two-layer principle still holds: Layer 1 is necessary infrastructure, Layer 2 is the citation lever.
Layer 2 is the content problem. It operates at the level of individual 134-167 word content chunks (the size AI retrieval systems embed and evaluate when matching a query to candidate sources). Once a site passes the machine-readability threshold, the retrieval decision comes down to which blocks score highest on a set of citability signals. Infrastructure doesn't affect that score. The words in the block do.
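To make the chunk-size idea concrete, here is a minimal Python sketch that splits a page into paragraph blocks and labels each one against the 134-167 word window. The function names and the choice to split on blank lines are assumptions for illustration; real retrieval systems chunk text in their own ways, and this only checks word counts.

```python
import re

# Word-count window quoted in this article for a retrievable chunk.
MIN_WORDS = 134
MAX_WORDS = 167


def word_count(block: str) -> int:
    """Count whitespace-separated tokens in a block of text."""
    return len(block.split())


def chunk_report(text: str) -> list[tuple[int, str]]:
    """Split text on blank lines and label each block by size.

    Returns (word_count, label) pairs, where label is "short",
    "in range", or "long". Splitting on blank lines is an assumption,
    not a description of how any AI platform chunks content.
    """
    report = []
    for block in re.split(r"\n\s*\n", text.strip()):
        n = word_count(block)
        if n < MIN_WORDS:
            label = "short"
        elif n > MAX_WORDS:
            label = "long"
        else:
            label = "in range"
        report.append((n, label))
    return report
```

Running `chunk_report` over a page's body text gives a quick list of which paragraphs fall outside the window before any deeper scoring.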
The Lighthouse complication: Google now has two positions on llms.txt
Less than two weeks before that Search Central guide, a different team inside Google shipped Lighthouse 13.3. The new release added an Agentic Browsing audit category, and one of the audits inside it checks for the presence of an llms.txt file at your domain root. So the position depends on which Google product you ask: Search Central says you don't need it; Lighthouse measures whether you have it. Both are correct because they're answering different questions.
Google Search Central (15 May 2026): "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search."
Google Lighthouse 13.3 (early May 2026): The new Agentic Browsing category includes an llms.txt audit that "checks for the presence of a machine-readable summary at the domain root." The audit is marked Not Applicable rather than Failed if the file is absent, and the category itself is flagged experimental.
Same file, two different use cases. Search Central measures generative AI in search results. Lighthouse measures readiness for browser automation agents (Claude in Chrome, Operator, Comet). The audit reflects a different mechanism, not a contradiction.
The practical read: llms.txt isn't a citation lever on any of the five AI platforms covered by Cited By AI®'s audit (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot) or on Google's AI surfaces. It's a useful signal for the next layer of the agentic web, where browser agents perform actions on sites on a user's behalf. Both statements are true. Building an llms.txt because you've been told it lifts your AI citation rate is the wrong reason. Building one because you want your site legible to browser agents is the right reason.
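If you want to check whether your own domain serves an llms.txt at its root, a rough Python sketch looks like this. It is not how Lighthouse implements its audit; the function names are hypothetical, and treating an HTTP 200 response as "present" is an assumption.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def llms_txt_url(page_url: str) -> str:
    """Return the llms.txt URL at the domain root for any page URL."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the status code (0 on network failure)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return 0


def has_llms_txt(page_url: str, fetch_status=http_status) -> bool:
    """True if the domain-root llms.txt responds with HTTP 200 (an assumed rule)."""
    return fetch_status(llms_txt_url(page_url)) == 200
```

The `fetch_status` parameter exists so the check can be exercised without a network call, for example `has_llms_txt("https://example.com/blog", fetch_status=lambda url: 404)`.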
What each layer actually covers
Layer 1: machine-readability (infrastructure)
- Schema markup and structured data
- llms.txt and AI crawler access
- Structured entity identity signals
- Clean HTML rendering (no JS blind spots)
- robots.txt permissions for AI bots
- Geographic and service area signals
Layer 2: block-level citability (content)
- Declarative answer structure per paragraph
- Fact density per 100 words
- Self-containment without surrounding context
- Freshness signals and date markers
- RAG chunk size compliance (134-167 words)
- Opening pattern that matches retrieval expectations
These are different problems. You can solve Layer 1 completely and still score poorly on Layer 2. A site with perfect schema, a clean llms.txt, and flawless entity data can have every paragraph opening with brand narrative, containing no verifiable facts, and referencing context that AI retrieval can't see. That site is machine-readable. It isn't citable.
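One Layer 1 item, robots.txt permissions for AI bots, is easy to verify yourself. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against a few crawler user-agent tokens. The token list is an assumption for illustration; confirm the current names in each vendor's crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens (an assumption); check each vendor's docs.
AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]


def ai_bot_access(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> dict[str, bool]:
    """Report whether each listed crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

Passing the robots.txt content as a string keeps the check offline; fetch the live file separately and hand its text to `ai_bot_access` along with a representative page URL.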
Where the infrastructure argument stops
The clearest version of the machine-readability argument appears in content like Surfaced's recent post, "The End of Search as We Know It: Why Machine-Readability Is the New Competitive Edge." The framing is honest and largely correct at Layer 1: most businesses haven't built the structured signals layer that defines each business's identity and what it does. That gap is real. The article is worth reading.
Where it stops is exactly where Layer 2 begins. Surfaced's argument is that building the infrastructure is what gets you cited. The infrastructure gets you eligible. Two different outcomes.
Consider what actually happens when someone asks ChatGPT "what's the best ASEO consultancy in the UK?" The model doesn't simply check which sites have clean schema and return those. It runs a retrieval process across indexed web content, pulls the content blocks that best match the query semantics, and generates an answer from those blocks. The blocks it selects are the ones with the highest citability score at query time. A perfectly structured entity with low-citability paragraphs doesn't appear. A less polished entity with high-citability paragraphs does.
Machine-readability determines whether you're in the room. Block-level citability determines whether you're the answer.
The five signals that determine selection
The Citation Probability Score® (CPS®) framework measures citability at block level across five pillars. Each one governs a distinct aspect of how AI retrieval systems evaluate a 134-167 word chunk:
- Content Structure: Is the block the right length for RAG chunking, and does it open with a direct declarative statement rather than brand context or preamble?
- Fact Density: How many named entities, statistics, and verifiable claims appear per 100 words? AI retrieval systems weight fact-rich blocks significantly higher than descriptive or narrative text.
- Answer Structure: Does the block open with the declarative pattern retrieval systems favour: "[Topic] is/means/provides [specific outcome]"? Blocks that bury the answer after three sentences of context are consistently skipped.
- Self-Containment: Does the block make complete sense without the paragraph above it, the section heading, or any surrounding context? Retrieval systems don't have that context. Blocks that depend on it fail silently.
- Freshness Signals: Does the block contain date markers, "as of [year]" language, or version-specific references? Perplexity and Bing-powered AI search weight recency heavily. Undated blocks lose ground to dated ones on time-sensitive queries.
None of these are infrastructure signals. Schema markup doesn't affect fact density. llms.txt doesn't change whether a paragraph opens with a declarative answer. Entity data doesn't determine whether a content block is self-contained. These are content decisions, made at the paragraph level, that sit entirely above the machine-readability layer.
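To show that these are properties of the text itself, here is a rough Python sketch of three paragraph-level checks. It is not the CPS® scoring method: the regular expressions, the opener list, and the decision to count only numeric tokens as "facts" are all assumptions chosen for illustration.

```python
import re

# Crude stand-ins for three of the signals; every pattern is an assumption.
NUMBER = re.compile(r"\d[\d,.]*%?")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
CONTEXT_OPENERS = ("this", "that", "these", "those", "it", "they", "as mentioned")


def fact_density(block: str) -> float:
    """Numeric tokens per 100 words, a rough proxy for fact density."""
    words = block.split()
    if not words:
        return 0.0
    return 100 * len(NUMBER.findall(block)) / len(words)


def has_date_marker(block: str) -> bool:
    """True if the block mentions a four-digit year (a simple freshness cue)."""
    return YEAR.search(block) is not None


def opens_with_context_reference(block: str) -> bool:
    """True if the block starts with a word that leans on earlier context."""
    start = block.strip().lower()
    return any(re.match(rf"{re.escape(opener)}\b", start) for opener in CONTEXT_OPENERS)
```

A paragraph that opens with "This approach..." and contains no numbers or dates would fail all three checks, however clean the schema around it.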
Three layers, three buyer questions
The original two-layer view (machine-readability as floor, block-level citability as ceiling) covers the citation question. With Lighthouse 13.3, a third layer is now formally on the table: agentic readiness for browser automation agents. Each layer answers a different buyer question.
Notice that llms.txt sits on both Layer 1 and Layer 3 but for different reasons. On Layer 1, it's a discoverability courtesy: it can help some crawlers orient. On Layer 3, it's a documented audit signal: Lighthouse explicitly checks for it. Neither use case is the same as Layer 2 citation work, which is what most ASEO programmes actually measure.
A consultancy that collapses these into one layer (whether by selling llms.txt as a citation lever or by dismissing it as worthless) is misreading the picture. Cited By AI®'s audit measures all three independently because each is a real but distinct way your site makes a buyer's journey better or worse.
The sequence that actually works
Infrastructure first. Content second. That's the right order, and both layers are required.
A business that skips Layer 1 and writes highly citable content is building on an unstable foundation. AI systems that can't reliably identify and access a site won't cite it consistently regardless of content quality. The infrastructure work isn't optional; it's prerequisite.
But a business that completes Layer 1 and assumes the job is done will keep wondering why their AI visibility isn't improving. They've solved eligibility. They haven't solved selection. The paragraphs are machine-readable. They're not citable. And no amount of additional schema is going to change that.
The full sequence: Build the machine-readability foundation (schema, llms.txt, entity signals, crawler access), then score every page at block level using CPS® to identify which paragraphs are failing at retrieval and rewrite them to Grade B minimum.
Both layers have a measurable output. Machine-readability has a binary pass/fail: either AI crawlers can access and understand your site, or they can't. Block-level citability has a continuous score: 0-100 per page, with a pillar breakdown showing exactly which signal is causing the underperformance and what to change. The free CPS® Block Scorer at citedbyai.info/cps-scorer gives you an instant score on any paragraph with no signup required. Paste your current best paragraph. If it's below Grade B, you've found the problem.
Check your citability layer, not just your infrastructure
Paste any paragraph and get a 0-100 Citation Probability Score® with a five-pillar breakdown. Free, instant, no signup. Tells you exactly which signal is failing and what to fix.
Score a Paragraph Now →
Want both layers audited in one report?
Free audit. 27 modules. 5 platforms. Machine-readability plus block-level CPS® scoring across your entire site. Results in 48 hours.
Get Your Free Audit →