Machine-Readability Is the Floor, Not the Ceiling
Getting your site machine-readable is necessary. Schema markup, llms.txt, clean crawler access, structured entity data: all of it matters, and any business that hasn't done it yet is starting from a deficit. But machine-readability is eligibility. It's not selection. And the difference between those two things is where most ASEO advice currently stops.
There's a growing category of ASEO advice, and a growing number of products built around it, that positions machine-readability as the core problem to solve. Get the infrastructure right, build the structured signals layer, make yourself legible to AI systems, and you'll start appearing in AI answers. The argument is coherent. It's also incomplete.
A site can be perfectly machine-readable and still get skipped at retrieval. That's not a technical failure. It's a content failure, at the paragraph level. And it's the failure that most infrastructure-focused approaches don't touch.
Two layers. One problem is visible. One isn't.
AI citation involves two distinct decisions, made at two different stages. Most current advice focuses on the first and skips the second.
Layer 1 is the infrastructure problem. It's well-understood, well-documented, and there are now a reasonable number of services that address it. You need it. Skipping it means AI systems can't confidently identify you as an entity, can't access your content, and won't cite you regardless of quality.
Layer 2 is the content problem. It operates at the level of individual 134-167-word content chunks (the size AI retrieval systems embed and evaluate when matching a query to candidate sources). Once a site passes the machine-readability threshold, the retrieval decision comes down to which blocks score highest on a set of citability signals. Infrastructure doesn't affect that score. The words in the block do.
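To make the chunk window concrete, here's a minimal Python sketch that splits a page into blocks and flags any that fall outside the 134-167 word range. The blank-line splitting rule and the `page.txt` input are assumptions for illustration; production retrieval systems chunk on semantic and structural boundaries, not a fixed word count.

```python
# Illustrative only: real pipelines chunk on semantic and structural
# boundaries, not a fixed word count. The 134-167 window is the range
# cited in this article.
TARGET_MIN, TARGET_MAX = 134, 167

def blocks(text: str) -> list[str]:
    """Split a page into paragraph blocks on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def audit_chunk_sizes(text: str) -> None:
    """Print each block's word count and whether it sits in the window."""
    for i, block in enumerate(blocks(text)):
        n = len(block.split())
        status = "OK" if TARGET_MIN <= n <= TARGET_MAX else "outside window"
        print(f"block {i}: {n} words ({status})")

audit_chunk_sizes(open("page.txt").read())  # "page.txt" is a stand-in
```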
What each layer actually covers
Layer 1 (machine-readability):

- Schema markup and structured data
- llms.txt and AI crawler access
- Structured entity identity signals
- Clean HTML rendering (no JS blind spots)
- robots.txt permissions for AI bots (see the check sketched after this list)
- Geographic and service area signals

Layer 2 (block-level citability):

- Declarative answer structure per paragraph
- Fact density per 100 words
- Self-containment without surrounding context
- Freshness signals and date markers
- RAG chunk size compliance (134-167 words)
- Opening pattern that matches retrieval expectations
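One Layer 1 item is cheap to verify yourself. Here's a minimal sketch using Python's standard-library robots.txt parser, with example.com as a stand-in domain; "GPTBot" is OpenAI's documented crawler user agent, and the other two names should be treated as examples to verify against each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # stand-in domain; use your own
# "GPTBot" is OpenAI's documented crawler. Verify the other user-agent
# strings against each vendor's docs before relying on them.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in AI_BOTS:
    verdict = "allowed" if parser.can_fetch(bot, f"{SITE}/") else "blocked"
    print(f"{bot}: {verdict} at {SITE}/")
```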
These are different problems. You can solve Layer 1 completely and still score poorly on Layer 2. A site with perfect schema, a clean llms.txt, and flawless entity data can have every paragraph opening with brand narrative, containing no verifiable facts, and referencing context that AI retrieval can't see. That site is machine-readable. It isn't citable.
Where the infrastructure argument stops
The clearest version of the machine-readability argument appears in content like Surfaced's recent post, "The End of Search as We Know It: Why Machine-Readability Is the New Competitive Edge." The framing is honest and largely correct at Layer 1: most businesses haven't built the structured signals layer that tells AI systems who they are and what they do. That gap is real. The article is worth reading.
Where it stops is exactly where Layer 2 begins. Surfaced's argument is that building the infrastructure is what gets you cited. The infrastructure gets you eligible. Two different outcomes.
Consider what actually happens when someone asks ChatGPT "what's the best ASEO consultancy in the UK?" The model doesn't simply check which sites have clean schema and return those. It runs a retrieval pass over indexed web content, pulls the content blocks that best match the query semantics, and generates an answer from those blocks. The blocks it selects are the ones with the highest citability score at query time. A perfectly structured entity with low-citability paragraphs doesn't appear. A less polished entity with high-citability paragraphs does.
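As a toy illustration of that selection step (invented numbers, a hypothetical combination rule, nothing here reflects any real system's internals), the point is only where block-level citability enters the pipeline relative to semantic matching:

```python
import numpy as np

# All text, vectors, and scores below are invented for illustration.
blocks = [
    {"text": "Acme is a UK ASEO consultancy that...",  "citability": 0.82},
    {"text": "Our story begins with a passion for...", "citability": 0.31},
]
block_vecs = np.array([[0.9, 0.1, 0.3],   # pretend embeddings
                       [0.7, 0.5, 0.2]])
query_vec = np.array([0.8, 0.2, 0.3])     # "best ASEO consultancy in the UK"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical combination rule: semantic match weighted by citability.
scores = [cosine(v, b["citability"] * 0 + v) * b["citability"]
          for v, b in zip(block_vecs, blocks)]
scores = [cosine(v, query_vec) * b["citability"]
          for v, b in zip(block_vecs, blocks)]
winner = blocks[int(np.argmax(scores))]
print("cited block:", winner["text"])
```

Both candidates match the query semantics reasonably well; the brand-narrative block loses on citability, which is exactly the Layer 2 failure mode.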
Machine-readability determines whether you're in the room. Block-level citability determines whether you're the answer.
The five signals that determine selection
The Citation Probability Score® (CPS®) framework measures citability at block level across five pillars. Each one governs a distinct aspect of how AI retrieval systems evaluate a 134-167-word chunk (rough heuristic sketches of each pillar follow the list):
- **Content Structure**: Is the block the right length for RAG chunking, and does it open with a direct declarative statement rather than brand context or preamble?
- **Fact Density**: How many named entities, statistics, and verifiable claims appear per 100 words? AI retrieval systems weight fact-rich blocks significantly higher than descriptive or narrative text.
- **Answer Structure**: Does the block open with the declarative pattern retrieval systems favour: "[Topic] is/means/provides [specific outcome]"? Blocks that bury the answer after three sentences of context are consistently skipped.
- **Self-Containment**: Does the block make complete sense without the paragraph above it, the section heading, or any surrounding context? Retrieval systems don't have that context. Blocks that depend on it fail silently.
- **Freshness Signals**: Does the block contain date markers, "as of [year]" language, or version-specific references? Perplexity and Bing-powered AI search weight recency heavily. Undated blocks lose ground to dated ones on time-sensitive queries.
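To make the pillars concrete, here is a rough heuristic sketch of each one in Python. Every regex and threshold below is an illustrative assumption, not the CPS® scoring formula, and self-containment in particular resists a cheap check:

```python
import re

def content_structure(block: str) -> bool:
    """Pillar 1 proxy: block length inside the 134-167-word window."""
    return 134 <= len(block.split()) <= 167

def fact_density(block: str) -> float:
    """Pillar 2 proxy: tokens containing digits or starting with a
    capital, per 100 words. A crude stand-in for entities and stats."""
    words = block.split()
    hits = sum(1 for w in words if re.search(r"\d", w) or w[:1].isupper())
    return 100 * hits / max(len(words), 1)

def answer_structure(block: str) -> bool:
    """Pillar 3 proxy: first sentence contains an is/means/provides verb."""
    first = re.split(r"(?<=[.!?])\s+", block, maxsplit=1)[0]
    return bool(re.search(r"\b(is|are|means|provides|delivers)\b", first))

def self_containment(block: str) -> bool:
    """Pillar 4: resists a cheap heuristic; here we only flag openings
    that lean on unseen context ("this", "as mentioned", ...)."""
    return not re.match(r"(?i)(this|that|these|it|as mentioned)\b", block.strip())

def freshness(block: str) -> bool:
    """Pillar 5 proxy: a four-digit year or an "as of" marker."""
    return bool(re.search(r"\b(19|20)\d{2}\b|\bas of\b", block, re.I))
```

None of these heuristics will reproduce the real score, but running them against your own paragraphs makes the failure modes visible: narrative openings fail the answer-structure check, undated evergreen copy fails freshness, and so on.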
None of these are infrastructure signals. Schema markup doesn't affect fact density. llms.txt doesn't change whether a paragraph opens with a declarative answer. Entity data doesn't determine whether a content block is self-contained. These are content decisions, made at the paragraph level, that sit entirely above the machine-readability layer.
The sequence that actually works
Infrastructure first. Content second. That's the right order, and both layers are required.
A business that skips Layer 1 and writes highly citable content is building on an unstable foundation. AI systems that can't reliably identify and access a site won't cite it consistently, regardless of content quality. The infrastructure work isn't optional; it's a prerequisite.
But a business that completes Layer 1 and assumes the job is done will keep wondering why its AI visibility isn't improving. Eligibility is solved. Selection isn't. The paragraphs are machine-readable. They're not citable. And no amount of additional schema is going to change that.
The full sequence: Build the machine-readability foundation (schema, llms.txt, entity signals, crawler access), then score every page at block level using CPS® to identify which paragraphs are failing at retrieval and rewrite them to Grade B minimum.
Both layers have a measurable output. Machine-readability has a binary pass/fail: either AI crawlers can access and understand your site, or they can't. Block-level citability has a continuous score: 0-100 per page, with a pillar breakdown showing exactly which signal is causing the underperformance and what to change. The free CPS® Block Scorer at citedbyai.info/cps-scorer gives you an instant score on any paragraph with no signup required. Paste your current best paragraph. If it's below Grade B, you've found the problem.
Check your citability layer, not just your infrastructure
Paste any paragraph and get a 0-100 Citation Probability Score® with a five-pillar breakdown. Free, instant, no signup. Tells you exactly which signal is failing and what to fix.
Score a Paragraph Now →

Want both layers audited in one report?
Free audit. 27 modules. 5 platforms. Machine-readability plus block-level CPS® scoring across your entire site. Results in 48 hours.
Get Your Free Audit →