The State of Agent Readability on the Web

Summary

On most of the popular web, an AI agent burns roughly twice the tokens it should.

We scored the 50,074 most-visited websites on how well an agent can discover, parse, and comprehend them. The plumbing search engines asked for is everywhere: 88% expose a robots.txt. The layer agents need is not: only 27% ship an llms.txt and 25% an AGENTS.md. The median site scores 52/100, and not one of the 50,074 scored “excellent.” In a controlled A/B test on one site, raising the a14y score from 37 to 89 roughly halved the tokens an agent spent to use it. The web is wide open for improvement, and very few sites have moved.

Metric	Result
Sites scored	50,074
Median score	52 / 100
Scored “excellent”	none (best is 83)
llms.txt	27%
AGENTS.md	25%

The question

AI agents are becoming a primary way people reach the web. They read sites, follow links, and synthesize answers on someone’s behalf. That raises a concrete question for every site owner: can an agent use my site efficiently? The a14y scorecard answers it for one site. We ran it across the web at scale to ask the bigger version: is the web ready for agents?

“Ready” here is mechanical, not subjective. It asks whether the site exposes what an agent needs to find and parse it: a usable llms.txt, an AGENTS.md, a sitemap, clean robots rules, and semantic HTML. All of this is measured by the v0.2.0 scorecard’s checks. The scorecard does not judge the quality of the content itself.

Methodology

The survey is a single point-in-time scan of the CrUX most-visited list, scored in page mode against the published scorecard. It ran on cloud infrastructure, not from a residential IP.

Field	Value
Source	CrUX top-100k most-visited origins (zakird/crux-top-lists, global)
Sample	50,074 scored: 100,000 rows → 64,746 after dedup to one per registrable domain → 50,074 reachable & not adult/gambling/spam
Mode	Page mode: each site’s homepage, not a full-site crawl
Scorecard	Published `v0.2.0` (38 checks) and the in-flight `v0.3.0-draft` (45 checks), scored 0–100. Headline scores use v0.2.0; the analysis below draws on both, tagged.
Tool	`npx a14y` (the same audit anyone can run)
Infra	Sharded across 64 Cloud Run tasks; run data archived in object storage
Source label	`crux batch-2026-06-15-100k`
Date	June 18, 2026

Page mode is deliberate. Scoring one homepage per site keeps a survey this size tractable and comparable. It proxies a site’s agent readiness; it does not capture readiness that lives deeper in the site.
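The sample funnel in the table above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three counts reported there; the variable and function names are illustrative, not part of the a14y tool.

```python
# Funnel counts as reported in the methodology table.
rows_in_list = 100_000   # CrUX top-100k rows
after_dedup = 64_746     # one row per registrable domain
scored = 50_074          # reachable and not adult/gambling/spam


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


dedup_kept = share(after_dedup, rows_in_list)   # kept by deduplication
filter_kept = share(scored, after_dedup)        # kept by reachability and filtering
overall_kept = share(scored, rows_in_list)      # kept end to end

print(f"dedup keeps {dedup_kept}%, filters keep {filter_kept}%, overall {overall_kept}%")
```

Roughly half of the original rows end up in the scored sample, which is worth keeping in mind when reading every percentage below as a share of 50,074 rather than of 100,000.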

Results: the web scores low

Distribution of agent-readability scores across all 50,074 sites. Mean 50.5, median 52.

Band	Sites	Share
Excellent (85–100)	0	0%
Good (70–84)	1,678	3.4%
Fair (50–69)	26,530	53%
Poor (0–49)	21,866	43.7%

The popular web clusters in the middle and below. 43.7% of sites score under 50, and only 3.4% clear 70. Not one site in the top 50,074 scored “excellent” (85 or above); the single best managed 83. Whatever the web was optimized for, it wasn’t agents.
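The band table can be reproduced from its own counts. The sketch below assumes the band floors shown in the table (85, 70, 50, 0) and recomputes each share; the function name `band` is illustrative.

```python
# Band floors taken from the table, highest first.
BANDS = [(85, "Excellent"), (70, "Good"), (50, "Fair"), (0, "Poor")]


def band(score: int) -> str:
    """Map a 0-100 score to its band name."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    raise ValueError(f"score out of range: {score}")


# Site counts per band as reported in the table.
counts = {"Excellent": 0, "Good": 1_678, "Fair": 26_530, "Poor": 21_866}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total, shares)
print(band(83), band(52))  # the best site and the median site
```

The counts sum to the full sample, and the best score of 83 lands in "Good", two points short of the "Excellent" floor.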

Explore all 50,074 sites on the web leaderboard →

Results: the agent-era layer is missing

What the most-visited sites actually expose to agents, against the published v0.2.0 scorecard and the in-flight v0.3.0-draft.

The signals search engines taught the web to ship are nearly everywhere: 87% expose a robots.txt and 63% a sitemap. But only 78% actually let the major AI crawlers in, so 22%, just over a fifth of the most-visited sites, block at least one of them.
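A check of this kind can be sketched with Python's standard `urllib.robotparser`. This is an illustration, not the a14y implementation: the bot list is the one named in the results table, and the helper name `blocked_ai_bots` and the sample robots.txt are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Crawler names listed in the results table.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that singles out one crawler.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_bots(sample))
```

Under this sketch a site fails as soon as one listed crawler is disallowed on the homepage, which matches how the table describes the 22%.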

The signals agents need are not just rare, they are often a mirage. 29% ship an llms.txt, and most are real (97% are non-empty). The other agent files mostly answer with a 200 but not a usable document: 30% appear to have an AGENTS.md while only 0.2% carry the expected sections, and 24% answer at /sitemap.md while only 0.1% are a real structured sitemap.
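The gap between "answers with a 200" and "is a usable document" often comes from sites that return their HTML homepage or error page for any path. A rough way to tell the two apart is sketched below. The heuristic and both function names are assumptions for illustration; the scorecard's actual checks may differ.

```python
def looks_like_html(body: str, content_type: str = "") -> bool:
    """Heuristic: treat the response as HTML if the header or the first bytes say so."""
    head = body.lstrip()[:512].lower()
    return (
        "text/html" in content_type.lower()
        or head.startswith("<!doctype html")
        or head.startswith("<html")
    )


def is_usable_text_file(status: int, body: str, content_type: str = "") -> bool:
    """A file counts as real only if it is a 200, non-empty, and not an HTML page."""
    return status == 200 and bool(body.strip()) and not looks_like_html(body, content_type)


# A markdown file passes; an HTML page served at the same path does not.
print(is_usable_text_file(200, "# Agents\n\n## Setup\n", "text/markdown"))
print(is_usable_text_file(200, "<!DOCTYPE html><html><body>Home</body></html>", "text/html"))
```

A filter like this only separates HTML fallbacks from text files; it does not verify the expected sections of an AGENTS.md or the structure of a sitemap.md.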

The markdown mirror tells the same story, and the draft scorecard makes it sharper. 27% advertise one, but under v0.3.0-draft only 1.8% of those mirrors are actually markdown rather than HTML. And the page itself often needs a browser to read: v0.3.0-draft finds 74% serve real homepage content in the initial HTML, which leaves about a quarter as JavaScript shells that agents without a JS engine (Claude, Perplexity, OpenAI’s SearchBot) see as blank.

Signal	Sites	What’s behind the number	From
robots.txt allows AI bots	78%	22% block GPTBot, ClaudeBot, CCBot, or Google-Extended	`v0.2.0`
llms.txt	29%	97% of them are non-empty, a real file	`v0.2.0`
AGENTS.md	30%	only 0.2% carry the expected sections	`v0.2.0`
sitemap.md	24%	only 0.1% are a real structured sitemap	`v0.2.0`
Markdown mirror advertised	27%	4% serve it on request via content negotiation	`v0.2.0`
Mirror is valid markdown	1.8%	of advertised mirrors are actually markdown, not HTML	`v0.3.0-draft`
Homepage server-renders content	74%	26% are JavaScript shells, invisible to agents that don’t run JS	`v0.3.0-draft`
No consent interstitial	93%	7% gate content behind a wall an agent can’t click	`v0.3.0-draft`
JSON-LD structured data	37%	6% include a dateModified	`v0.2.0`
Glossary link	0.5%	effectively nobody	`v0.2.0`
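The server-rendering row in the table above separates pages whose initial HTML already carries content from JavaScript shells. One way to approximate that distinction is to measure the visible text in the raw HTML, as in this sketch built on the standard `html.parser` module. The 200-character threshold and all names here are hypothetical, not the draft scorecard's rule.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping elements a reader never sees."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


MIN_CHARS = 200  # hypothetical threshold for "real content"


def has_ssr_content(html: str, min_chars: int = MIN_CHARS) -> bool:
    """True if the initial HTML carries at least `min_chars` of visible text."""
    return len(visible_text(html)) >= min_chars


# A typical JavaScript shell versus a server-rendered page (both made up).
shell = '<html><head><title>App</title></head><body><div id="root"></div><script>render()</script></body></html>'
rendered = "<html><body><h1>Docs</h1><p>" + "word " * 60 + "</p></body></html>"

print(has_ssr_content(shell), has_ssr_content(rendered))
```

An agent that fetches HTML without executing JavaScript sees only what `visible_text` returns, which for the shell is just the page title.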

Full reference: every check, by theme

Adoption for all 38 v0.2.0 checks, plus the 5 checks v0.3.0-draft adds. “Of applicable” excludes sites where a check does not apply (the markdown sub-checks, for example, only apply once a mirror exists). Three crawl-dependent checks (discovery.in-page-link, discovery.indexed, discovery.no-duplicate-content) need a multi-page crawl and were not measured by this single-page survey.

Discoverability	Of all	Of applicable
`sitemap-md.has-structure`	0%	0%
`llms-txt.md-extensions`	0%	0%
`agents-md.has-min-sections`	0%	0%
`llms-txt.content-type`	5%	19%
`sitemap-md.exists`	24%	n/a
`sitemap-xml.has-lastmod`	24%	50%
`llms-txt.non-empty`	28%	97%
`llms-txt.exists`	29%	n/a
`agents-md.exists`	30%	n/a
`sitemap-xml.valid`	47%	76%
`sitemap-xml.exists`	63%	n/a
`robots-txt.allows-ai-bots`	78%	n/a
`robots-txt.exists`	87%	n/a
`robots-txt.allows-llms-txt`	91%	n/a

Markdown mirror	Of all	Of applicable
`markdown.frontmatter`	0%	0%
`markdown.sitemap-section`	0%	0%
`markdown.alternate-link`	0%	0%
`markdown.canonical-header`	0%	0%
`markdown.content-negotiation`	4%	4%
`markdown.mirror-suffix`	27%	27%

Structured data	Of all	Of applicable
`html.json-ld.date-modified`	6%	17%
`html.json-ld.breadcrumb`	10%	27%
`html.json-ld`	37%	37%

Content structure	Of all	Of applicable
`html.glossary-link`	0%	0%
`html.text-ratio`	45%	45%
`html.headings`	61%	61%

HTML metadata	Of all	Of applicable
`html.og-description`	52%	52%
`html.canonical-link`	55%	55%
`html.og-title`	55%	55%
`html.meta-description`	65%	65%
`html.lang-attribute`	82%	82%

HTTP	Of all	Of applicable
`http.content-type-html`	83%	83%
`http.redirect-chain`	95%	n/a
`http.status-200`	97%	n/a
`http.no-noindex-noai`	100%	n/a

Added in v0.3.0-draft	Of all	Of applicable
`markdown.valid-markdown`	0%	2%
`markdown.size-reduction`	5%	17%
`markdown.navigation-stripped`	14%	52%
`html.ssr-content`	74%	74%
`http.no-interstitial`	93%	93%
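The two columns in these tables are related by the share of sites a check applies to. As a sketch under that assumption, the "of applicable" rate for `llms-txt.non-empty` can be approximated from the rounded figures above: 28% of all sites pass, and the check applies to the 29% that have an llms.txt. The function name is illustrative.

```python
def of_applicable(of_all_pct: float, applicable_pct: float) -> float:
    """Convert an 'of all' pass rate into an 'of applicable' pass rate.

    of_all_pct:      percent of all sites that pass the check
    applicable_pct:  percent of all sites the check applies to
    """
    if applicable_pct <= 0:
        raise ValueError("check applies to no sites")
    return round(100 * of_all_pct / applicable_pct, 1)


# llms-txt.non-empty: 28% of all sites, applicable to the 29% with an llms.txt.
non_empty_rate = of_applicable(28, 29)
print(non_empty_rate)
```

The result lands within rounding distance of the 97% in the table; the small difference comes from starting with percentages that were already rounded to whole numbers.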

And the fix works

The gap matters because closing it pays off. In a controlled A/B test (full case study), serving the same site with the agent-readiness layer rather than without it sharply cut what an AI agent spent to use the site, with no loss in answer quality.

Metric	Result
a14y score	37 → 89
Agent tokens	−49%
Tool calls	−52%
Wall-clock	−30%
Answer quality	tied (84 vs 83)

Raising one site’s a14y score from 37 to 89 cut the agent’s token use by about 49% and its tool calls by about 52%, while an independent judge rated the answers statistically indistinguishable. Put the two findings together: in this test the agent-readiness layer roughly halved what an agent spent to use a site, and 73% of the most-visited web hasn’t shipped it.
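The "roughly twice the tokens" framing follows directly from the reductions in the table. A reduction of r percent means the unoptimized site costs 1 / (1 - r/100) times as much as the optimized one. The sketch below applies that to the three reported reductions; the function name is illustrative.

```python
def cost_multiplier(reduction_pct: float) -> float:
    """How many times more the 'before' site costs than the 'after' site."""
    remaining = 1 - reduction_pct / 100
    return round(1 / remaining, 2)


tokens = cost_multiplier(49)      # agent tokens, -49%
tool_calls = cost_multiplier(52)  # tool calls, -52%
wall_clock = cost_multiplier(30)  # wall-clock time, -30%

print(tokens, tool_calls, wall_clock)
```

These multipliers describe the single site in the case study; whether they carry over to other sites is not something this survey measured.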

What it means

For site owners

Agent readiness is still a near-empty field. Roughly three in four popular sites haven’t shipped the layer, and none has it fully dialed in. That makes it a cheap early-mover advantage: run `npx a14y` on your site, ship the top fixes, and, if the case-study result carries over, roughly halve what every agent spends to use you.

For agent & model builders

The signals you’d want to lean on can’t be assumed yet. Only about 1 in 4 popular sites expose an llms.txt or AGENTS.md, so most of the time agents still fall back to the expensive path: fetching and parsing raw HTML. This dataset quantifies exactly where the gaps are.

For the a14y project

This is the baseline. We’ll re-run the survey on each scorecard release and track whether the web’s agent readiness moves: a standing measure of how the agent-readable web is (or isn’t) being built.

Caveats

What this survey does not show, and where to read the numbers with care.

Page mode. Each site is scored on its homepage only, a proxy for the site rather than a full-crawl verdict. Sites whose agent readiness lives in deeper sections are understated.
Point-in-time. A single snapshot (June 18, 2026). Sites change; this is a frame, not a film.
Popularity bias. CrUX skews toward large, heavily-trafficked sites; the long tail of the web likely looks different.
Reachable sites only. Origins that didn’t answer (DNS/TLS failures, bot or geo blocks) are dropped, so the sample is the reachable popular web.
Scorecard-specific. Scores are against v0.2.0’s 38 checks; a different scorecard shifts the absolute numbers (the leaderboard also carries a v0.3.0-draft scoring for comparison).
Filtered. Adult, gambling, and spam domains are removed by a heuristic denylist, which is imperfect.

Reproduce this study

Score any single site with the same tool the survey uses:

npx a14y https://example.com --scorecard 0.2.0

The survey itself is the same audit fanned out across the CrUX list. Batch provenance: `crux batch-2026-06-15-100k`, 50,074 sites, generated June 18, 2026.
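Fanning the audit out can be sketched as splitting the origin list into shards and building one command per origin. The methodology names 64 tasks but does not say how origins were assigned to them, so the round-robin split here is an assumption, as are the function names and the placeholder origins. The command is the single-site invocation shown above.

```python
def shard(origins: list[str], shard_count: int = 64) -> list[list[str]]:
    """Split origins round-robin into `shard_count` lists (assumed strategy)."""
    return [origins[i::shard_count] for i in range(shard_count)]


def audit_command(origin: str) -> list[str]:
    """Build the same single-site invocation shown above, as an argv list."""
    return ["npx", "a14y", origin, "--scorecard", "0.2.0"]


# Placeholder origins for illustration only.
origins = [f"https://site{i}.example" for i in range(130)]
shards = shard(origins)

print(len(shards), [len(s) for s in shards[:3]])
print(audit_command(shards[0][0]))
```

Each task would then run its shard's commands, for example with `subprocess.run`, and write the results somewhere durable; that part is left out because the survey's storage layout is not described here.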