# Agent Readability: A Specification for AI-Optimized Websites
AI agents — tools like ChatGPT, Claude, Copilot, and Cursor — are becoming a primary way people discover and consume web content.
Agent readability is a set of best practices that make your website accessible to AI agents and assistants. It covers three areas:
- **Discovery** — Can agents find your pages? (`llms.txt`, sitemaps, `robots.txt`)
- **Structure** — Can agents parse your pages? (meta tags, headings, structured data, markdown mirrors)
- **Context** — Can agents understand your content? (skill files, content negotiation, code documentation)
This guide describes what to implement, why it matters, and how to verify each requirement.
## Quick-Start Checklist

### Site-Level Files

- Serve an `llms.txt` file at your site root
- Allow AI bots in `robots.txt`
- Publish a `sitemap.xml` with `<lastmod>` dates
- Publish a `sitemap.md` with headings and links
- Create an `AGENTS.md` with install, config, and usage sections
- Ensure all pages are discoverable from at least one source
### Page-Level HTML

- Return HTTP 200 with 0–1 redirects
- Set correct `Content-Type` headers
- Do not set restrictive `x-robots-tag` values
- Include a `<link rel="canonical">` tag
- Add a meta description (50+ characters), `og:title`, `og:description`, and an `html lang` attribute
- Add Schema.org / JSON-LD structured data
- Use 3+ section headings (h1–h3) per page
- Maintain a text-to-HTML ratio above 15%
- Include a glossary or terminology link
### Server Configuration

- Provide markdown mirrors for HTML pages
- Add `<link rel="alternate" type="text/markdown">` to HTML pages
- Return a `Link` header with `rel="canonical"` from markdown endpoints
- Support `Accept: text/markdown` content negotiation
- Include a `## Sitemap` section in markdown pages
### Content Quality

- Fence all code blocks with language identifiers
- Link to OpenAPI/Swagger schemas from API reference pages
## Scoring

Your agent readability score measures how well your site meets these requirements:

```text
score = round((passed checks / total checks) × 100)
```
Only checks with a pass status count toward the numerator. Checks that fail, warn, or error do not.
The total is the sum of all site-wide checks plus all per-page checks across every discovered page. Because per-page checks run on every page, sites with many pages have a larger denominator — a single failing check matters less on a large site, but a systemic issue (such as missing canonical links on every page) compounds significantly.
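For illustration, the formula can be sketched as a small TypeScript function (the `CheckStatus` type and function name are invented for this example, not part of any scorer's API):

```typescript
type CheckStatus = "pass" | "fail" | "warn" | "error";

// Only "pass" counts toward the numerator; fail, warn, and error do not.
function agentReadabilityScore(checks: CheckStatus[]): number {
  const passed = checks.filter((status) => status === "pass").length;
  return Math.round((passed / checks.length) * 100);
}

// 5 of 6 checks pass: round(5/6 × 100) = 83
console.log(agentReadabilityScore(["pass", "pass", "warn", "pass", "pass", "pass"]));
```

Because per-page checks multiply with page count, a systemic failure lowers `passed` on every page at once, while a one-off failure barely moves a large denominator.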
| Score | Rating | Meaning |
|---|---|---|
| 90–100 | Excellent | Highly optimized for AI agents. All critical checks pass. |
| 70–89 | Good | Meets most requirements. Address warnings to improve. |
| 50–69 | Fair | Has gaps. Review failed checks and implement fixes. |
| 0–49 | Needs Improvement | Significant work needed across multiple areas. |
## Site-Level Requirements

These requirements apply once per site, at the root level.

### llms.txt

What: Serve an `llms.txt` file that lists your documentation pages. This is the primary entry point for AI agents discovering your content.

Why: AI agents look for `llms.txt` as a machine-readable index of your site's content, similar to how search engines use `sitemap.xml`. Without it, agents must crawl your site to find pages, which is slower and less reliable.
Requirements:
- Serve the file at one of: `/llms.txt`, `/.well-known/llms.txt`, or `/docs/llms.txt`
- Alternatively, serve `llms-full.txt` at the same paths
- Use `text/plain` as the `Content-Type`
- The file must not be empty
- Listed URLs should use `.md` or `.mdx` extensions, not `.html`
Example:
```markdown
# Example Product Documentation

## Getting Started

- [Installation](/docs/installation.md)
- [Quick Start](/docs/quick-start.md)
- [Configuration](/docs/configuration.md)

## API Reference

- [Authentication](/docs/api/auth.md)
- [Endpoints](/docs/api/endpoints.md)
- [Error Handling](/docs/api/errors.md)

## Guides

- [Deployment](/docs/guides/deployment.md)
- [Monitoring](/docs/guides/monitoring.md)
```
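Since listed URLs should use markdown extensions, you can lint an `llms.txt` body for stray `.html` links. A sketch (the helper name is invented, and it uses a simple regex rather than a full markdown parser):

```typescript
// Collect [title](url) links from an llms.txt body and return any
// whose target does not end in .md or .mdx.
function nonMarkdownLinks(llmsTxt: string): string[] {
  const links = [...llmsTxt.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g)].map((m) => m[1]);
  return links.filter((url) => !/\.(md|mdx)$/.test(url));
}

const sample = "# Docs\n- [Install](/docs/install.md)\n- [Home](/index.html)";
console.log(nonMarkdownLinks(sample)); // [ '/index.html' ]
```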
How to verify:
```bash
curl -I https://example.com/llms.txt
# Should return 200 with Content-Type: text/plain
```
### robots.txt

What: Ensure your `robots.txt` does not block known AI bots.

Why: AI agents respect `robots.txt` directives. If you block them, your content will not be indexed or cited by AI assistants.
Requirements:
- Do not block `GPTBot`, `ClaudeBot`, `CCBot`, or `Google-Extended`
- Do not disallow `/llms.txt`
- Having no `robots.txt` at all triggers a warning — it is better to explicitly allow access
Example:
```text
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
```
How to verify:
```bash
curl https://example.com/robots.txt
# Inspect output for Disallow rules targeting AI bots or /llms.txt
```
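The same inspection can be automated. The sketch below is a deliberately simplified scan — not a spec-complete `robots.txt` parser — that flags AI bots caught by a blanket `Disallow: /`:

```typescript
const AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];

// Simplified group handling: consecutive User-agent lines form a group;
// any other field ends the group. Only "Disallow: /" is treated as a block.
function blockedAiBots(robotsTxt: string): string[] {
  const blocked = new Set<string>();
  let agents: string[] = [];
  let inRules = false;
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.split("#")[0].trim();
    if (!line) continue;
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      if (inRules) { agents = []; inRules = false; }
      agents.push(value);
    } else {
      inRules = true;
      if (field === "disallow" && value === "/") {
        for (const bot of AI_BOTS) {
          if (agents.includes(bot) || agents.includes("*")) blocked.add(bot);
        }
      }
    }
  }
  return [...blocked];
}

console.log(blockedAiBots("User-agent: GPTBot\nDisallow: /")); // [ 'GPTBot' ]
```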
### Sitemap (XML and Markdown)

What: Publish both a `sitemap.xml` and a `sitemap.md` to help agents understand your site structure.

Why: XML sitemaps are the standard for search engine crawlers. Markdown sitemaps give AI agents a structured, readable overview of your documentation hierarchy. Publishing both maximizes discoverability.
Requirements for `sitemap.xml`:

- Serve a valid XML sitemap with `<urlset>` or `<sitemapindex>` containing `<loc>` entries
- Include `<lastmod>` dates so agents know which pages have changed

Requirements for `sitemap.md`:

- Serve at one of: `/sitemap.md`, `/docs/sitemap.md`, or `/.well-known/sitemap.md`
- Include headings and links that reflect your site's structure
Example `sitemap.md`:

```markdown
# Sitemap

## Getting Started

- [Installation](/docs/installation.md)
- [Quick Start](/docs/quick-start.md)

## API Reference

- [Authentication](/docs/api/auth.md)
- [Endpoints](/docs/api/endpoints.md)

## Guides

- [Deployment](/docs/guides/deployment.md)
- [Monitoring](/docs/guides/monitoring.md)
```
How to verify:
```bash
curl https://example.com/sitemap.xml | head -20
# Should contain <urlset> with <loc> and <lastmod> entries

curl https://example.com/sitemap.md
# Should return structured markdown with headings and links
```
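To spot `<url>` entries missing `<lastmod>`, a regex scan is usually enough for well-formed sitemaps. A sketch (swap in a real XML parser for anything unusual; the function name is illustrative):

```typescript
// List <loc> URLs whose <url> entry has no <lastmod> child.
function urlsMissingLastmod(sitemapXml: string): string[] {
  const entries = sitemapXml.match(/<url>[\s\S]*?<\/url>/g) ?? [];
  return entries
    .filter((entry) => !entry.includes("<lastmod>"))
    .map((entry) => (entry.match(/<loc>(.*?)<\/loc>/) ?? [])[1] ?? "");
}

const xml = `<urlset>
  <url><loc>https://example.com/a</loc><lastmod>2025-01-15</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>`;
console.log(urlsMissingLastmod(xml)); // [ 'https://example.com/b' ]
```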
### Skill File (AGENTS.md)

What: Create an `AGENTS.md` file that gives coding agents direct context about your product — how to install it, configure it, and use it.

Why: Coding agents like Copilot, Claude Code, and Cursor use skill files to understand how to work with your product. A well-written skill file means agents can generate correct code for your users without guessing.
Requirements:
- Serve the file at one of: `/AGENTS.md`, `/agents.md`, `/.well-known/agents.md`, `/docs/AGENTS.md`, `/llms-full.txt`, `/CLAUDE.md`, `/.cursor/rules`, or `/.cursorrules`
- Include at least two of the following sections: installation instructions, configuration details, usage examples or code blocks
Example `AGENTS.md`:

````markdown
# My Product

## Installation

```bash
npm install my-product
```

## Configuration

Create a `my-product.config.ts` file in your project root:

```typescript
import { defineConfig } from 'my-product';

export default defineConfig({
  apiKey: process.env.MY_PRODUCT_API_KEY,
  region: 'us-east-1',
});
```

## Usage

```typescript
import { createClient } from 'my-product';

const client = createClient();
const result = await client.query('SELECT * FROM users');
console.log(result.rows);
```
````
How to verify:
```bash
curl https://example.com/AGENTS.md
# Should return markdown with install, config, and/or usage sections
```
### Page Discoverability

What: Ensure every page on your site is reachable from at least one discovery source.

Why: Pages that are not linked from `sitemap.xml`, `llms.txt`, `sitemap.md`, or other pages cannot be found by agents. Orphaned pages are invisible to AI.
Requirements:
- Every page should appear in at least one of: `sitemap.xml`, `llms.txt`, `sitemap.md`, or be reachable via links from other discoverable pages

How to verify: Cross-reference your page count against the URLs listed in your sitemaps and `llms.txt`. Any page not present in any source is undiscoverable to agents.
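The cross-reference reduces to a set difference. A minimal sketch, where each discovery source is simply a list of URLs you have already extracted (the function name is invented):

```typescript
// Pages not listed in any discovery source are orphaned.
function orphanedPages(allPages: string[], sources: string[][]): string[] {
  const discoverable = new Set(sources.flat());
  return allPages.filter((page) => !discoverable.has(page));
}

const pages = ["/docs/a.md", "/docs/b.md", "/docs/c.md"];
const fromSitemapXml = ["/docs/a.md"];
const fromLlmsTxt = ["/docs/b.md"];
console.log(orphanedPages(pages, [fromSitemapXml, fromLlmsTxt])); // [ '/docs/c.md' ]
```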
## Page-Level Requirements

These requirements apply to every page on your site.

### HTTP Response Basics
What: Ensure pages return clean HTTP responses that agents can process without issues.
Why: Agents follow redirects and inspect headers to decide whether to index a page. Broken responses, long redirect chains, or restrictive headers cause agents to skip your content.
Requirements:
- Return HTTP 200 for all live pages
- Limit redirect chains to 0–1 hops (2+ redirects cause failures)
- Set the correct `Content-Type` header:
  - HTML pages: `text/html; charset=UTF-8`
  - Markdown pages: `text/plain; charset=UTF-8`
- Do not include `noindex`, `noai`, or `noimageai` in the `x-robots-tag` response header
How to verify:
```bash
curl -I -L https://example.com/docs/getting-started
# Check: HTTP status is 200, no more than 1 redirect, correct Content-Type
# Check: x-robots-tag does not contain noindex, noai, or noimageai
```
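Applied to an already-captured response, the checks above amount to a few comparisons. A sketch (the `ResponseInfo` shape is invented; populate it from whatever HTTP client you use):

```typescript
interface ResponseInfo {
  status: number;
  redirects: number;               // hops followed before the final response
  headers: Record<string, string>; // header names lower-cased
}

// Return a human-readable list of violations, empty if the response is clean.
function httpIssues(r: ResponseInfo): string[] {
  const issues: string[] = [];
  if (r.status !== 200) issues.push(`status ${r.status}`);
  if (r.redirects > 1) issues.push(`${r.redirects} redirects`);
  const robots = r.headers["x-robots-tag"] ?? "";
  for (const directive of ["noindex", "noai", "noimageai"]) {
    if (robots.includes(directive)) issues.push(`x-robots-tag: ${directive}`);
  }
  return issues;
}

console.log(httpIssues({
  status: 200,
  redirects: 2,
  headers: { "x-robots-tag": "noai" },
})); // [ '2 redirects', 'x-robots-tag: noai' ]
```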
### HTML Meta and Structure
What: Include proper metadata, structured data, and heading hierarchy so agents can understand each page’s content and context.
Why: Meta tags tell agents what a page is about before they read the full content. Schema.org structured data provides machine-readable context like authorship, dates, and breadcrumbs. Headings create a scannable structure that agents use to extract sections relevant to a user’s query. A high text-to-HTML ratio ensures the page contains real content rather than framework boilerplate.
Requirements:
**Canonical link**

Include `<link rel="canonical" href="...">` on every page to tell agents which URL is authoritative.

**Meta tags**

Include all of the following:

- `<meta name="description" content="...">` (at least 50 characters)
- `<meta property="og:title" content="...">`
- `<meta property="og:description" content="...">`
- A `lang` attribute on the `<html>` element

**Schema.org / JSON-LD**

Include a `<script type="application/ld+json">` block with at minimum: title, description, canonical URL, `dateModified`, and a `BreadcrumbList`.
Example JSON-LD:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Getting Started with My Product",
  "description": "Learn how to install, configure, and use My Product in your application.",
  "url": "https://example.com/docs/getting-started",
  "dateModified": "2025-01-15T10:00:00Z",
  "breadcrumb": {
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Docs",
        "item": "https://example.com/docs"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Getting Started",
        "item": "https://example.com/docs/getting-started"
      }
    ]
  }
}
</script>
```
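A quick conformance check on parsed JSON-LD can confirm the minimum fields are present. A sketch (field names mirror the example above and may differ for other schema types):

```typescript
// Return the required JSON-LD fields that are absent from a parsed object.
function missingJsonLdFields(ld: Record<string, unknown>): string[] {
  const required = ["headline", "description", "url", "dateModified", "breadcrumb"];
  return required.filter((key) => !(key in ld));
}

console.log(missingJsonLdFields({
  "@type": "TechArticle",
  headline: "Getting Started",
  url: "https://example.com/docs/getting-started",
})); // [ 'description', 'dateModified', 'breadcrumb' ]
```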
**Section headings**

Use 3 or more headings (h1–h3) per page to create a clear structure. Well-structured pages produce better embeddings and allow agents to extract specific sections.
**Signal-to-noise ratio**

Maintain a text-to-HTML ratio above 15%. Pages dominated by JavaScript bundles, framework boilerplate, or empty wrappers are harder for agents to parse.
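The ratio can be approximated by stripping markup and comparing lengths. A rough regex-based sketch — not a real HTML parser, so treat the number as indicative:

```typescript
// Rough text-to-HTML ratio: visible text length over total page length.
// Script and style bodies are dropped first so they don't count as text.
function textToHtmlRatio(html: string): number {
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")
    .replace(/<[^>]+>/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return text.length / html.length;
}

const page = "<html><body><script>var x=1;</script><p>Hello world</p></body></html>";
console.log(textToHtmlRatio(page) > 0.15); // true
```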
**Glossary link**

Include a link to a glossary or terminology page. This helps agents resolve ambiguous terms in your content.
How to verify:
```bash
curl -s https://example.com/docs/getting-started | grep -E 'canonical|og:title|og:description|ld\+json|<meta name="description"'
# Each should return a match
```
### Markdown Mirrors and Content Negotiation
What: Provide markdown versions of your HTML pages and support content negotiation so agents can request the format they prefer.
Why: AI agents work natively with markdown. Raw HTML requires parsing and stripping away navigation, headers, footers, and scripts. A markdown mirror gives agents clean, structured content they can process directly — resulting in more accurate citations and better answers.
Requirements:
**Markdown mirrors** — For every HTML page, provide a corresponding `.md` or `.mdx` version. Include frontmatter metadata:

```yaml
---
title: Getting Started
description: Learn how to install and configure My Product
doc_version: "2.1"
last_updated: "2025-01-15"
---
```
**Alternate link in HTML** — Add a `<link>` tag pointing to the markdown version:

```html
<link rel="alternate" type="text/markdown" href="/docs/getting-started.md">
```
**Canonical link in markdown responses** — When serving markdown files, include a `Link` HTTP header:

```text
Link: <https://example.com/docs/getting-started>; rel="canonical"
```
**Content negotiation** — Return markdown when the client sends an `Accept: text/markdown` header:

```bash
curl -H "Accept: text/markdown" https://example.com/docs/getting-started
# Should return markdown content with Content-Type: text/markdown
```
**Sitemap section** — Include a `## Sitemap` heading in each markdown page with a link to `/sitemap.md`:

```markdown
## Sitemap

See the full [sitemap](/sitemap.md) for all pages.
```
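On the server side, content negotiation boils down to inspecting the Accept header. A minimal sketch that ignores q-values (a production handler should honor them):

```typescript
// Choose a representation from an Accept header value.
// Media-type parameters (";q=0.9" etc.) are stripped before comparison.
function preferredFormat(acceptHeader: string): "markdown" | "html" {
  const types = acceptHeader
    .split(",")
    .map((t) => t.split(";")[0].trim().toLowerCase());
  return types.includes("text/markdown") ? "markdown" : "html";
}

console.log(preferredFormat("text/markdown"));                 // markdown
console.log(preferredFormat("text/html,application/xhtml+xml")); // html
```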
How to verify:
```bash
# Check alternate link
curl -s https://example.com/docs/getting-started | grep 'rel="alternate".*text/markdown'

# Check content negotiation
curl -H "Accept: text/markdown" -I https://example.com/docs/getting-started
# Content-Type should be text/markdown

# Check markdown mirror directly
curl https://example.com/docs/getting-started.md
# Should return markdown with frontmatter
```
### Code and API Documentation
What: Fence all code blocks with language identifiers and link to machine-readable API schemas.
Why: Language-tagged code blocks let agents generate syntactically correct examples. API schema links (OpenAPI, Swagger) give agents the full contract of your API, enabling them to write integration code without guessing endpoints or parameters.
Requirements:
**Code blocks**

Every `<pre><code>` block should have a `language-*` or `lang-*` class:

```html
<!-- Good -->
<pre><code class="language-typescript">
const client = createClient();
</code></pre>

<!-- Bad: no language specified -->
<pre><code>
const client = createClient();
</code></pre>
```

In markdown, always specify the language after the opening fence:

````markdown
```typescript
const client = createClient();
```
````
**API schema links**

On pages with API documentation (URLs containing `/api/`, `/reference/`, `/endpoints/`, `/swagger/`, or `/openapi/`), include links to your machine-readable schema files (`openapi.json`, `swagger.json`, `swagger.yaml`, or `schema.json`).
How to verify:
```bash
# Check code blocks have language classes
curl -s https://example.com/docs/api/endpoints | grep -E '<code class="(language|lang)-'

# Check API schema links
curl -s https://example.com/docs/api/reference | grep -E 'openapi\.json|swagger\.(json|yaml)|schema\.json'
```
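For markdown sources, unlabeled fences are easy to catch before publishing. A sketch (assumes fences are never indented; the function name is illustrative):

```typescript
// Report 1-based line numbers of opening fences with no language identifier.
function fencesWithoutLanguage(markdown: string): number[] {
  const missing: number[] = [];
  let inFence = false;
  markdown.split("\n").forEach((line, i) => {
    const m = line.match(/^```(.*)$/);
    if (!m) return;
    // Only opening fences need a language; closing fences are always bare.
    if (!inFence && m[1].trim() === "") missing.push(i + 1);
    inFence = !inFence;
  });
  return missing;
}

const doc = "```\nplain\n```\n```ts\nconst x = 1;\n```";
console.log(fencesWithoutLanguage(doc)); // [ 1 ]
```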
## Optional: AI Text Style Analysis
For technical documentation sites, an AI-powered analysis can evaluate your content against documentation best practices:
- **Voice and tone** — Is the writing clear, direct, and consistent?
- **Formatting** — Are lists, tables, and code examples used effectively?
- **Examples** — Does the documentation include runnable code samples?
- **Units and precision** — Are measurements, limits, and thresholds clearly stated?
- **Pricing clarity** — If applicable, is pricing information unambiguous?
This analysis is opt-in and requires an AI model to evaluate your content. It is most useful for sites with extensive technical documentation where writing quality directly impacts whether agents can extract accurate information.
## Further Reading

- **llms.txt specification** — The standard for providing LLM-readable content
- **Schema.org documentation** — Structured data vocabulary for the web
- **OpenAPI specification** — Machine-readable API documentation standard
- **robots.txt specification** — The standard for web crawler access control