---
title: "Agent Readability: A Specification for AI-Optimized Websites"
description: "Agent readability is a set of best practices that make your website parseable, navigable, and citable by AI agents like ChatGPT, Claude, Cursor, and Copilot. A site-wide spec across discovery, structure, and context."
doc_version: "1.0"
last_updated: "2026-03-23"
canonical: "https://timothyjordan.com/blog/2026/03/23/agent-readability-spec.html"
---

# Agent Readability: A Specification for AI-Optimized Websites

_Published March 23, 2026_

> Agent readability is a set of best practices that make your website parseable, navigable, and citable by AI agents like ChatGPT, Claude, Cursor, and Copilot. A site-wide spec across discovery, structure, and context.

_[Cross-posted from Vercel KB](https://vercel.com/kb/guide/agent-readability-spec)_

AI agents (tools like ChatGPT, Claude, Copilot, and Cursor) are becoming a primary way people discover and consume web content. Agent readability is a set of best practices that make your website accessible to AI agents and assistants. It covers three areas:

* **Discovery**: Can agents find your pages? (`llms.txt`, sitemaps, `robots.txt`)
* **Structure**: Can agents parse your pages? (meta tags, headings, structured data, markdown mirrors)
* **Context**: Can agents understand your content? (skill files, content negotiation, code documentation)

This guide describes what to implement, why it matters, and how to verify each requirement.

* * *

## Quick-Start Checklist

### Site-Level Files

* Serve an `llms.txt` file at your site root
* Allow AI bots in `robots.txt`
* Publish a `sitemap.xml` with `<lastmod>` dates
* Publish a `sitemap.md` with headings and links
* Create an `AGENTS.md` with install, config, and usage sections
* Ensure all pages are [discoverable](#page-discoverability) from at least one source

### Page-Level HTML

* Return [HTTP 200](#http-response-basics) with 0–1 redirects
* Set correct `Content-Type` headers
* Do not set restrictive `x-robots-tag` values
* Include a `<link rel="canonical">` tag
* Add `meta description` (50+ characters), `og:title`, `og:description`, and `html lang`
* Add [Schema.org / JSON-LD](#schema.org-/-json-ld) structured data
* Use [3+ section headings](#section-headings) (h1–h3) per page
* Maintain a [text-to-HTML ratio](#signal-to-noise-ratio) above 15%
* Include a [glossary or terminology link](#glossary-link)

### Server Configuration

* Provide [markdown mirrors](#markdown-mirrors-and-content-negotiation) for HTML pages
* Add `<link rel="alternate" type="text/markdown">` to HTML pages
* Return a `Link` header with `rel="canonical"` from markdown endpoints
* Support `Accept: text/markdown` content negotiation
* Include a `## Sitemap` section in markdown pages

### Content Quality

* [Fence all code blocks](#code-blocks) with language identifiers
* Link to [OpenAPI/Swagger schemas](#open-api-schema) from API reference pages

* * *

## Scoring

Your agent readability score measures how well your site meets these requirements:

`score = round((passed checks / total checks) x 100)`

Only checks with a **pass** status count toward the numerator. Checks that **fail**, **warn**, or **error** do not. The total is the sum of all site-wide checks plus all per-page checks across every discovered page.

Because per-page checks run on every page, sites with many pages have a larger denominator. A single failing check matters less on a large site, but a systemic issue (such as missing canonical links on every page) fails once per page and compounds significantly.
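
As a rough illustration of the formula, the Python sketch below computes a score from a list of check statuses. The function name `readability_score` and the list-of-strings input are hypothetical, and rounding halves upward is an assumption, since the formula does not say how `round` treats ties. The rating bands are the ones listed in this section's rating table.

```python
def readability_score(statuses):
    """Return (score, rating) for a list of check statuses.

    Each status is one of "pass", "fail", "warn", or "error". Only "pass"
    counts toward the numerator.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("at least one check is required")
    passed = sum(1 for status in statuses if status == "pass")
    # Integer arithmetic that rounds halves up (an assumption; Python's
    # built-in round() rounds halves to the nearest even number instead).
    score = (200 * passed + total) // (2 * total)
    if score >= 90:
        rating = "Excellent"
    elif score >= 70:
        rating = "Good"
    elif score >= 50:
        rating = "Fair"
    else:
        rating = "Needs Improvement"
    return score, rating


# 20 checks in total: 17 pass, 2 warn, 1 fails.
print(readability_score(["pass"] * 17 + ["warn"] * 2 + ["fail"]))
```

The denominator effect is easy to see with this function: one non-passing check out of 20 costs five points, while one non-passing check out of 2,000 does not change the rounded score at all.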

| Score  | Rating            | Meaning                                                   |
| ------ | ----------------- | --------------------------------------------------------- |
| 90–100 | Excellent         | Highly optimized for AI agents. All critical checks pass. |
| 70–89  | Good              | Meets most requirements. Address warnings to improve.     |
| 50–69  | Fair              | Has gaps. Review failed checks and implement fixes.       |
| 0–49   | Needs Improvement | Significant work needed across multiple areas.            |

* * *

## Site-Level Requirements

These requirements apply once per site, at the root level.

### llms.txt

**What:** Serve an `llms.txt` file that lists your documentation pages. This is the primary entry point for AI agents discovering your content.

**Why:** AI agents look for `llms.txt` as a machine-readable index of your site's content, similar to how search engines use `sitemap.xml`. Without it, agents must crawl your site to find pages, which is slower and less reliable.

**Requirements:**

* Serve the file at one of: `/llms.txt`, `/.well-known/llms.txt`, or `/docs/llms.txt`
* Alternatively, serve `llms-full.txt` at the same paths
* Use `text/plain` as the `Content-Type`
* The file must not be empty
* Listed URLs should use `.md` or `.mdx` extensions, not `.html`

**Example:**

```markdown
# Example Product Documentation

## Getting Started

- [Installation](/docs/installation.md)
- [Quick Start](/docs/quick-start.md)
- [Configuration](/docs/configuration.md)

## API Reference

- [Authentication](/docs/api/auth.md)
- [Endpoints](/docs/api/endpoints.md)
- [Error Handling](/docs/api/errors.md)

## Guides

- [Deployment](/docs/guides/deployment.md)
- [Monitoring](/docs/guides/monitoring.md)
```

**How to verify:**

```bash
# example.com is a placeholder; substitute your own domain
curl -I https://example.com/llms.txt
# Should return 200 with Content-Type: text/plain
```

### robots.txt

**What:** Ensure your `robots.txt` does not block known AI bots.

**Why:** AI agents respect `robots.txt` directives. If you block them, your content will not be indexed or cited by AI assistants.


**Requirements:**

* Do not block `GPTBot`, `ClaudeBot`, `CCBot`, or `Google-Extended`
* Do not disallow `/llms.txt`
* Having no `robots.txt` at all triggers a warning; it is better to explicitly allow access

**Example:**

```text
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
```

**How to verify:**

```bash
# example.com is a placeholder; substitute your own domain
curl https://example.com/robots.txt
# Inspect output for Disallow rules targeting AI bots or /llms.txt
```

### Sitemap (XML and Markdown)

**What:** Publish both a `sitemap.xml` and a `sitemap.md` to help agents understand your site structure.

**Why:** XML sitemaps are the standard for search engine crawlers. Markdown sitemaps give AI agents a structured, readable overview of your documentation hierarchy. Publishing both maximizes discoverability.

**Requirements for** `sitemap.xml`**:**

* Serve a valid XML sitemap with `<urlset>` or `<sitemapindex>` containing `<loc>` entries
* Include `<lastmod>` dates so agents know which pages have changed

**Requirements for** `sitemap.md`**:**

* Serve at one of: `/sitemap.md`, `/docs/sitemap.md`, or `/.well-known/sitemap.md`
* Include headings and links that reflect your site's structure

**Example** `sitemap.md`**:**

```markdown
# Sitemap

## Getting Started

- [Installation](/docs/installation.md)
- [Quick Start](/docs/quick-start.md)

## API Reference

- [Authentication](/docs/api/auth.md)
- [Endpoints](/docs/api/endpoints.md)

## Guides

- [Deployment](/docs/guides/deployment.md)
- [Monitoring](/docs/guides/monitoring.md)
```

**How to verify:**

```bash
curl https://example.com/sitemap.xml | head -20
# Should contain <urlset> with <loc> and <lastmod> entries

curl https://example.com/sitemap.md
# Should return structured markdown with headings and links
```

### Skill File (AGENTS.md)

**What:** Create an `AGENTS.md` file that gives coding agents direct context about your product: how to install it, configure it, and use it.

**Why:** Coding agents like Copilot, Claude Code, and Cursor use skill files to understand how to work with your product.
A well-written skill file means agents can generate correct code for your users without guessing.

**Requirements:**

* Serve the file at one of: `/AGENTS.md`, `/agents.md`, `/.well-known/agents.md`, `/docs/AGENTS.md`, `/llms-full.txt`, `/CLAUDE.md`, `/.cursor/rules`, or `/.cursorrules`
* Include at least 2 of the following sections: installation instructions, configuration details, usage examples or code blocks

**Example** `AGENTS.md`**:**

````markdown
# My Product

## Installation

```bash
npm install my-product
```

## Configuration

Create a `my-product.config.ts` file in your project root:

```typescript
import { defineConfig } from 'my-product';

export default defineConfig({
  apiKey: process.env.MY_PRODUCT_API_KEY,
  region: 'us-east-1',
});
```

## Usage

```typescript
import { createClient } from 'my-product';

const client = createClient();
const result = await client.query('SELECT * FROM users');
console.log(result.rows);
```
````

**How to verify:**

```bash
# example.com is a placeholder; substitute your own domain
curl https://example.com/AGENTS.md
# Should return markdown with install, config, and/or usage sections
```

### Page Discoverability

**What:** Ensure every page on your site is reachable from at least one discovery source.

**Why:** Pages that are not linked from `sitemap.xml`, `llms.txt`, `sitemap.md`, or other pages cannot be found by agents. Orphaned pages are invisible to AI.

**Requirements:**

* Every page should appear in at least one of: `sitemap.xml`, `llms.txt`, `sitemap.md`, or be reachable via links from other discoverable pages

**How to verify:** Cross-reference your full list of pages against the URLs listed in your sitemaps and `llms.txt`. Any page that appears in none of these sources, and is not linked from a discoverable page, is undiscoverable to agents.

* * *

## Page-Level Requirements

These requirements apply to every page on your site.

### HTTP Response Basics

**What:** Ensure pages return clean HTTP responses that agents can process without issues.

**Why:** Agents follow redirects and inspect headers to decide whether to index a page.
Broken responses, long redirect chains, or restrictive headers cause agents to skip your content.

**Requirements:**

* Return HTTP 200 for all live pages
* Limit redirect chains to 0–1 hops (2+ redirects cause failures)
* Set the correct `Content-Type` header:
  * HTML pages: `text/html;charset=UTF-8`
  * Markdown pages: `text/plain;charset=UTF-8`
* Do not include `noindex`, `noai`, or `noimageai` in the `x-robots-tag` response header

**How to verify:**

```bash
# example.com and the page path are placeholders; substitute your own URL
curl -I -L https://example.com/docs/installation
# Check: HTTP status is 200, no more than 1 redirect, correct Content-Type
# Check: x-robots-tag does not contain noindex, noai, or noimageai
```

### HTML Meta and Structure

**What:** Include proper metadata, structured data, and heading hierarchy so agents can understand each page's content and context.

**Why:** Meta tags tell agents what a page is about before they read the full content. [Schema.org](http://schema.org/) structured data provides machine-readable context like authorship, dates, and breadcrumbs. Headings create a scannable structure that agents use to extract sections relevant to a user's query. A high text-to-HTML ratio ensures the page contains real content rather than framework boilerplate.

**Requirements:**

#### Canonical link

Include `<link rel="canonical">` on every page to tell agents which URL is authoritative.

#### Meta tags

Include all of the following:

* `<meta name="description">` (at least 50 characters)
* `<meta property="og:title">`
* `<meta property="og:description">`
* `lang` attribute on the `<html>` element

#### [Schema.org](http://schema.org/) / JSON-LD

Include a `<script type="application/ld+json">` block with Schema.org structured data.

#### Section headings

Use 3 or more headings (h1–h3) per page to create a clear structure. Well-structured pages produce better embeddings and allow agents to extract specific sections.

#### Signal-to-noise ratio

Maintain a text-to-HTML ratio above 15%. Pages dominated by JavaScript bundles, framework boilerplate, or empty wrappers are harder for agents to parse.

#### Glossary link

Include a link to a glossary or terminology page. This helps agents resolve ambiguous terms in your content.
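
The page-level requirements above can be spot-checked offline. The following is a minimal sketch using Python's standard `html.parser` module. The class name `PageAudit`, the function `audit_page`, and the way the text-to-HTML ratio is computed (visible characters divided by total document length) are assumptions made for illustration, not a description of how any particular scanner works.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the signals named in the page-level requirements."""

    def __init__(self):
        super().__init__()
        self.found = set()    # names of signals seen so far
        self.headings = 0     # number of h1-h3 elements
        self.text_chars = 0   # visible text length, excluding script/style
        self._skip = 0        # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
            if a.get("type") == "application/ld+json":
                self.found.add("json-ld")
        elif tag == "html" and a.get("lang"):
            self.found.add("lang")
        elif tag == "link" and a.get("rel") == "canonical" and a.get("href"):
            self.found.add("canonical")
        elif tag == "meta":
            content = a.get("content") or ""
            if a.get("name") == "description" and len(content) >= 50:
                self.found.add("description")
            elif a.get("property") in ("og:title", "og:description") and content:
                self.found.add(a["property"])
        elif tag in ("h1", "h2", "h3"):
            self.headings += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


REQUIRED = {"canonical", "description", "og:title", "og:description", "lang", "json-ld"}


def audit_page(html):
    """Return missing signals plus heading and text-ratio results for one page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    ratio = parser.text_chars / len(html) if html else 0.0
    return {
        "missing": sorted(REQUIRED - parser.found),
        "headings_ok": parser.headings >= 3,   # 3 or more h1-h3 headings
        "ratio_ok": ratio > 0.15,              # text-to-HTML ratio above 15%
    }


SAMPLE = """<html lang="en"><head>
<link rel="canonical" href="https://example.com/docs/install">
<meta name="description" content="How to install and configure the example product on any platform.">
<meta property="og:title" content="Installation">
</head><body>
<h1>Installation</h1><p>Install the package with your package manager of choice.</p>
<h2>Configure</h2><p>Create a config file in the project root and set your API key.</p>
<h3>Verify</h3><p>Run the status command to confirm the client can reach the service.</p>
</body></html>"""

print(audit_page(SAMPLE))
```

The sample page deliberately omits `og:description` and a JSON-LD block, so both appear in the `missing` list. The glossary link is left out of the sketch because the requirement does not say how such a link is recognized.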

**How to verify:**

```bash
# example.com and the page path are placeholders; substitute your own URL
curl -s https://example.com/docs/installation | grep -E 'canonical|og:title|og:description|ld\+json'
```

### Markdown Mirrors and Content Negotiation

**Requirements:**

**Alternate link**: In each HTML page, add a `<link>` tag pointing to the markdown version:

```html
<link rel="alternate" type="text/markdown" href="/docs/installation.md">
```

**Canonical link in markdown responses**: When serving markdown files, include a `Link` HTTP header:

```text
Link: <https://example.com/docs/installation>; rel="canonical"
```

**Content negotiation**: Return markdown when the client sends an `Accept: text/markdown` header:

```bash
curl -H "Accept: text/markdown" https://example.com/docs/installation
# Should return markdown content with Content-Type: text/markdown
```

**Sitemap section**: Include a `## Sitemap` heading in each markdown page with a link to `/sitemap.md`:

```markdown
## Sitemap

See the full [sitemap](/sitemap.md) for all pages.
```

**How to verify:**

```bash
# Check alternate link
curl -s https://example.com/docs/installation | grep 'rel="alternate".*text/markdown'

# Check content negotiation
curl -H "Accept: text/markdown" -I https://example.com/docs/installation
# Content-Type should be text/markdown

# Check markdown mirror directly
curl https://example.com/docs/installation.md
# Should return markdown with frontmatter
```

### Code and API Documentation

**What:** Fence all code blocks with language identifiers and link to machine-readable API schemas.

**Why:** Language-tagged code blocks let agents generate syntactically correct examples. API schema links (OpenAPI, Swagger) give agents the full contract of your API, enabling them to write integration code without guessing endpoints or parameters.

**Requirements:**

#### Code blocks

Every `<pre>` block should have a `language-*` or `lang-*` class:

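```html
<!-- With a language class -->
<pre class="language-typescript"><code>
const client = createClient();
</code></pre>

<!-- Without a language class (avoid) -->
<pre><code>
const client = createClient();
</code></pre>
```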

In markdown, always specify the language after the opening fence:

````
```typescript
const client = createClient();
```
````
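
For markdown sources, a small script can flag opening fences that lack a language identifier. This is a sketch under stated assumptions: the function name `unlabeled_fences` is hypothetical, and the fence matching follows the common rule that a closing fence uses the same character as the opener, is at least as long, and carries no trailing text.

```python
import re

# An opening or closing fence: up to 3 spaces of indent, then 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})\s*(.*)$")


def unlabeled_fences(markdown):
    """Return 1-based line numbers of opening fences with no language identifier."""
    missing = []
    open_fence = None  # (character, length) of the fence currently open
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            open_fence = (marker[0], len(marker))
            if not info:
                missing.append(number)
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not info:
            open_fence = None  # closing fence for the current block
    return missing


# Build the sample from pieces so this example contains no literal fence lines.
MARK = "`" * 3
SAMPLE = "\n".join([
    MARK + "typescript",
    "const client = createClient();",
    MARK,
    "",
    MARK,
    "npm install my-product",
    MARK,
])

print(unlabeled_fences(SAMPLE))
```

Fences nested inside a longer outer fence are treated as content of the outer block and are not reported.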

#### Open API schema

**API schema links**: On pages with API documentation (URLs containing `/api/`, `/reference/`, `/endpoints/`, `/swagger/`, or `/openapi/`), include links to your machine-readable schema files (`openapi.json`, `swagger.json`, `swagger.yaml`, or `schema.json`).
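
The rule above can be expressed as a short check. This sketch assumes a page's links can be found by scanning `href` attributes with a regular expression, which is a simplification. The function names and the convention of returning `None` for pages the rule does not apply to are hypothetical.

```python
import re
from urllib.parse import urlparse

API_PATH_MARKERS = ("/api/", "/reference/", "/endpoints/", "/swagger/", "/openapi/")
SCHEMA_FILES = ("openapi.json", "swagger.json", "swagger.yaml", "schema.json")
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def is_api_page(url):
    """True when the URL path contains one of the API documentation markers."""
    path = urlparse(url).path
    return any(marker in path for marker in API_PATH_MARKERS)


def schema_links(html):
    """Return hrefs whose final path segment is a known schema file name."""
    return [
        href
        for href in HREF.findall(html)
        if urlparse(href).path.rsplit("/", 1)[-1] in SCHEMA_FILES
    ]


def check_api_schema(url, html):
    """Return "pass" or "fail" for API pages, or None when the rule does not apply."""
    if not is_api_page(url):
        return None
    return "pass" if schema_links(html) else "fail"


page = '<p>See the <a href="/openapi.json">OpenAPI schema</a>.</p>'
print(check_api_schema("https://example.com/docs/api/auth", page))
```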


**How to verify:**
```bash
# Check code blocks have language classes
curl -s  | grep -E '