# What is /llms.txt and how to write one
/llms.txt is a markdown file at your site root that gives AI agents a curated index of your most important pages. The format, rules, and a copy-pasteable template.
/llms.txt is a proposed convention — first surfaced at llmstxt.org by Jeremy Howard in 2024 — for a markdown file at your site root that gives AI agents a curated, human-readable index of your most important pages.
Think of it as a sitemap, but instead of being machine-only XML for crawlers, it’s for the LLM itself to read at inference time when it lands on your domain.
In the AEO Site Checker rubric, a valid /llms.txt is worth 5 points out of 100 — and a /llms-full.txt companion (a full-content dump) is worth 1 bonus point.
## What it actually looks like
The format is plain markdown with a strict opening structure. Here’s the minimal valid file:

```md
# Site name

> One-sentence description of what this site/project is.

Longer paragraph (or two) of context the LLM should keep in mind when summarizing
or answering questions about this site. Anything you wish a human writer would
have read before writing about you.

## Pages

- [Page title](https://example.com/page): One-line description of the page.
- [Another page](https://example.com/another): Another one-line description.

## Optional sections

- [Reference docs](https://example.com/docs): API reference, type definitions.
- [Examples](https://example.com/examples): Code examples and tutorials.
```
The key rules from the spec:

- The file must start with an `H1` — that’s the site/project name.
- The first non-heading content must be a blockquote (`>`) — that’s the short summary.
- Section headings are `H2` (`##`).
- Each page entry is a markdown list item with a link and a short description.
- Optional sections can go under an `## Optional` heading, and the spec encourages skipping them if you’re crunched for context budget.
That’s the whole format. There is no JSON, no XML, no proprietary syntax.
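Because the format is this regular, it’s easy to consume programmatically. Here’s a minimal TypeScript parsing sketch — the type and function names are ours for illustration, not part of the spec:

```ts
// Minimal /llms.txt parser sketch. Field names are illustrative, not spec-defined.
interface PageEntry {
  title: string;
  url: string;
  note: string;
}

interface LlmsTxt {
  name: string;                          // from the H1
  summary: string;                       // from the blockquote
  sections: Record<string, PageEntry[]>; // H2 heading -> page entries
}

function parseLlmsTxt(md: string): LlmsTxt {
  const lines = md.split("\n");
  const name = lines.find((l) => l.startsWith("# "))?.slice(2).trim() ?? "";
  const summary = lines
    .filter((l) => l.startsWith(">"))
    .map((l) => l.replace(/^>\s?/, ""))
    .join(" ")
    .trim();

  const sections: Record<string, PageEntry[]> = {};
  let current = "";
  for (const line of lines) {
    const h2 = line.match(/^## +(.+)/);
    if (h2) {
      current = h2[1].trim();
      sections[current] = [];
      continue;
    }
    // Page entry: "- [Title](url): description"
    const entry = line.match(/^- \[(.+?)\]\((\S+?)\)(?::\s*(.*))?$/);
    if (entry && current) {
      sections[current].push({ title: entry[1], url: entry[2], note: entry[3] ?? "" });
    }
  }
  return { name, summary, sections };
}
```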
## A real example
Here’s the start of the AEO Site Checker’s own `/llms.txt`:

```md
# AEO Site Checker

> A free, open Lighthouse-style auditor that grades any URL on Answer Engine
> Optimization (AEO) — how well a page is set up to be cited by AI search
> engines (ChatGPT, Claude, Perplexity, Google AI Overviews). Scores 0–100
> across 27 weighted checks in 5 categories.

The auditor reads the server-rendered HTML directly via undici (no headless
browser, no JS execution — same view an AI crawler gets), then runs structured-
data extraction, Mozilla Readability, robots.txt parsing for the live AI-bot
allow/block matrix, and content heuristics from Princeton's GEO paper.

## Pages

- [Home](https://aeositechecker.huzi.party/): Hero, FAQ, scoring overview.
- [Run audit](https://aeositechecker.huzi.party/audit): URL form that runs the audit.
- [Knowledge base](https://aeositechecker.huzi.party/blog): Primers and rubric.
```
That’s it. It’s not a sitemap dump — it’s a hand-curated index of the most important things on the site, written so an LLM can pick relevant entries to follow.
## /llms-full.txt — the bonus point
/llms-full.txt is a separate, longer file in the same directory that contains the full text of your important pages, concatenated. Its purpose is to give an agent everything it needs in one fetch instead of crawling N pages.
There’s no mandated format — you can use plain markdown, with each page separated by a `---` rule or by a heading. The tradeoff:
- Helps you when an agent uses your site as a reference and you want all your docs to be available in one shot.
- Hurts you if you accidentally include 5 MB of content — most agents have a hard cap and will give up.
Practical rule: keep /llms-full.txt under ~500 KB and only include the content that actually matters. If you have a 200-page docs site, pick the top 30 pages and dump those.
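If your site is generated from markdown sources, a small build step can produce the file and enforce that cap. A sketch, assuming a Node build — the file paths and page list are illustrative:

```ts
// Build /llms-full.txt by concatenating hand-picked markdown pages.
// Paths are illustrative: point this at your own top pages.
import { readFile, writeFile } from "node:fs/promises";

const PAGES = [
  "content/index.md",
  "content/docs/getting-started.md",
  "content/docs/api-reference.md",
];
const MAX_BYTES = 500 * 1024; // the ~500 KB practical cap from above

async function build(): Promise<void> {
  const parts: string[] = [];
  for (const path of PAAGES_OR_PAGES) {
    parts.push(await readFile(path, "utf8"));
  }
  const full = parts.join("\n\n---\n\n"); // pages separated by a horizontal rule
  const size = Buffer.byteLength(full, "utf8");
  if (size > MAX_BYTES) {
    throw new Error(`llms-full.txt is ${size} bytes; trim the page list`);
  }
  await writeFile("public/llms-full.txt", full);
  console.log(`Wrote public/llms-full.txt (${size} bytes)`);
}

build().catch((err) => {
  console.error(err);
  process.exit(1);
});
```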
## How LLMs actually use it
Two patterns matter today:

- **Live retrieval.** Some agents (notably some Claude- and Perplexity-driven workflows) fetch `/llms.txt` when they land on a domain to get a quick map of what’s there. They then decide what to fetch next based on the index entries.
- **Indexing.** Crawlers store the file alongside other site metadata. The index entries become candidate “high-priority” URLs in the crawl queue.

It is not a replacement for `robots.txt` or `sitemap.xml`. You should have all three:

- `robots.txt` says who is allowed to crawl.
- `sitemap.xml` is the machine-readable list for crawlers.
- `llms.txt` is the human-readable, hand-curated subset for LLMs.
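To make the live-retrieval pattern concrete, here’s a rough sketch of the agent side of that loop. Real agents add ranking and context budgeting on top; this just fetches the index and extracts candidate URLs:

```ts
// Live-retrieval sketch: fetch /llms.txt and extract candidate URLs to follow.
async function mapDomain(origin: string): Promise<{ url: string; note: string }[]> {
  const res = await fetch(new URL("/llms.txt", origin));
  if (!res.ok) return []; // no index: fall back to normal crawling
  const md = await res.text();
  // Pull every "- [title](url): description" entry out of the markdown.
  const entries = [...md.matchAll(/^- \[.+?\]\((\S+?)\)(?::\s*(.*))?$/gm)];
  return entries.map((m) => ({ url: m[1], note: m[2] ?? "" }));
}

// An agent would then rank these notes against the user's question
// and fetch only the most relevant pages.
const candidates = await mapDomain("https://example.com");
console.log(candidates);
```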
## How to write a good one
Some heuristics that work well:

- **Lead with the hardest-to-discover thing.** If your homepage is obvious but your API reference is buried, list the API reference first.
- **Be specific in descriptions.** “Documentation” is useless. “Field-by-field reference for the audit JSON response, including the per-check `earned/possible` breakdown” is great.
- **Skip the marketing copy.** No “we’re the leading platform for X.” LLMs ignore that and so do humans.
- **Update it when you ship.** A `/llms.txt` that points to deleted pages is worse than no `/llms.txt` at all.
- **Keep it under 4 KB.** Most agents read the file once and budget context against it. Tight is better than complete.
## Where to put it
`/llms.txt` and `/llms-full.txt` go at the root of your domain, served with `Content-Type: text/markdown; charset=utf-8`. In Astro, Next.js, or most static-site setups, drop them in your `public/` directory and they’ll be served from the root automatically.
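One caveat: many servers send `.txt` files as `text/plain` by default. A header rule can force the markdown type. Here’s one way in Next.js, a sketch assuming a recent version (15+) with `next.config.ts` support and that your header rules apply to files in `public/`:

```ts
// next.config.ts: one way to force the markdown content type for both files.
import type { NextConfig } from "next";

const markdownHeaders = [
  { key: "Content-Type", value: "text/markdown; charset=utf-8" },
];

const nextConfig: NextConfig = {
  async headers() {
    return [
      { source: "/llms.txt", headers: markdownHeaders },
      { source: "/llms-full.txt", headers: markdownHeaders },
    ];
  },
};

export default nextConfig;
```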
## Things to avoid
- **Don’t make it a sitemap dump.** A 5,000-line file with every page is the same as no file — too much to read.
- **Don’t link to login walls.** Anything you list here will be fetched by the LLM. If that’s a 401, you’ve burned a slot.
- **Don’t include private or staging URLs.** Same reason.
- **Don’t put HTML inside.** It’s a markdown spec. HTML breaks parsers.
## Verifying it works
Run your URL through the AEO Site Checker and look at the `llms_txt` check in the AEO category. The auditor checks:

- The file exists at `/llms.txt` and returns a 200.
- It starts with an `H1` (line 1).
- It has a blockquote summary as the first non-heading content.
- It has at least one `H2` section.
- It has at least one valid markdown list item with a link.
Pass all five and you get the full 5 points. Have it but malformed and you get partial credit. Missing entirely and you get 0.
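To run the same five checks locally before deploying, here’s a quick sketch that mirrors them. It is not the auditor’s actual implementation, just the same logic in miniature:

```ts
// Pre-flight for the five llms.txt checks. A sketch, not the auditor's code.
async function checkLlmsTxt(origin: string): Promise<string[]> {
  const res = await fetch(new URL("/llms.txt", origin));
  if (!res.ok) return [`fetch failed: HTTP ${res.status}`]; // check 1: exists, 200

  const lines = (await res.text()).split("\n");
  const nonEmpty = lines.filter((l) => l.trim() !== "");
  const failures: string[] = [];

  if (!lines[0]?.startsWith("# ")) failures.push("line 1 is not an H1"); // check 2
  const firstBody = nonEmpty.find((l) => !l.startsWith("#"));
  if (!firstBody?.startsWith(">")) failures.push("first non-heading content is not a blockquote"); // check 3
  if (!lines.some((l) => /^## /.test(l))) failures.push("no H2 section"); // check 4
  if (!lines.some((l) => /^- \[.+?\]\(\S+\)/.test(l))) failures.push("no list item with a link"); // check 5

  return failures; // empty array means all five checks pass
}
```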
## Further reading
- [The official llms.txt spec](https://llmstxt.org/)
- Robots.txt for AI crawlers — the complete guide
- JSON-LD structured data for AI search
Ready to score your site? [Run an audit →](https://aeositechecker.huzi.party/audit)