# AEO Site Checker — full reference
> A free Lighthouse-style auditor that grades any URL on **Answer Engine Optimization** — how well a page is set up to be cited by AI search engines (ChatGPT, Claude, Perplexity, Google AI Overviews). Scores 0–100 across 27 weighted checks in five categories.
This document is a self-contained reference. It describes every check the auditor runs, the exact weight of each, the API surface, and the limitations. AI agents and humans alike should be able to read this end-to-end without needing to clone the repository.
Public URL: https://aeositechecker.huzi.party
---
## What "AEO" means in this tool
When somebody asks ChatGPT (or Claude, Perplexity, or Google AI Overviews) for a recommendation, the response includes one to three cited sources. The model picks those citations from a different mix of signals than Google's PageRank-era SEO checklist:
- **Crawler reachability.** The model has to be able to fetch the page in the first place. Cloudflare bot challenges, 403 walls, and JS-only single-page apps are the most common reasons a site is invisible in AI answers despite ranking on Google.
- **Permission to crawl.** The AI bot has to be allowed in `robots.txt`. Many sites accidentally block `OAI-SearchBot`, `Claude-User`, or `PerplexityBot` while leaving Googlebot allowed.
- **Parseable structure.** Semantic HTML, JSON-LD, and Mozilla-Readability-friendly article structure are easier for an LLM to extract from.
- **The `/llms.txt` convention.** A small markdown file at the site root that gives an AI agent a curated index of important pages — like `sitemap.xml` but optimized for LLMs to read instead of crawlers to follow.
- **Content shape.** Princeton's GEO paper measured what actually moves the needle in generative answers — front-loaded direct answers, statistics density (+41% citation lift), quotations (+28%), citations to authorities. These are heuristic but directionally correct.
The tool produces a single 0–100 score and a letter grade so you can track changes over time.
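The `/llms.txt` convention described above is just a curated markdown index at the site root. A minimal example for a site like this one might look as follows (the paths and descriptions here are illustrative, not the tool's actual file):

```markdown
# AEO Site Checker

> Free auditor that grades any URL on Answer Engine Optimization.

## Docs

- [Scoring rubric](https://aeositechecker.huzi.party/docs/scoring): all checks and their weights
- [API reference](https://aeositechecker.huzi.party/docs/api): audit endpoint and response shape
```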
---
## Scoring rubric: 100 points across 27 checks in 5 categories (+1 type-conditional)
Letter grade: **A** ≥ 90 · **B** ≥ 80 · **C** ≥ 70 · **D** ≥ 60 · **F** otherwise.
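The grade thresholds reduce to a small lookup. A minimal Python sketch (not the tool's actual code):

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 AEO score to the letter grades used in the rubric."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"
```

Note the boundaries are inclusive: a score of exactly 90 is an A, exactly 60 is a D.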
### Fetchability — 43 points
| Check ID | Weight | What it measures |
|---|---:|---|
| `fetch_direct` | 18 | Direct (no-proxy) fetch returns 2xx/3xx with no bot challenge. This is the heaviest single check because it's the one an LLM crawler has to pass too. |
| `https` | 4 | URL uses HTTPS. |
| `page_size` | 3 | HTML body is under ~2.5 MB. |
| `robots_ai_bots` | 10 | `robots.txt` allows the critical AI bots: `ChatGPT-User`, `OAI-SearchBot`, `Claude-User`, `Claude-SearchBot`, `PerplexityBot`, `Perplexity-User`, `Google-Extended`. Each blocked critical bot costs 3 points. |
| `ssr_content` | 8 | Body has at least 50 words of server-rendered text (not an empty JS-only `<div>` shell). |
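The `robots_ai_bots` check can be approximated with Python's standard-library `urllib.robotparser`. This sketch (the helper name and return shape are my own, not the tool's API) parses a robots.txt body and reports which critical AI bots it disallows:

```python
from urllib.robotparser import RobotFileParser

# The critical AI crawlers listed in the rubric above.
AI_BOTS = [
    "ChatGPT-User", "OAI-SearchBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User", "Google-Extended",
]

def blocked_ai_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler names that a robots.txt body disallows for `path`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not rp.can_fetch(bot, path)]

robots = """\
User-agent: *
Allow: /

User-agent: PerplexityBot
Disallow: /
"""
print(blocked_ai_bots(robots))  # → ['PerplexityBot']
```

A real implementation would also need to fetch `/robots.txt` itself and treat a 4xx response as "allow all", matching how crawlers behave.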
### Core SEO — 21 points
| Check ID | Weight | What it measures |
|---|---:|---|
| `seo_title` | 4 | `<title>` between 25 and 65 characters. Half credit for any non-empty title. |
| `seo_meta_description` | 4 | `<meta name="description">` between 80 and 175 characters. Half credit for any non-empty description. |
| `seo_canonical` | 3 | `<link rel="canonical">` present. |
| `seo_opengraph` | 3 | `og:title`, `og:description`, `og:image`. One point per tag, max 3. |
| `seo_twitter_card` | 2 | `<meta name="twitter:card">` present. |
| `seo_lang` | 1 | `lang` attribute set on `<html>`. |
| `sitemap` | 4 | `sitemap.xml` (or `sitemap-index.xml`, `sitemap_index.xml`, `sitemap-0.xml`) reachable and well-formed. Full credit only if also referenced in `robots.txt`. |
### Semantic HTML — 13 points
| Check ID | Weight | What it measures |
|---|---:|---|
| `semantic_h1` | 3 | Exactly one `<h1>`. Partial credit if more than one. |
| `semantic_heading_hierarchy` | 3 | Headings progress by at most one level (no skipped levels, e.g. an `<h2>` followed by an `<h4>`). |
| `semantic_landmarks` | 4 | Page has `<main>` plus (`<article>` or `