citare
AI crawler hygiene

llms.txt audit

Validate your llms.txt against the spec, check every URL it points to, and benchmark against the brands that should know better. We audited Ahrefs, Profound, and Semrush in May 2026 — all three return a 404 on their own llms.txt. Don't be them.

What the audit checks

File present + parseable

Fetches /llms.txt with the GPTBot and ClaudeBot user agents. Confirms a 200 status, a text/markdown MIME type, and a parseable Markdown structure: an H1 brand line followed by H2 sections.
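
A minimal sketch of this check in Python, using only the standard library. The function names (fetch_llms_txt, check_structure) and the short user-agent tokens are assumptions for illustration, not our production code; real crawlers send longer user-agent strings.

```python
import re
import urllib.request

# Illustrative user-agent tokens (assumption); real crawler strings are longer.
USER_AGENTS = ["GPTBot", "ClaudeBot"]


def fetch_llms_txt(base_url: str, user_agent: str, timeout: float = 10.0):
    """Return (status, content_type, body) for <base_url>/llms.txt."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/llms.txt",
        headers={"User-Agent": user_agent},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so a 404 shows up
    # as an exception rather than a returned status.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.headers.get_content_type(), body


def check_structure(text: str) -> list[str]:
    """Return structural problems; an empty list means the shape looks right."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines()]
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty or not re.match(r"^# \S", non_empty[0]):
        problems.append("first non-empty line is not an H1 brand line")
    if sum(1 for ln in lines if re.match(r"^# \S", ln)) > 1:
        problems.append("more than one H1")
    if not any(re.match(r"^## \S", ln) for ln in lines):
        problems.append("no H2 sections found")
    return problems
```

Keeping the structure check separate from the fetch means the parser can be run against a local file before anything is published.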

Linked URLs resolve

Every URL inside the file is fetched and validated. We flag 404s, redirects, soft-404s, and pages whose content doesn't match the link label. Stale llms.txt files are worse than no file.
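
As a rough illustration of link validation, the sketch below extracts Markdown links and classifies each fetch result. The names extract_links, classify, and check_url are hypothetical, and the soft-404 test (a 200 page whose body says "page not found") is a simplifying assumption. Checking that page content matches the link label is not covered here.

```python
import re
import urllib.error
import urllib.request

LINK_RE = re.compile(r"\[([^\]]+)\]\((\S+?)\)")


def extract_links(text: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every inline Markdown link."""
    return LINK_RE.findall(text)


def classify(requested_url: str, status: int, final_url: str, body: str) -> str:
    """Map one fetch result to 'ok', 'redirect', 'soft-404', or 'error'."""
    if status >= 400:
        return "error"
    if final_url.rstrip("/") != requested_url.rstrip("/"):
        return "redirect"
    # Heuristic assumption: a 200 page that says "page not found" is a soft 404.
    if "page not found" in body.lower():
        return "soft-404"
    return "ok"


def check_url(url: str, user_agent: str = "GPTBot", timeout: float = 10.0) -> str:
    """Fetch one URL and classify the outcome."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read(20000).decode("utf-8", errors="replace")
            # urlopen follows redirects; geturl() reports where we ended up.
            return classify(url, resp.status, resp.geturl(), body)
    except urllib.error.HTTPError as err:
        return classify(url, err.code, url, "")
    except urllib.error.URLError:
        return "error"
```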

Coverage benchmark

Compares your linked URL set against your sitemap.xml. Surfaces the high-traffic pages you forgot to include. Flags the low-value pages bloating the file.
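
The core of the comparison is a set difference between the two URL lists. A sketch, assuming a standard sitemap.xml in the sitemaps.org namespace; the function names are hypothetical, and ranking pages by traffic would need analytics data that this example does not model.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Collect every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def coverage(llms_urls: set[str], sitemap: set[str]) -> dict[str, set[str]]:
    """Compare the URLs linked from llms.txt with the sitemap."""
    return {
        # Candidates you may have forgotten to include.
        "missing_from_llms": sitemap - llms_urls,
        # Linked pages the sitemap does not list, often stale or low-value.
        "not_in_sitemap": llms_urls - sitemap,
    }
```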

The 12-point llms.txt checklist

  • File served at /llms.txt with 200 status and text/markdown MIME type.
  • H1 brand line at the top — single line, no markdown noise.
  • Optional blockquote summary directly under the H1 explaining what the brand does.
  • H2 sections for Product, Docs, Pricing, Blog — not every page; only the canonical ones.
  • Every link uses an absolute URL with the canonical host (no relative paths, no parameter junk).
  • No URLs that 301 elsewhere — link directly to the final destination.
  • Every linked URL returns 200 to GPTBot, ClaudeBot, and PerplexityBot user agents.
  • Content on each linked page matches the link label (no bait-and-switch).
  • File length under 200 lines — concise wins over comprehensive.
  • robots.txt does not block GPTBot/ClaudeBot/PerplexityBot from /llms.txt itself.
  • llms-full.txt published separately if you want full-content inlining for long-form docs.
  • File is regenerated automatically on every deploy — never let it go stale.
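
Several of these points can be linted offline. The sketch below covers the line limit, absolute canonical URLs, parameter junk, and the robots.txt rule; lint_file and blocked_bots are hypothetical names, and the remaining checklist items (status codes, redirects, label matching) need live fetches that are not shown here.

```python
import re
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]
MAX_LINES = 200


def lint_file(text: str, canonical_host: str) -> list[str]:
    """Flag checklist violations that can be detected without fetching."""
    issues = []
    if len(text.splitlines()) >= MAX_LINES:
        issues.append(f"file has {MAX_LINES} or more lines")
    for url in re.findall(r"\]\((\S+?)\)", text):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            issues.append(f"relative or non-HTTP link: {url}")
        elif parts.netloc != canonical_host:
            issues.append(f"non-canonical host: {url}")
        if parts.query:
            issues.append(f"query parameters in link: {url}")
    return issues


def blocked_bots(robots_txt: str, llms_url: str) -> list[str]:
    """Return the bots that robots.txt forbids from fetching llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in BOTS if not parser.can_fetch(bot, llms_url)]
```

For example, a robots.txt that disallows everything for GPTBot but allows all other agents would make blocked_bots return ["GPTBot"] for https://example.com/llms.txt.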

Frequently asked

What is an llms.txt audit?

An llms.txt audit checks whether your site exposes a valid llms.txt file at the root path, whether the file conforms to the spec (H1 brand line, optional blockquote summary, then sectioned H2 link lists), and whether the URLs it points to actually return 200s with the content the file claims they contain. A passing audit means an AI crawler can land on your llms.txt and get a curated, machine-friendly map of your most important pages.

How do I audit my llms.txt automatically?

Use our free llms.txt validator at /tools/llms-txt-validator. Paste your domain and it fetches /llms.txt, parses the structure, validates every linked URL, and returns a per-line report (valid / warning / error). No signup. Good for spot checks before publishing or after a content update.

Do the big AI visibility tools have valid llms.txt files?

No. We audited Ahrefs, Profound, and Semrush in May 2026. All three return 404 on /llms.txt. They sell AI search visibility products while failing the most basic AI crawler hygiene signal. Read the full breakdown at /guides/ahrefs-profound-semrush-llms-txt-audit.

Does Google or OpenAI actually read llms.txt?

Google has publicly declined to commit to the standard. OpenAI's GPTBot, Anthropic's ClaudeBot, and Perplexity's PerplexityBot all respect llms.txt where present. Until Google moves, llms.txt is an opt-in signal for the AI-native crawlers, not a universal directive. It still improves how often you surface on ChatGPT, Claude, and Perplexity, three of the five major AI surfaces we track.

What does a passing llms.txt look like?

Start with an H1 brand line, optional blockquote summary, then H2 sections (Docs, Product, Pricing, Blog) with bulleted links to your highest-value canonical URLs. Skip junk pages, tag archives, and parameterised URLs. Keep it under 200 lines. Audit at /tools/llms-txt-validator after every major site change.
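
To make that shape concrete, here is a small generator that could run as a deploy step. It is a sketch only: render_llms_txt, the brand "Example Co", and the example.com URLs are placeholders, not a real site.

```python
def render_llms_txt(
    brand: str,
    summary: str,
    sections: dict[str, list[tuple[str, str]]],
) -> str:
    """Render an H1 brand line, optional blockquote, then H2 link lists."""
    lines = [f"# {brand}", ""]
    if summary:
        lines += [f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines += [f"- [{label}]({url})" for label, url in links]
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


# Placeholder content for illustration only.
example = render_llms_txt(
    "Example Co",
    "Example Co makes invoicing software for small teams.",
    {
        "Docs": [("Getting started", "https://example.com/docs/getting-started")],
        "Pricing": [("Plans", "https://example.com/pricing")],
    },
)
print(example)
```

Generating the file from one source of truth (a route table or CMS export, say) is one way to keep it from going stale between deploys.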

Audit your llms.txt free

No signup. Paste your domain, get a per-line report in under 10 seconds. Re-run after every deploy.