Citare Tools · Free

Robots.txt Generator.

Build a paste-ready robots.txt with per-bot allow/block decisions for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and 10+ other AI crawlers. Three opinionated presets if you don’t want to think about it.

Free. No signup. Pure client-side. Live preview, copy or download.

Quick start

AI bots

Per-bot allow / block / skip. Skip = no rule for that bot (default behavior is allow).

OpenAI

GPTBot

Crawls public content for GPT model training. Blocking opts you out of training only.

OAI-SearchBot

Indexes content for ChatGPT's live web search. Block this and you disappear from ChatGPT citations.

ChatGPT-User

Fetches a URL when a ChatGPT user clicks a link or a custom GPT browses the web.

Anthropic

ClaudeBot

Anthropic's training crawler for Claude.

Claude-User

Fetches a URL on behalf of a Claude user during a conversation.

Google AI

Google-Extended

Controls Gemini and Vertex AI training opt-in. Live Gemini grounding uses Googlebot rules, not this UA.

GoogleOther

Generic Google AI fetcher used for product research and feature development.

Perplexity

PerplexityBot

Perplexity's index crawler. Blocking it removes your site from Perplexity entirely.

Perplexity-User

Fetches a URL on behalf of a Perplexity user during a query.

Apple

Applebot-Extended

Apple Intelligence training opt-out. Distinct from Applebot (search indexing).

Meta

meta-externalagent

Meta's AI training crawler (Llama models).

Common Crawl

CCBot

Crawler for the Common Crawl corpus, a dataset many LLMs are trained on.

ByteDance / TikTok

Bytespider

ByteDance's AI training crawler (Doubao, TikTok AI features).

Diffbot

Diffbot

Diffbot's structured-data extraction crawler.

Standard search engines

Most sites should allow these. Bingbot also feeds ChatGPT grounding; Googlebot governs live Gemini access.

Google

Googlebot

Google Search. Also governs live Gemini grounding (Gemini fetches via Googlebot rules + JS-rendered HTML).

Microsoft

Bingbot

Bing Search. Also feeds ChatGPT and Copilot live grounding.

DuckDuckGo

DuckDuckBot

DuckDuckGo's search crawler.

Yandex

YandexBot

Yandex's search crawler.

All other crawlers (User-agent: *)

Catch-all rule for every bot not listed explicitly above.

Sitemap URL (optional)

Declared at the bottom of robots.txt. AI crawlers and search engines use this for discovery.

Preview
# Generated by Citare — citare.ai/tools/robots-txt-generator

Training vs grounding — the distinction that decides this

Each AI provider runs at least two crawlers: one for training (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot, etc.) and one for live grounding (OAI-SearchBot, ChatGPT-User, Claude-User, Perplexity-User; Google's live Gemini grounding goes through Googlebot). Blocking training crawlers opts you out of model training; blocking grounding crawlers makes your site invisible in AI-search citations.

The most common mistake is blocking both with a wildcard rule and accidentally disappearing from ChatGPT, Claude, and Perplexity grounded answers. The “Block training, allow grounding” preset above gets the split right by default.
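Under that split, the generated file looks roughly like this (an abridged sketch, not the tool's exact output; the full file lists every bot you configured, and example.com stands in for your domain):

```txt
# Training crawlers: opted out
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Grounding crawlers: allowed, so AI answers can still cite you
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Each User-agent line opens its own rule group, so a bot is matched against its named group first and only falls back to the wildcard (if any) when no named group applies.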

Frequently asked

How do I create a robots.txt file for AI crawlers?

Use the form above. Citare's Robots.txt Generator lets you set an explicit allow / block / skip decision for every major AI crawler — GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot-Extended, meta-externalagent, CCBot, Bytespider, Diffbot — plus standard search-engine bots (Googlebot, Bingbot, DuckDuckBot, YandexBot). The tool renders a paste-ready robots.txt as you click. Save it as /robots.txt at the root of your domain (e.g. https://example.com/robots.txt) — that's the canonical location every crawler checks.

What's the right robots.txt setup for AI search visibility?

For most sites the optimal default is the "Block AI training, allow grounding" preset: block GPTBot, Google-Extended, Applebot-Extended, meta-externalagent, CCBot, Bytespider (training-only crawlers); allow OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, GoogleOther (grounding crawlers); allow Googlebot and Bingbot (search index — also feeds live Gemini and ChatGPT grounding respectively). This setup keeps you cited in AI answers without contributing to model training corpora. Override per-bot if you have specific copyright, paywall, or compliance constraints.

Should I block GPTBot and ClaudeBot, or allow them?

It depends on your training-vs-grounding stance. Blocking GPTBot and ClaudeBot stops OpenAI and Anthropic from training their models on your content but does not stop ChatGPT or Claude from citing your site in grounded answers — those use OAI-SearchBot/ChatGPT-User and Claude-User respectively. Block training crawlers if you want to opt out of LLM training (paywalled content, copyrighted material, internal docs); allow them if you're comfortable contributing to training. The grounding crawlers should almost always be allowed — they directly drive citation visibility in ChatGPT, Claude, and Perplexity answers.

What does the wildcard User-agent: * rule do?

User-agent: * applies to every crawler that doesn't have its own explicit rule group. If you set the wildcard to Allow, any unlisted bot is permitted (including new AI crawlers that launch after you publish your robots.txt). If you set it to Disallow, every unlisted bot is blocked — useful for sites that want a strict allowlist where only the bots you explicitly Allow can crawl. Citare's generator defaults to Skip (no wildcard rule), which means the spec-default behavior applies: any bot without a rule is allowed.
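A strict-allowlist configuration built on that wildcard behavior might look like this (a sketch; the two named bots are illustrative choices, not a recommendation):

```txt
# Only Googlebot and Bingbot may crawl; every other bot hits the wildcard
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
```

Because named groups take precedence over the wildcard, the two search bots keep full access while everything unlisted, including future AI crawlers, is blocked.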

Where do I put the robots.txt file once it's generated?

Save the file as robots.txt at the root of your primary domain — the URL must be exactly https://yourdomain.com/robots.txt. Subpath locations (e.g. /content/robots.txt) are not picked up by crawlers. If you have multiple subdomains (blog.example.com, app.example.com), each one needs its own robots.txt at its root. After publishing, validate with the Citare AI Robots.txt Checker (paste your URL and confirm the per-bot allow/block status matches what you intended) before relying on the rules for AI visibility decisions.
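As a quick local sanity check alongside a hosted validator, Python's standard-library urllib.robotparser can evaluate per-bot access against a robots.txt body. The rules below are a made-up two-group example, not actual generator output:

```python
from urllib.robotparser import RobotFileParser

# Illustrative "block training, allow grounding" rules.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot (training) is blocked; OAI-SearchBot (grounding) is allowed.
print(parser.can_fetch("GPTBot", "https://example.com/post"))
print(parser.can_fetch("OAI-SearchBot", "https://example.com/post"))

# A bot with no rule group and no wildcard falls back to the spec
# default: allowed.
print(parser.can_fetch("SomeNewBot", "https://example.com/post"))
```

Note that robotparser matches user agents case-insensitively and by substring, which mirrors how most real crawlers interpret the User-agent lines.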

More free GEO tools

robots.txt access is one of four diagnostic axes Citare Studio measures monthly. See Studio →