Citare Tools · Free
Robots.txt Generator.
Build a paste-ready robots.txt with per-bot allow/block decisions for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and 10+ other AI crawlers. Three opinionated presets if you don’t want to think about it.
Free. No signup. Pure client-side. Live preview, copy or download.
AI bots
Per-bot allow / block / skip. Skip = no rule for that bot (default behavior is allow).
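Each decision maps onto the generated file roughly like this (a sketch of the rendering, with GPTBot used only as an illustrative UA):

# Block: a Disallow group for that UA
User-agent: GPTBot
Disallow: /

# Allow: an explicit Allow group for that UA
User-agent: GPTBot
Allow: /

# Skip: no group at all, so the wildcard rule (or the spec default, allow) applies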
OpenAI
GPTBot: Crawls public content for GPT model training. Blocking opts you out of training only.
OAI-SearchBot: Indexes content for ChatGPT's live web search. Block this and you disappear from ChatGPT citations.
ChatGPT-User: Fetches a URL when a ChatGPT user clicks a link or a custom GPT browses the web.
Anthropic
ClaudeBot: Anthropic's training crawler for Claude.
Claude-User: Fetches a URL on behalf of a Claude user during a conversation.
Google AI
Google-Extended: Controls whether your content is used to train Gemini and Vertex AI. Live Gemini grounding uses Googlebot rules, not this UA.
GoogleOther: Generic Google AI fetcher used for product research and feature development.
Perplexity
PerplexityBot: Perplexity's index crawler. Blocking it removes your site from Perplexity entirely.
Perplexity-User: Fetches a URL on behalf of a Perplexity user during a query.
Apple
Applebot-Extended: Apple Intelligence training opt-out. Distinct from Applebot (search indexing).
Meta
meta-externalagent: Meta's AI training crawler (Llama models).
Common Crawl
CCBot: Crawler for the Common Crawl corpus; most LLMs are trained on this dataset.
ByteDance / TikTok
Bytespider: ByteDance's AI training crawler (Doubao, TikTok AI features).
Diffbot
Diffbot: Diffbot's structured-data extraction crawler.
Standard search engines
Most sites should allow these. Bingbot also feeds ChatGPT grounding; Googlebot governs live Gemini access.
Googlebot: Google Search. Also governs live Gemini grounding (Gemini fetches via Googlebot rules + JS-rendered HTML).
Microsoft
Bingbot: Bing Search. Also feeds ChatGPT and Copilot live grounding.
DuckDuckGo
DuckDuckBot: DuckDuckGo's search crawler.
Yandex
YandexBot: Yandex's search crawler.
All other crawlers (User-agent: *)
Catch-all rule for every bot not listed explicitly above.
Sitemap URL (optional)
Declared at the bottom of robots.txt. AI crawlers and search engines use this for discovery.
# Generated by Citare — citare.ai/tools/robots-txt-generator
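That comment is the first line of the live preview. For illustration, a condensed file generated with the “Block AI training, allow grounding” preset plus a sitemap might read as follows (example.com is a placeholder, and some repeated groups are elided into comments):

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Applebot-Extended, meta-externalagent, CCBot, and Bytespider get the same Disallow group

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# ChatGPT-User, ClaudeBot, Claude-User, Perplexity-User, and GoogleOther get the same Allow group

Sitemap: https://example.com/sitemap.xml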
Training vs grounding — the distinction that decides this
Each AI provider has at least two crawlers — one for training (GPTBot, Google-Extended, Applebot-Extended, CCBot, etc.) and one for live grounding (OAI-SearchBot, ChatGPT-User, Claude-User, Perplexity-User, GoogleOther). Blocking training crawlers opts you out of model training; blocking grounding crawlers makes your site invisible in AI-search citations.
The most common mistake is blocking both with a wildcard rule and accidentally disappearing from ChatGPT, Claude, and Perplexity grounded answers. The “Block training, allow grounding” preset above gets the split right by default.
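For reference, the failure mode in robots.txt terms is just this:

User-agent: *
Disallow: /

With no explicit Allow groups alongside it, the wildcard applies to every bot, so OAI-SearchBot, Claude-User, and PerplexityBot are blocked just as thoroughly as the training crawlers.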
Frequently asked
How do I create a robots.txt file for AI crawlers?
Use the form above. Citare's Robots.txt Generator lets you set an explicit allow / block / skip decision for every major AI crawler — GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot-Extended, meta-externalagent, CCBot, Bytespider, Diffbot — plus standard search-engine bots (Googlebot, Bingbot, DuckDuckBot, YandexBot). The tool renders a paste-ready robots.txt as you click. Save it as /robots.txt at the root of your domain (e.g. https://example.com/robots.txt) — that's the canonical location every crawler checks.
What's the right robots.txt setup for AI search visibility?
For most sites the optimal default is the "Block AI training, allow grounding" preset: block GPTBot, Google-Extended, Applebot-Extended, meta-externalagent, CCBot, Bytespider (training-only crawlers); allow OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, GoogleOther (grounding crawlers); allow Googlebot and Bingbot (search index — also feeds live Gemini and ChatGPT grounding respectively). This setup keeps you cited in AI answers without contributing to model training corpora. Override per-bot if you have specific copyright, paywall, or compliance constraints.
Should I block GPTBot and ClaudeBot, or allow them?
It depends on your training-vs-grounding stance. Blocking GPTBot and ClaudeBot stops OpenAI and Anthropic from training their models on your content but does not stop ChatGPT or Claude from citing your site in grounded answers — those use OAI-SearchBot/ChatGPT-User and Claude-User respectively. Block training crawlers if you want to opt out of LLM training (paywalled content, copyrighted material, internal docs); allow them if you're comfortable contributing to training. The grounding crawlers should almost always be allowed — they directly drive citation visibility in ChatGPT, Claude, and Perplexity answers.
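In robots.txt form, the OpenAI split described above might look like this (Anthropic's UAs follow the same pattern):

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /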
What does the wildcard User-agent: * rule do?
User-agent: * applies to every crawler that doesn't have its own explicit rule group. If you set the wildcard to Allow, any unlisted bot is permitted (including new AI crawlers that launch after you publish your robots.txt). If you set it to Disallow, every unlisted bot is blocked — useful for sites that want a strict allowlist where only the bots you explicitly Allow can crawl. Citare's generator defaults to Skip (no wildcard rule), which means the spec-default behavior applies: any bot without a rule is allowed.
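For example, a strict allowlist that admits only the major search crawlers and blocks every unlisted bot, including AI crawlers that launch later, might look like:

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /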
Where do I put the robots.txt file once it's generated?
Save the file as robots.txt at the root of your primary domain — the URL must be exactly https://yourdomain.com/robots.txt. Subpath locations (e.g. /content/robots.txt) are not picked up by crawlers. If you have multiple subdomains (blog.example.com, app.example.com), each one needs its own robots.txt at its root. After publishing, validate with the Citare AI Robots.txt Checker (paste your URL and confirm the per-bot allow/block status matches what you intended) before relying on the rules for AI visibility decisions.
More free GEO tools
robots.txt access is one of four diagnostic axes Citare Studio measures monthly. See Studio →