Citare Tools · Free
Robots.txt Generator.
Build a paste-ready robots.txt with per-bot allow/block decisions for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and 10+ other AI crawlers. Three opinionated presets if you don’t want to think about it.
Free. No signup. Pure client-side. Live preview, copy or download.
AI bots
Per-bot allow / block / skip. Skip = no rule for that bot (default behavior is allow).
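Each decision maps onto the generated file roughly like this (a sketch of the rendering, with GPTBot used only as an illustrative UA):

# Block: a Disallow group for that UA
User-agent: GPTBot
Disallow: /

# Allow: an explicit Allow group for that UA
User-agent: GPTBot
Allow: /

# Skip: no group at all, so the wildcard rule (or the spec default, allow) applies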
OpenAI
GPTBot: Crawls public content for GPT model training. Blocking opts you out of training only.
OAI-SearchBot: Indexes content for ChatGPT's live web search. Block this and you disappear from ChatGPT citations.
ChatGPT-User: Fetches a URL when a ChatGPT user clicks a link or a custom GPT browses the web.
Anthropic
ClaudeBot: Anthropic's training crawler for Claude.
Claude-User: Fetches a URL on behalf of a Claude user during a conversation.
Google AI
Google-Extended: Controls whether your content is used to train Gemini and Vertex AI. Live Gemini grounding uses Googlebot rules, not this UA.
GoogleOther: Generic Google AI fetcher used for product research and feature development.
Perplexity
PerplexityBot: Perplexity's index crawler. Blocking it removes your site from Perplexity entirely.
Perplexity-User: Fetches a URL on behalf of a Perplexity user during a query.
Apple
Applebot-Extended: Apple Intelligence training opt-out. Distinct from Applebot (search indexing).
Meta
meta-externalagent: Meta's AI training crawler (Llama models).
Common Crawl
CCBot: Crawler for the Common Crawl corpus; most LLMs are trained on this dataset.
ByteDance / TikTok
Bytespider: ByteDance's AI training crawler (Doubao, TikTok AI features).
Diffbot
Diffbot: Diffbot's structured-data extraction crawler.
Standard search engines
Most sites should allow these. Bingbot also feeds ChatGPT grounding; Googlebot governs live Gemini access.
Googlebot: Google Search. Also governs live Gemini grounding (Gemini fetches via Googlebot rules + JS-rendered HTML).
Microsoft
Bingbot: Bing Search. Also feeds ChatGPT and Copilot live grounding.
DuckDuckGo
DuckDuckBot: DuckDuckGo's search crawler.
Yandex
YandexBot: Yandex's search crawler.
All other crawlers (User-agent: *)
Catch-all rule for every bot not listed explicitly above.
Sitemap URL (optional)
Declared at the bottom of robots.txt. AI crawlers and search engines use this for discovery.
# Generated by Citare — citare.ai/tools/robots-txt-generator
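That comment is the first line of the live preview. For illustration, a condensed file generated with the “Block AI training, allow grounding” preset plus a sitemap might read as follows (example.com is a placeholder, and some repeated groups are elided into comments):

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Applebot-Extended, meta-externalagent, CCBot, and Bytespider get the same Disallow group

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# ChatGPT-User, ClaudeBot, Claude-User, Perplexity-User, and GoogleOther get the same Allow group

Sitemap: https://example.com/sitemap.xml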
Training vs grounding — the distinction that decides this
Each AI provider has at least two crawlers — one for training (GPTBot, Google-Extended, Applebot-Extended, CCBot, etc.) and one for live grounding (OAI-SearchBot, ChatGPT-User, Claude-User, Perplexity-User, GoogleOther). Blocking training crawlers opts you out of model training; blocking grounding crawlers makes your site invisible in AI-search citations.
The most common mistake is blocking both with a wildcard rule and accidentally disappearing from ChatGPT, Claude, and Perplexity grounded answers. The “Block training, allow grounding” preset above gets the split right by default.
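For reference, the failure mode in robots.txt terms is just this:

User-agent: *
Disallow: /

With no explicit Allow groups alongside it, the wildcard applies to every bot, so OAI-SearchBot, Claude-User, and PerplexityBot are blocked just as thoroughly as the training crawlers.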
Frequently asked
How do I create a robots.txt file for AI crawlers?
Use the form above. Citare's Robots.txt Generator lets you set an explicit allow / block / skip decision for every major AI crawler — GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot-Extended, meta-externalagent, CCBot, Bytespider, Diffbot — plus standard search-engine bots (Googlebot, Bingbot, DuckDuckBot, YandexBot). The tool renders a paste-ready robots.txt as you click. Save it as /robots.txt at the root of your domain (e.g. https://example.com/robots.txt) — that's the canonical location every crawler checks.
What's the right robots.txt setup for AI search visibility?
For most sites the optimal default is the "Block AI training, allow grounding" preset: block GPTBot, Google-Extended, Applebot-Extended, meta-externalagent, CCBot, Bytespider (training-only crawlers); allow OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, GoogleOther (grounding crawlers); allow Googlebot and Bingbot (search index — also feeds live Gemini and ChatGPT grounding respectively). This setup keeps you cited in AI answers without contributing to model training corpora. Override per-bot if you have specific copyright, paywall, or compliance constraints.
Should I block GPTBot and ClaudeBot, or allow them?
It depends on your training-vs-grounding stance. Blocking GPTBot and ClaudeBot stops OpenAI and Anthropic from training their models on your content but does not stop ChatGPT or Claude from citing your site in grounded answers — those use OAI-SearchBot/ChatGPT-User and Claude-User respectively. Block training crawlers if you want to opt out of LLM training (paywalled content, copyrighted material, internal docs); allow them if you're comfortable contributing to training. The grounding crawlers should almost always be allowed — they directly drive citation visibility in ChatGPT, Claude, and Perplexity answers.
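In robots.txt form, the OpenAI split described above might look like this (Anthropic's UAs follow the same pattern):

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /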
What does the wildcard User-agent: * rule do?
User-agent: * applies to every crawler that doesn't have its own explicit rule group. If you set the wildcard to Allow, any unlisted bot is permitted (including new AI crawlers that launch after you publish your robots.txt). If you set it to Disallow, every unlisted bot is blocked — useful for sites that want a strict allowlist where only the bots you explicitly Allow can crawl. Citare's generator defaults to Skip (no wildcard rule), which means the spec-default behavior applies: any bot without a rule is allowed.
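For example, a strict allowlist that admits only the major search crawlers and blocks every unlisted bot, including AI crawlers that launch later, might look like:

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /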
Where do I put the robots.txt file once it's generated?
Save the file as robots.txt at the root of your primary domain — the URL must be exactly https://yourdomain.com/robots.txt. Subpath locations (e.g. /content/robots.txt) are not picked up by crawlers. If you have multiple subdomains (blog.example.com, app.example.com), each one needs its own robots.txt at its root. After publishing, validate with the Citare AI Robots.txt Checker (paste your URL and confirm the per-bot allow/block status matches what you intended) before relying on the rules for AI visibility decisions.
More free GEO tools
robots.txt access is one of four diagnostic axes Citare Studio measures monthly. See Studio →