ClaudeBot
ClaudeBot is Anthropic's web crawler that collects public content for Claude model training, identifiable by the User-Agent string 'ClaudeBot/1.0' and operating from Google Cloud Platform infrastructure with no published static IP range.
Definition
ClaudeBot is Anthropic's primary web crawler, used to collect public web content for training and improving Claude models. It identifies itself with the User-Agent string Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com). Unlike GPTBot or PerplexityBot, ClaudeBot does not publish a static IP range — Anthropic runs crawl infrastructure on Google Cloud Platform, and verification has to fall back to User-Agent + reverse DNS rather than IP allowlists.
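For log filtering or routing, matching on the ClaudeBot/1.0 product token is usually more robust than comparing the full string. A minimal Python sketch, assuming the token format shown above stays stable; `claudebot_version` is a hypothetical helper name, and a match only tells you what the request claims to be, since User-Agent strings can be spoofed:

```python
import re
from typing import Optional

# Assumption: the "ClaudeBot/<version>" product token is the stable part of the
# User-Agent; the surrounding Mozilla/AppleWebKit text may change over time.
_CLAUDEBOT_TOKEN = re.compile(r"\bClaudeBot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def claudebot_version(user_agent: str) -> Optional[str]:
    """Return the claimed ClaudeBot version, or None if the token is absent."""
    match = _CLAUDEBOT_TOKEN.search(user_agent or "")
    return match.group(1) if match else None
```

Returning the version instead of a bare boolean makes it easier to notice if the token ever moves past 1.0.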
Why it matters
ClaudeBot access shapes whether your content enters Claude's training corpus, which influences baseline brand knowledge across every Claude version. ClaudeBot is distinct from the Brave search index that Claude uses for real-time grounding when the user enables web search — blocking ClaudeBot removes your content from training but does not directly affect live citation visibility. Both layers matter for the four-index reality because Claude's two surfaces (parametric knowledge + Brave grounding) have different content acquisition paths.
Crawl behavior
- Respects robots.txt
- Honors Crawl-delay directive
- 429 + Retry-After backoff
- Refresh cadence: monthly typical, slower for low-update sites
- No JavaScript execution — server-side rendering required for content visibility
- No image OCR — alt text captured, image content not
- No login-walled or paywalled content
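If you want to slow ClaudeBot down rather than block it, the 429 + Retry-After behavior listed above can be driven from the origin. The sketch below is one way to do that with a sliding-window limiter. The class name `CrawlThrottle`, the limits, and keying on a client string are all assumptions for illustration, not anything Anthropic specifies:

```python
import math
import time
from collections import defaultdict, deque


class CrawlThrottle:
    """Allow at most `limit` requests per `window` seconds per client key."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)

    def check(self, client_key):
        """Return (status, headers): 200 when allowed, 429 plus Retry-After when not."""
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return 200, {}
        # Seconds until the oldest request leaves the window, rounded up.
        retry_after = max(1, math.ceil(self.window - (now - hits[0])))
        return 429, {"Retry-After": str(retry_after)}
```

In a real deployment the same decision would normally live in the web server or CDN; the point here is only the shape of the response a well-behaved crawler backs off from.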
Verification challenge
The no-published-IPs reality means ClaudeBot traffic appears in origin logs from generic GCP IP space. Standard verification:
grep -i "claudebot" /var/log/nginx/access.log | awk '{print $1}' | sort -u | head
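Reverse-DNS the IPs: ClaudeBot requests from GCP should resolve to hostnames under googleusercontent.com (googlebot.com belongs to Google's own crawlers and is not expected here). A matching record shows only that the request came from GCP, not who sent it. Requests claiming the ClaudeBot User-Agent from non-GCP IPs are unverified and usually scrapers impersonating the bot.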
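The reverse-DNS step can be scripted. Below is a rough Python sketch of a forward-confirmed reverse DNS check using the standard socket module. The helper name `looks_like_gcp` and the suffix list are assumptions based on the description in this section, and a pass means only "hosted on GCP", not "operated by Anthropic":

```python
import socket

# Assumption from this section: GCP-hosted crawler IPs reverse-resolve to
# hostnames under googleusercontent.com.
GCP_SUFFIXES = (".googleusercontent.com",)


def looks_like_gcp(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex, suffixes=GCP_SUFFIXES):
    """Forward-confirmed reverse DNS for an IPv4 address.

    The PTR hostname must end in an expected suffix, and resolving that
    hostname forward must return the original IP.
    """
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.endswith(suffixes):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

The resolver functions are parameters so the check can be exercised offline; calling `looks_like_gcp(ip)` with the defaults performs live DNS lookups against each IP pulled from the access log.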
How to allow it
In robots.txt:
User-Agent: ClaudeBot
Allow: /
In Cloudflare: Security → Bots → AI Crawl Control → "Do not block (allow crawlers)". Cloudflare's default AI Bot Block list includes ClaudeBot, so a CF default deployment blocks it unless explicitly allowed.
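The robots.txt side can be checked locally before deploying. This sketch uses Python's standard urllib.robotparser to ask whether a rule set lets the ClaudeBot token fetch a URL. `is_allowed`, the sample rules, and example.com are hypothetical, and Python's parser may differ in edge cases from however ClaudeBot itself interprets robots.txt:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="ClaudeBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule set: allow ClaudeBot everywhere, block every other bot.
SAMPLE_RULES = """\
User-Agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""
```

With these rules, `is_allowed(SAMPLE_RULES, "https://example.com/pricing")` is true for ClaudeBot and false for any agent that falls through to the wildcard group. This says nothing about Cloudflare, which blocks before robots.txt is ever consulted.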
Common confusions
- ClaudeBot ≠ Claude-Web. Claude-Web (User-Agent: Claude-Web/1.0) is the bot that fetches pages when a Claude user pastes a URL into chat. User-initiated, not a crawler.
- ClaudeBot ≠ Brave's crawler. Brave runs its own crawler that builds the Brave Search index Claude grounds on. Allowing ClaudeBot does not put you in Brave's index; that's a separate Bravebot allow.
- Blocking ClaudeBot does not block real-time Claude citations. Those come through Brave. To improve Claude grounding visibility, focus on Brave Search index inclusion.
See /ai-bot-crawlers for the full bot reference table.
Frequently asked
Why doesn't Anthropic publish a static IP range for ClaudeBot?
Anthropic runs its crawler on Google Cloud Platform infrastructure, where outbound IPs are dynamic across the GCP allocation pool. Anthropic's published verification guidance falls back to User-Agent + reverse DNS to GCP/googleusercontent.com. This is correct behavior — IP allowlists are not feasible for GCP-hosted crawlers.
Does blocking ClaudeBot remove my brand from Claude's responses?
Only partially. Blocking ClaudeBot removes your content from future Claude training corpora, which over time reduces the parametric knowledge Claude has about your brand. It does not affect real-time citations when a Claude user enables web search — those come through Brave's separate search index.
How do I tell if ClaudeBot is being blocked by Cloudflare?
Check Security → Bots → AI Crawl Control in the Cloudflare dashboard. Cloudflare's default AI Bot Block list includes ClaudeBot, so a default deployment silently blocks it even if robots.txt allows. Grep your origin logs for ClaudeBot User-Agent over 30 days — zero hits with allowed robots.txt indicates upstream blocking.
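The 30-day log check can be scripted roughly as follows. This is a sketch that assumes nginx's "combined" log format (timestamp in square brackets, User-Agent as the last quoted field); `claudebot_hits` is a hypothetical helper, and other log formats would need a different pattern:

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: nginx "combined" format, e.g.
# 1.2.3.4 - - [20/Feb/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "<user agent>"
_LINE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def claudebot_hits(lines, now=None, days=30):
    """Count log lines with a ClaudeBot User-Agent within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    hits = 0
    for line in lines:
        match = _LINE.search(line)
        if not match or "claudebot" not in match.group("ua").lower():
            continue
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if stamp >= cutoff:
            hits += 1
    return hits
```

Feeding it an open log file (`claudebot_hits(open(path))`) returns the count. A result of zero while robots.txt allows ClaudeBot is the signal described above that something upstream is blocking it.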
Does ClaudeBot render JavaScript?
No. ClaudeBot fetches server-side HTML only and does not execute client-side scripts. Content rendered only by client-side JavaScript after page load is invisible to ClaudeBot. Server-side rendering or pre-rendering of priority content is required.
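One quick way to approximate what a non-rendering crawler sees is to check whether a priority phrase appears in the raw HTML outside script and style blocks. The sketch below does that with Python's standard html.parser. `in_server_html` and the sample markup are hypothetical, and this is only a rough proxy for ClaudeBot's actual extraction:

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def in_server_html(html, phrase):
    """Return True if `phrase` appears in the text of the unrendered HTML."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()
```

Run it against the response body from a plain HTTP fetch, not the DOM from a browser. If the phrase is only present after client-side rendering, the check returns false.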