PerplexityBot
PerplexityBot is Perplexity AI's web crawler. A single crawl event feeds both Perplexity's live retrieval index and its training data, so one visit yields immediate citation visibility as well as a compounding long-term effect on ranking.
Definition
PerplexityBot is Perplexity AI's web crawler, identifying itself as Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) with published IP ranges at perplexity.ai/perplexitybot. Unlike GPTBot (training-only) or Bingbot (indexing-only), PerplexityBot's single crawl event feeds two distinct downstream uses.
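As a rough illustration of how the user-agent string above could be recognized in application code, here is a minimal Python sketch. The function name `is_perplexitybot` and the loose version match are assumptions made for this example, not something Perplexity documents. A user-agent header can be spoofed, so a stricter check would also compare the client address against the published IP ranges.

```python
import re

# Token taken from the user-agent string quoted above. The version number is
# matched loosely in case it changes (an assumption, not documented behavior).
PERPLEXITYBOT_TOKEN = re.compile(r"PerplexityBot/\d+(?:\.\d+)*", re.IGNORECASE)


def is_perplexitybot(user_agent: str) -> bool:
    """Return True if the header value contains the PerplexityBot token."""
    return bool(PERPLEXITYBOT_TOKEN.search(user_agent or ""))
```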
Why it matters
PerplexityBot is the highest-leverage AI crawler for a single allow rule. One crawl produces:
- Live retrieval index entry — the crawled page becomes citeable in real-time Perplexity answers
- Training data signal — the crawled content shapes future Perplexity model fine-tuning
Combined, this dual-horizon property makes PerplexityBot access more valuable per crawl than any other bot in the AI ecosystem.
Crawl behavior
- Respects robots.txt
- Honors Crawl-delay directive
- 429 + Retry-After backoff
- Refresh cadence: weekly to monthly for established sites; roughly monthly at first for newer sites
- Moderate parallelism (not aggressive enough to consume meaningful bandwidth on most sites)
- Limited JavaScript render budget — JS-only critical content frequently fails to enter the index
- No image OCR — image alt text captured, image content not
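Since the crawler is described as respecting robots.txt and Crawl-delay, one way to sanity-check a robots.txt file before deploying it is Python's standard `urllib.robotparser`. The sketch below is only a local check of how a standards-following parser reads the rules. The sample rules, the `example.com` URLs, and the helper name `check_robots` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for this illustration.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /private/
"""


def check_robots(robots_txt: str, url: str, agent: str = "PerplexityBot"):
    """Return (allowed, crawl_delay) for the given agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


allowed, delay = check_robots(ROBOTS_TXT, "https://example.com/blog/post")
print(allowed, delay)
```

This only confirms what the file says. It does not show whether a CDN or WAF in front of the origin lets the request through.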
What PerplexityBot stores
- Rendered HTML content (post server-side rendering)
- JSON-LD structured data
- Internal + outbound link graph
- Last-modified timestamps (from sitemap.xml lastmod and JSON-LD dateModified)
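Because last-modified timestamps are read from sitemap.xml lastmod values, it can help to generate the sitemap from the same data that records when each page changed. The following is a minimal sketch using the standard library. The helper name `build_sitemap`, the URLs, and the dates are placeholders for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap string from (url, lastmod) pairs.

    lastmod is expected as a W3C date string such as "2024-05-01".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages for illustration only.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-05-01"),
    ("https://example.com/blog/post", "2024-05-14"),
])
```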
What it skips
- JavaScript-only critical content (limited render budget)
- Image-locked claims (no OCR)
- Pages blocked in robots.txt
- Soft-404s and pages with 5xx errors
- Pages requiring authentication
Common indexing failures
The most common cause of weak Perplexity citation is upstream blocking, not content quality. Cloudflare Bot Fight Mode and aggressive WAF rules return CAPTCHA challenges to PerplexityBot. The bot cannot solve them, so no crawl traffic reaches the origin logs even though robots.txt allows the bot. Check both layers: robots.txt and the CDN or WAF in front of the origin.
See /guides/how-perplexity-indexes-websites for the technical deep dive.
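To check the origin side of this, one option is to count PerplexityBot requests in the access log by status code. The sketch below assumes the server writes the common "combined" log format. The regular expression and the function name `perplexitybot_status_counts` are illustrative and would need adjusting for other formats.

```python
import re
from collections import Counter

# Assumes combined log format:
#   ... "request line" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def perplexitybot_status_counts(lines):
    """Count PerplexityBot requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "perplexitybot" in match.group("ua").lower():
            counts[match.group("status")] += 1
    return counts
```

An empty result for a period in which robots.txt allowed the bot is consistent with blocking at the edge, although it could also mean the site has not been crawled yet.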
Frequently asked
How often does PerplexityBot crawl an established site?
Weekly to monthly. Newer sites, or sites that publish sporadically, may initially be crawled about once a month and move to more frequent recrawls after consistent content updates. Submitting an updated sitemap.xml with current lastmod values accelerates recrawl.
Why do I see zero PerplexityBot hits in my logs even though robots.txt allows it?
The most common cause is upstream blocking. Cloudflare's Bot Fight Mode, Super Bot Fight Mode, or WAF rules can block AI bots before they reach your origin. AI bots fail the JavaScript challenge or CAPTCHA, so the crawl never lands. Check your CDN's bot management settings, because blocking can happen at the edge independently of robots.txt.
Does PerplexityBot honor noindex meta tags?
Yes. Standard meta robots directives (noindex, nofollow) are respected. Pages with noindex will not appear in Perplexity citations and will fall out of the index over weeks if previously crawled.
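To audit which pages carry a noindex directive, a small standard-library parser is enough. This is a sketch under the assumption that the directive is delivered through a `<meta name="robots">` tag. It does not look at the `X-Robots-Tag` HTTP header, and the names `RobotsMetaParser` and `has_noindex` are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        self.directives.update(
            token.strip().lower() for token in content.split(",") if token.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```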