PerplexityBot

Definition

PerplexityBot is Perplexity AI's web crawler, identifying itself as Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) with published IP ranges at perplexity.ai/perplexitybot. Unlike GPTBot (training-only) or Bingbot (indexing-only), PerplexityBot's single crawl event feeds two distinct downstream uses.
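As a rough illustration, the user-agent token quoted above can be matched in request handling or log analysis. This is a minimal sketch: the function name is hypothetical, and because a user-agent string can be spoofed, a match should be treated as a hint to confirm against the published IP ranges.

```python
import re
from typing import Optional

# Matches the "PerplexityBot/<version>" token from the user-agent string above.
PERPLEXITYBOT_TOKEN = re.compile(r"PerplexityBot/(\d+(?:\.\d+)*)")


def perplexitybot_version(user_agent: str) -> Optional[str]:
    """Return the advertised PerplexityBot version, or None if the token is absent.

    Hypothetical helper: a match only means the client claims to be
    PerplexityBot; it does not verify the source IP.
    """
    match = PERPLEXITYBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
)
print(perplexitybot_version(ua))
```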

Why it matters

PerplexityBot is the highest-leverage AI crawler for a single allow rule. One crawl produces:

Live retrieval index entry — the crawled page becomes citeable in real-time Perplexity answers
Training data signal — the crawled content shapes future Perplexity model fine-tuning

Combined, this dual-horizon property makes PerplexityBot access more valuable per crawl than any other bot in the AI ecosystem.

Crawl behavior

Respects robots.txt
Honors Crawl-delay directive
Backs off when it receives a 429 response with a Retry-After header
Refresh cadence: weekly to monthly for established sites; roughly monthly at first for newer sites
Moderate parallelism (not aggressive enough to consume meaningful bandwidth on most sites)
Limited JavaScript render budget — JS-only critical content frequently fails to enter the index
No image OCR — image alt text captured, image content not
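Since the crawler respects robots.txt and Crawl-delay, it can help to confirm what a given robots.txt actually permits. The sketch below uses Python's standard urllib.robotparser. The robots.txt contents, the example.com URLs, and the 5-second delay are illustrative assumptions, not recommended values. Python's parser applies the first matching rule, so the Disallow line is placed before the broader Allow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow PerplexityBot everywhere except /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /private/
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /admin/
"""


def check_access(robots_txt: str, url: str, agent: str = "PerplexityBot"):
    """Return (allowed, crawl_delay) for the agent under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


print(check_access(ROBOTS_TXT, "https://example.com/blog/post"))
print(check_access(ROBOTS_TXT, "https://example.com/private/report"))
```

This only checks the robots.txt layer. A CDN or WAF can still block the crawler before the request reaches the origin.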

What PerplexityBot stores

Rendered HTML content (post server-side rendering)
JSON-LD structured data
Internal + outbound link graph
Last-modified timestamps (from sitemap.xml lastmod and JSON-LD dateModified)
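Because timestamps are read from both sitemap.xml lastmod and JSON-LD dateModified, it is reasonable to generate the two from one source of truth so they never disagree. The following is a minimal sketch: the function name, the Article type, and the example URL are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone


def modified_markup(headline: str, url: str, modified: datetime) -> dict:
    """Build a JSON-LD snippet and a sitemap lastmod value from one timestamp.

    Hypothetical helper: both outputs use the same ISO 8601 string.
    """
    stamp = modified.isoformat()
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "dateModified": stamp,
    }
    return {
        "json_ld": json.dumps(json_ld, indent=2),
        "sitemap_lastmod": f"<lastmod>{stamp}</lastmod>",
    }


markup = modified_markup(
    "PerplexityBot",
    "https://example.com/glossary/perplexitybot",
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(markup["json_ld"])
print(markup["sitemap_lastmod"])
```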

What it skips

JavaScript-only critical content (limited render budget)
Image-locked claims (no OCR)
Pages blocked in robots.txt
Soft-404s and pages with 5xx errors
Pages requiring authentication

Common indexing failures

The most common cause of weak Perplexity citation is not content quality but upstream blocking. Cloudflare Bot Fight Mode and aggressive WAF rules serve CAPTCHA challenges to PerplexityBot. The bot cannot solve them, so its requests never reach the origin, and origin logs show zero crawl traffic even though robots.txt allows the bot. Check both layers: robots.txt and the CDN or WAF in front of the origin.
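One way to check the origin side is to tally PerplexityBot requests in access logs by status code. The sketch below assumes the common "combined" log format (request line in the first quoted field, status code immediately after it). The function name and the sample lines are hypothetical, and other log formats would need different parsing.

```python
from collections import Counter


def perplexitybot_status_counts(log_lines):
    """Count HTTP status codes for log lines whose user agent mentions PerplexityBot.

    Assumes combined log format:
    ip - - [time] "METHOD /path HTTP/x" status bytes "referer" "user-agent"
    """
    counts = Counter()
    for line in log_lines:
        if "PerplexityBot" not in line:
            continue
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not in the assumed format
        fields = parts[2].split()  # text after the request line: status, bytes
        if fields:
            counts[fields[0]] += 1
    return counts


UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)")
sample = [
    f'203.0.113.7 - - [10/Jan/2025:12:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "{UA}"',
    f'203.0.113.7 - - [10/Jan/2025:12:00:05 +0000] "GET /private HTTP/1.1" 403 310 "-" "{UA}"',
    '198.51.100.9 - - [10/Jan/2025:12:00:09 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(perplexitybot_status_counts(sample))
```

An empty result over a long window, combined with a permissive robots.txt, points toward blocking at the edge rather than at the origin.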

See /guides/how-perplexity-indexes-websites for the technical deep dive.

Frequently asked

How often does PerplexityBot crawl an established site?

Weekly to monthly. Newer sites, or sites that publish sporadically, may start at roughly monthly crawls and move to more frequent recrawls once content is updated consistently. Submitting an updated sitemap.xml with current lastmod values accelerates recrawl.

Why do I see zero PerplexityBot hits in my logs even though robots.txt allows it?

Most commonly, upstream blocking. Cloudflare's Bot Fight Mode, Super Bot Fight Mode, or WAF rules can block AI bots before they reach your origin. The bot fails the JavaScript challenge or CAPTCHA, so the crawl never lands. Check your CDN's bot management settings: blocking can happen at the edge, independent of robots.txt.

Does PerplexityBot honor noindex meta tags?

Yes. Standard meta robots directives (noindex, nofollow) are respected. Pages with noindex will not appear in Perplexity citations and, if previously crawled, will drop out of the index over the following weeks.
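To audit which pages carry a noindex directive, the meta robots tag can be read with Python's standard html.parser. This is a minimal sketch with hypothetical class and function names. It inspects only the meta tag in the HTML, not directives delivered any other way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the page's meta robots tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```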