PerplexityBot

PerplexityBot is Perplexity AI's web crawler. A single crawl event feeds both Perplexity's live retrieval index and its training data, which makes it a dual-horizon bot: one visit produces immediate citation visibility and a longer-term, compounding ranking benefit.

Definition

PerplexityBot is Perplexity AI's web crawler. It identifies itself with the user-agent string Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) and crawls from the IP ranges published at perplexity.ai/perplexitybot. Unlike GPTBot (training only) or Bingbot (indexing only), a single PerplexityBot crawl event feeds two distinct downstream uses.
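
Because a user-agent string can be spoofed, a request that claims to be PerplexityBot is usually checked against the published IP ranges as well. The following is a minimal sketch of that two-step check. The function names are hypothetical, and the CIDR blocks are documentation-only placeholders (RFC 5737), not Perplexity's real ranges; substitute the ranges listed at perplexity.ai/perplexitybot.

```python
import ipaddress

# ASSUMPTION: placeholder networks from the RFC 5737 documentation blocks.
# Replace with the ranges Perplexity actually publishes.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def claims_perplexitybot(user_agent: str) -> bool:
    """True if the user-agent string contains the PerplexityBot product token."""
    return "perplexitybot/" in user_agent.lower()


def is_verified_perplexitybot(user_agent: str, remote_ip: str) -> bool:
    """True only if the UA claims PerplexityBot AND the IP is in a listed range."""
    if not claims_perplexitybot(user_agent):
        return False
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in network for network in PUBLISHED_RANGES)
```

A request that matches the user agent but not the ranges is best treated as an impersonator rather than as the crawler.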

Why it matters

PerplexityBot is the highest-leverage AI crawler for a single allow rule. One crawl produces:

  1. Live retrieval index entry — the crawled page becomes citeable in real-time Perplexity answers
  2. Training data signal — the crawled content shapes future Perplexity model fine-tuning

Combined, this dual-horizon property makes PerplexityBot access more valuable per crawl than any other bot in the AI ecosystem.

Crawl behavior

  • Respects robots.txt
  • Honors Crawl-delay directive
  • Backs off on HTTP 429 responses and honors the Retry-After header
  • Refresh cadence: weekly to monthly for established sites; roughly monthly at first for newer sites
  • Moderate parallelism (not aggressive enough to consume meaningful bandwidth on most sites)
  • Limited JavaScript render budget — JS-only critical content frequently fails to enter the index
  • No image OCR — image alt text captured, image content not
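
Since the crawler respects robots.txt and Crawl-delay, it can help to confirm what a given robots.txt actually says to PerplexityBot before looking anywhere else. The sketch below uses Python's standard urllib.robotparser on an example file. The robots.txt contents, the example.com URLs, and the check_robots helper are illustrative assumptions, and Python's parser may not match PerplexityBot's own rule matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: an example robots.txt, not taken from any real site.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /private/
"""


def check_robots(robots_txt: str, agent: str, url: str):
    """Return (allowed, crawl_delay) for the given agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


allowed, delay = check_robots(
    ROBOTS_TXT, "PerplexityBot", "https://example.com/glossary/perplexitybot"
)
print(f"allowed={allowed}, crawl_delay={delay}")
```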

What PerplexityBot stores

  • Rendered HTML content (post server-side rendering)
  • JSON-LD structured data
  • Internal + outbound link graph
  • Last-modified timestamps (from sitemap.xml lastmod and JSON-LD dateModified)
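
To illustrate the structured-data and timestamp items above, here is a small sketch that emits a JSON-LD block carrying dateModified. The Article type, the field selection, and the article_jsonld helper are assumptions made for the example; the source only states that JSON-LD and dateModified are read.

```python
import json
from datetime import date


def article_jsonld(headline: str, url: str, published: date, modified: date) -> str:
    """Build a <script> tag holding schema.org Article JSON-LD (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld(
    "PerplexityBot",
    "https://example.com/glossary/perplexitybot",
    date(2025, 1, 2),
    date(2025, 1, 15),
))
```

Keeping dateModified in step with the sitemap's lastmod avoids sending the crawler two conflicting freshness signals.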

What it skips

  • JavaScript-only critical content (limited render budget)
  • Image-locked claims (no OCR)
  • Pages blocked in robots.txt
  • Soft-404s and pages with 5xx errors
  • Pages requiring authentication
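
One way to test for the JavaScript-only case is to check whether a critical phrase appears in the HTML the server returns, before any script runs. The sketch below does this with the standard html.parser module, ignoring text inside script and style tags. The present_without_js helper and the sample markup are hypothetical, and this approximates a crawler with no render budget rather than PerplexityBot's actual pipeline.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def present_without_js(raw_html: str, phrase: str) -> bool:
    """True if `phrase` is in the server-delivered text, with no JS executed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return phrase.lower() in text.lower()
```

Feeding it the raw response body (for example from curl) for a key page shows quickly whether the claim you want cited depends on client-side rendering.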

Common indexing failures

The most common cause of weak Perplexity citation isn't content quality; it's upstream blocking. Cloudflare Bot Fight Mode and aggressive WAF rules serve CAPTCHA challenges to PerplexityBot, and the bot can't solve them. The result is zero crawl traffic in origin logs even though robots.txt allows the bot. Check both layers: robots.txt and the CDN or WAF in front of the origin.
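
A quick way to check the origin layer is to count PerplexityBot requests in the access log by status code. The sketch below assumes the common "combined" log format used by nginx and Apache; the regular expression and the perplexitybot_status_counts helper are illustrative and will need adjusting for other log layouts.

```python
import re
from collections import Counter

# ASSUMPTION: combined log format, i.e.
# ip - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"\S+ \S+ \S+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def perplexitybot_status_counts(lines) -> Counter:
    """Count PerplexityBot requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "perplexitybot" in match.group("ua").lower():
            counts[match.group("status")] += 1
    return counts
```

An empty result over several weeks of logs, combined with a permissive robots.txt, points to the edge layer. A result dominated by 403 or 5xx codes points to the origin itself.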

See /guides/how-perplexity-indexes-websites for the technical deep dive.

Frequently asked

How often does PerplexityBot crawl an established site?

Weekly to monthly. Newer sites, or sites that publish sporadically, may be crawled only about once a month at first, with recrawls becoming more frequent once content is updated consistently. Submitting an updated sitemap.xml with current lastmod values accelerates recrawl.
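
As a sketch of the sitemap step, the snippet below generates a sitemap.xml whose lastmod values come from each page's real modification date. The build_sitemap helper and the example URLs are hypothetical; the element names and namespace follow the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """pages: iterable of (url, last_modified_date) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, modified in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://example.com/glossary/perplexitybot", date(2025, 1, 15)),
]))
```

The lastmod value should change only when the page content really changes; stamping every URL with today's date makes the signal meaningless.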

Why do I see zero PerplexityBot hits in my logs even though robots.txt allows it?

Most commonly, upstream blocking. Cloudflare's Bot Fight Mode, Super Bot Fight Mode, or WAF rules can block AI bots before they reach your origin. The bots fail the JavaScript challenge or CAPTCHA, so the crawl never lands. Check your CDN's bot management settings, because blocking at the edge happens independently of robots.txt.

Does PerplexityBot honor noindex meta tags?

Yes. Standard meta robots directives (noindex, nofollow) are respected. Pages with noindex will not appear in Perplexity citations, and previously crawled pages drop out of the index over the following weeks.
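
A stray noindex left over from a staging template is easy to miss, so it can be worth scanning key pages for it. The sketch below extracts meta robots directives with the standard html.parser module. The robots_directives helper is hypothetical, and it does not cover the X-Robots-Tag HTTP header, which would need a separate check.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives found in the HTML."""
    parser = _RobotsMeta()
    parser.feed(html)
    parser.close()
    return parser.directives
```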
