Citare Tools · AI Crawler Tracking
See exactly which AI bots visit your site.
Every fetch by GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Google-CloudVertexBot, and ~16 other AI crawlers — verified against each provider’s published IP ranges. Defendable in writing.
Included for every Citare customer. Available piecemeal on request.
Why this matters
The four-index reality
Each AI engine’s web-search tool grounds against a different underlying search index. ChatGPT uses Bing. Claude uses Brave Search. Gemini and Google AI Overviews use Google’s live index. Perplexity uses its own proprietary index.
Optimising for one doesn’t automatically lift the others — a brand ranking top-3 on Google may be entirely absent from Brave’s smaller crawl, which means Claude won’t surface it. Most agencies treat AI search as a black box: “we hope ChatGPT mentions you.”
Citare measures the box directly. Every fetch by an AI bot, verified against the provider’s published IP ranges. When you present the monthly report, every count is defendable in writing.
How it works
Foolproof verification
Three layers run on every hit. Anything that fails all three lands as unverified — surfaced in a side stat, never folded into the headline.
UA pattern match
A canonical 25-bot allowlist matches the User-Agent string. Cheap, but spoofable — so this layer alone never sets verified=true.
IP-range CIDR check
The source IP is matched against the bot provider's published JSON of IP ranges (OpenAI, Google, Perplexity, Apple, Microsoft). Match → verified, full IP retained.
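The CIDR layer reduces to a set-membership test. A sketch using Python’s standard `ipaddress` module — the CIDR blocks below are documentation placeholders (TEST-NET ranges), not any provider’s real published ranges, which the real check loads from each provider’s JSON:

```python
import ipaddress

# Placeholder CIDRs (RFC 5737 TEST-NET blocks), standing in for the
# provider-published JSON that the production check refreshes regularly.
PROVIDER_RANGES = {
    "openai": [ipaddress.ip_network("192.0.2.0/24")],
    "google": [ipaddress.ip_network("198.51.100.0/24")],
}

def ip_verified(ip: str, provider: str) -> bool:
    """True if the source IP falls inside any of the provider's ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PROVIDER_RANGES.get(provider, []))
```

On a match the hit is marked verified and the full IP is retained with the row.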
Reverse-DNS confirmation
For providers that don't publish IP ranges (Bingbot, Applebot), reverse-DNS the IP, match the hostname suffix, then forward-resolve back to the original IP. All three must pass.
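The three-step check above is forward-confirmed reverse DNS. A sketch with injectable resolvers (so it can be exercised without live DNS); the real implementation would use `socket.gethostbyaddr` and `socket.getaddrinfo` directly, and the suffixes shown in the usage note are assumptions:

```python
import socket

def fcrdns_verified(ip, allowed_suffixes, reverse=None, forward=None) -> bool:
    """Forward-confirmed reverse DNS.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname ends with an allowed provider suffix.
    3. Forward-resolve the hostname and require the original IP back.
    All three must pass. Resolvers are injectable for testing.
    """
    reverse = reverse or (lambda ip: socket.gethostbyaddr(ip)[0])
    forward = forward or (lambda host: {ai[4][0] for ai in socket.getaddrinfo(host, None)})
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The forward-resolve step is what defeats spoofing: an attacker can point reverse DNS for their own IP at `anything.search.msn.com`, but they cannot make Microsoft’s zone resolve that hostname back to their IP.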
Headline metric, locked:
WHERE ip_verified=true AND bot_class IN ('training', 'live_search')
Indexing crawlers (Googlebot, Bingbot) and unverified hits ride alongside in side stats. Every count is reproducible from raw rows. If a CFO asks “how do you know that’s real?”, we can show them the row.
Works for every customer
Three install paths
Pick whichever fits your hosting. One install per property is sufficient — our fingerprint dedup collapses duplicates across paths automatically. Zero code changes; configuration only.
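The dedup idea can be sketched as a fingerprint over the hit’s identifying fields — the field choice and minute-level bucketing below are assumptions for illustration, not the production schema:

```python
import hashlib

def hit_fingerprint(ip: str, user_agent: str, path: str, minute_bucket: str) -> str:
    """Hypothetical dedup key: the same hit arriving via two install paths
    (e.g. Cloudflare Logpush *and* the WP plugin) collapses to one row."""
    raw = "|".join((ip, user_agent, path, minute_bucket))
    return hashlib.sha256(raw.encode()).hexdigest()

seen: set[str] = set()

def ingest(hit: dict) -> bool:
    """Return True if the hit is new; duplicates are dropped."""
    fp = hit_fingerprint(hit["ip"], hit["ua"], hit["path"], hit["ts"][:16])
    if fp in seen:
        return False
    seen.add(fp)
    return True
```

This is why one install per property is enough, and why a second install does no harm.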
Vercel Log Drains
SaaS · Next.js · Vercel-hosted
- Open project Settings → Log Drains
- Add JSON destination → paste headers
- Save — verification probe is automatic
Cloudflare Logpush
Anyone behind Cloudflare (~40% of sites)
- Analytics & Logs → Create Logpush job
- HTTP destination → paste headers
- Save — Cloudflare auth-check is automatic
WordPress mu-plugin
Hostinger · cPanel · managed WP
- Download zip from Settings → AI Crawlers
- Upload to /wp-content/mu-plugins/
- Paste API key in Settings → Citare Crawler
For self-hosted Nginx/Apache, our generic NDJSON format works with any forwarder (Vector, Logtail, Fluentd) that can POST to a URL.
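NDJSON is simply one JSON object per line. A sketch of building a payload — the field names here are illustrative, not the documented ingest schema:

```python
import json

def to_ndjson(hits: list[dict]) -> str:
    """Serialise hits as NDJSON: one compact JSON object per line."""
    return "\n".join(json.dumps(h, separators=(",", ":")) for h in hits)

# Illustrative record; check the integration docs for the exact field names.
payload = to_ndjson([
    {"ts": "2024-04-12T09:31:05Z", "ip": "192.0.2.17",
     "ua": "GPTBot/1.2", "path": "/pricing", "status": 200},
])
# POST `payload` to the ingest URL with your API key header, from any
# forwarder (Vector, Fluentd) or even a few lines of cron + curl.
```

Because the format is plain line-delimited JSON, anything that can tail a log and POST over HTTPS can feed it.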
What we don’t hide
Honest about the limits
Anthropic doesn’t publish IP ranges yet
Claude visits land as ‘unverified by design’. Their crawlers run on Google Cloud shared infrastructure — verifying against those ranges would false-positive any GCP-hosted bot. We say so on every row in the dashboard, separately from the headline number. We re-evaluate quarterly.
Gemini uses a stealth Chrome UA for grounding
Path A's bot-pattern matcher silently drops it. Path C's log forwarding catches it regardless of UA, via the IP-range check against Google's user-triggered-fetchers JSON. If you want complete Gemini coverage, install Path C.
Cache plugins can short-circuit PHP
W3 Total Cache, LiteSpeed, WP Rocket can serve cached pages before PHP runs. Our WordPress plugin must go in /wp-content/mu-plugins/ for max coverage. If you have host-level log forwarding (Vercel / Cloudflare), prefer that — it's strictly more complete.
Three views
What you see in the dashboard
- Overview — verified-headline number, by-class stack-bar, daily sparkline, recent-activity table, status-code anomaly card when bots see 4xx/5xx errors.
- By bot — table of every bot that visited: provider, hit count, verified count, trend vs prior period, last-seen.
- By path — most-crawled URLs in the period, with “first crawl” markers when a bot discovers a page for the first time.
Drill-down drawer on every row shows the verification method, source IP, full UA string, ingestion path, and an explanation of what the bot does.
It writes itself
In your monthly client report
When the period has crawler activity, the monthly client report adds a new AI Crawler Activity section. The kind of line an agency owner forwards directly to their client:
“GPTBot, ChatGPT-User, OAI-SearchBot, and PerplexityBot fetched 47 pages this period — verified against each provider’s published IP ranges, up from 12 last period. Perplexity discovered /pricing for the first time on April 12 — that page is now in their grounding index. Claude visited the site for the first time this period; previously they had no record of the brand. Verified ratio: 94%, up from 88% — spoof attempts are minimal.”
The section is omitted entirely when there is no activity to narrate. Opus skips silently rather than writing “no data” boilerplate.
Get started
Already a Citare customer?
It’s already in your dashboard.
AI crawler tracking is included in every Citare plan. Add a property in Settings → AI Crawlers and paste the install snippet for your hosting.
Open dashboard →
Want this without the full Citare suite?
Tell us about your use case.
We offer AI Crawler Tracking piecemeal for select use cases. Drop a few sentences below and we’ll be in touch within 24 hours.