Citare Tools · AI Crawler Tracking
See exactly which AI bots visit your site.
Every fetch by GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Google-CloudVertexBot, and ~16 other AI crawlers — verified against each provider’s published IP ranges. Defendable in writing.
Included for every Citare customer. Available piecemeal on request.
Why this matters
The four-index reality
Each AI engine’s web-search tool grounds against a different underlying search index. ChatGPT uses Bing. Claude uses Brave Search. Gemini and Google AI Overviews use Google’s live index. Perplexity uses its own proprietary index.
Optimising for one doesn’t automatically lift the others — a brand ranking top-3 on Google may be entirely absent from Brave’s smaller crawl, which means Claude won’t surface it. Most agencies treat AI search as a black box: “we hope ChatGPT mentions you.”
Citare measures the box directly. Every fetch by an AI bot, verified against the provider’s published IP ranges. When you present the monthly report, every count is defendable in writing.
How it works
Foolproof verification
Three layers run on every hit. Anything that fails all three lands as unverified — surfaced in a side stat, never folded into the headline.
UA pattern match
A canonical 25-bot allowlist matches the User-Agent string. Cheap, but spoofable — so this layer alone never sets verified=true.
IP-range CIDR check
The source IP is matched against the bot provider's published JSON of IP ranges (OpenAI, Google, Perplexity, Apple, Microsoft). Match → verified, full IP retained.
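The CIDR layer reduces to a set-membership test. A sketch using Python’s standard `ipaddress` module — the CIDR blocks below are documentation placeholders (TEST-NET ranges), not any provider’s real published ranges, which the real check loads from each provider’s JSON:

```python
import ipaddress

# Placeholder CIDRs (RFC 5737 TEST-NET blocks), standing in for the
# provider-published JSON that the production check refreshes regularly.
PROVIDER_RANGES = {
    "openai": [ipaddress.ip_network("192.0.2.0/24")],
    "google": [ipaddress.ip_network("198.51.100.0/24")],
}

def ip_verified(ip: str, provider: str) -> bool:
    """True if the source IP falls inside any of the provider's ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PROVIDER_RANGES.get(provider, []))
```

On a match the hit is marked verified and the full IP is retained with the row.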
Reverse-DNS confirmation
For providers that don't publish IP ranges (Bingbot, Applebot), reverse-DNS the IP, match the hostname suffix, then forward-resolve back to the original IP. All three must pass.
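The three-step check above is forward-confirmed reverse DNS. A sketch with injectable resolvers (so it can be exercised without live DNS); the real implementation would use `socket.gethostbyaddr` and `socket.getaddrinfo` directly, and the suffixes shown in the usage note are assumptions:

```python
import socket

def fcrdns_verified(ip, allowed_suffixes, reverse=None, forward=None) -> bool:
    """Forward-confirmed reverse DNS.

    1. Reverse-resolve the IP to a hostname.
    2. Check the hostname ends with an allowed provider suffix.
    3. Forward-resolve the hostname and require the original IP back.
    All three must pass. Resolvers are injectable for testing.
    """
    reverse = reverse or (lambda ip: socket.gethostbyaddr(ip)[0])
    forward = forward or (lambda host: {ai[4][0] for ai in socket.getaddrinfo(host, None)})
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The forward-resolve step is what defeats spoofing: an attacker can point reverse DNS for their own IP at `anything.search.msn.com`, but they cannot make Microsoft’s zone resolve that hostname back to their IP.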
Headline metric, locked:
WHERE ip_verified=true AND bot_class IN ('training', 'live_search')
Indexing crawlers (Googlebot, Bingbot) and unverified hits ride alongside in side stats. Every count is reproducible from raw rows. If a CFO asks “how do you know that’s real?”, we can show them the row.
Works for every customer
Three install paths
Pick whichever fits your hosting. One install per property is sufficient — our fingerprint dedup collapses duplicates across paths automatically. Zero code changes; configuration only.
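The dedup idea can be sketched as a fingerprint over the hit’s identifying fields — the field choice and minute-level bucketing below are assumptions for illustration, not the production schema:

```python
import hashlib

def hit_fingerprint(ip: str, user_agent: str, path: str, minute_bucket: str) -> str:
    """Hypothetical dedup key: the same hit arriving via two install paths
    (e.g. Cloudflare Logpush *and* the WP plugin) collapses to one row."""
    raw = "|".join((ip, user_agent, path, minute_bucket))
    return hashlib.sha256(raw.encode()).hexdigest()

seen: set[str] = set()

def ingest(hit: dict) -> bool:
    """Return True if the hit is new; duplicates are dropped."""
    fp = hit_fingerprint(hit["ip"], hit["ua"], hit["path"], hit["ts"][:16])
    if fp in seen:
        return False
    seen.add(fp)
    return True
```

This is why one install per property is enough, and why a second install does no harm.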
Vercel Log Drains
SaaS · Next.js · Vercel-hosted
- Open project Settings → Log Drains
- Add JSON destination → paste headers
- Save — verification probe is automatic
Cloudflare Logpush
Anyone behind Cloudflare (~40% of sites)
- Analytics & Logs → Create Logpush job
- HTTP destination → paste headers
- Save — Cloudflare auth-check is automatic
WordPress mu-plugin
Hostinger · cPanel · managed WP
- Download zip from Settings → AI Crawlers
- Upload to /wp-content/mu-plugins/
- Paste API key in Settings → Citare Crawler
For self-hosted Nginx/Apache, our generic NDJSON format works with any forwarder (Vector, Logtail, Fluentd) that can POST to a URL.
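NDJSON is simply one JSON object per line. A sketch of building a payload — the field names here are illustrative, not the documented ingest schema:

```python
import json

def to_ndjson(hits: list[dict]) -> str:
    """Serialise hits as NDJSON: one compact JSON object per line."""
    return "\n".join(json.dumps(h, separators=(",", ":")) for h in hits)

# Illustrative record; check the integration docs for the exact field names.
payload = to_ndjson([
    {"ts": "2024-04-12T09:31:05Z", "ip": "192.0.2.17",
     "ua": "GPTBot/1.2", "path": "/pricing", "status": 200},
])
# POST `payload` to the ingest URL with your API key header, from any
# forwarder (Vector, Fluentd) or even a few lines of cron + curl.
```

Because the format is plain line-delimited JSON, anything that can tail a log and POST over HTTPS can feed it.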
What we don’t hide
Honest about the limits
Anthropic doesn’t publish IP ranges yet
Claude visits land as ‘unverified by design’. Their crawlers run on Google Cloud shared infrastructure — verifying against those ranges would false-positive any GCP-hosted bot. We say so on every row in the dashboard, separately from the headline number. We re-evaluate quarterly.
Gemini uses a stealth Chrome UA for grounding
Path A's bot-pattern matcher silently drops it. Path C's log forwarding catches it regardless of UA, via the IP-range check against Google's user-triggered-fetchers JSON. If you want complete Gemini coverage, install Path C.
Cache plugins can short-circuit PHP
W3 Total Cache, LiteSpeed, WP Rocket can serve cached pages before PHP runs. Our WordPress plugin must go in /wp-content/mu-plugins/ for max coverage. If you have host-level log forwarding (Vercel / Cloudflare), prefer that — it's strictly more complete.
Three views
What you see in the dashboard
- Overview — verified-headline number, by-class stack-bar, daily sparkline, recent-activity table, status-code anomaly card when bots see 4xx/5xx errors.
- By bot — table of every bot that visited: provider, hit count, verified count, trend vs prior period, last-seen.
- By path — most-crawled URLs in the period, with “first crawl” markers when a bot discovers a page for the first time.
Drill-down drawer on every row shows the verification method, source IP, full UA string, ingestion path, and an explanation of what the bot does.
It writes itself
In your monthly client report
When the period has crawler activity, the monthly client report adds a new AI Crawler Activity section. The kind of line an agency owner forwards directly to their client:
“GPTBot, ChatGPT-User, OAI-SearchBot, and PerplexityBot fetched 47 pages this period — verified against each provider’s published IP ranges, up from 12 last period. Perplexity discovered /pricing for the first time on April 12 — that page is now in their grounding index. Claude visited the site for the first time this period; previously they had no record of the brand. Verified ratio: 94%, up from 88% — spoof attempts are minimal.”
The section is omitted entirely when there is no activity to narrate. Opus skips silently rather than writing “no data” boilerplate.
Get started
Already a Citare customer?
It’s already in your dashboard.
AI crawler tracking is included in every Citare plan. Add a property in Settings → AI Crawlers and paste the install snippet for your hosting.
Open dashboard →
Want this without the full Citare suite?
Tell us about your use case.
We offer AI Crawler Tracking piecemeal for select use cases. Drop a few sentences below and we’ll be in touch within 24 hours.