Google-Extended
Google-Extended is a robots.txt control token that lets site owners opt out of Google's generative-AI training (Gemini and Vertex AI) without affecting Google Search ranking or AI Overview citation. It is a policy token, not a crawler with its own User-Agent.
Definition
Google-Extended is not a separate web crawler. It is a robots.txt control token, introduced by Google in 2023, that lets site owners opt their content out of generative-AI training corpora (Gemini models, including those served through Vertex AI) while keeping it available for Google Search ranking. There is no Google-Extended User-Agent in HTTP request logs; the control operates as a policy gate that Google applies internally when it decides whether content crawled by Googlebot can flow into training pipelines.
Why it matters
Google's content acquisition is single-pipeline: the same Googlebot crawl that builds the Google Search index also feeds Gemini training and AI Overview (AIO) composition. Without Google-Extended, opting out of AI training meant blocking Googlebot entirely, and losing classic SEO traffic along with it. Google-Extended decouples the two: keep Search ranking, opt out of generative-AI training. Per Google's 2026-05-15 AI Optimization Guide, Google-Extended specifically governs training, not AIO ranking; AIO continues to source from the Google Search index regardless of the Google-Extended setting.
How to use it
In robots.txt:
User-Agent: Google-Extended
Disallow: /
To allow AI training (the default if no directive is present):
User-Agent: Google-Extended
Allow: /
Path-level scoping works the same as Googlebot directives — disallow specific subtrees, allow the rest.
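As a rough illustration of path-level scoping, the sketch below checks a hypothetical robots.txt with Python's standard-library `urllib.robotparser`. The file contents and paths are invented for the example. The parser simply treats `Google-Extended` as a user-agent name, which is a convenient way to test what a file declares but says nothing about how Google evaluates it internally. Note that Python's parser applies the first matching rule, whereas Google documents longest-match precedence, so the example sticks to `Disallow` rules that behave the same under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, while the
# Google-Extended token is disallowed for two subtrees only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Google-Extended
Disallow: /research/
Disallow: /premium/
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)

# Illustrative paths; none of them refer to a real site.
for path in ("/blog/launch", "/research/report", "/premium/archive"):
    search_ok = parser.can_fetch("Googlebot", path)
    training_ok = parser.can_fetch("Google-Extended", path)
    print(f"{path}: Googlebot={search_ok} Google-Extended={training_ok}")
```

Under these assumptions, every path stays open to Googlebot, and only the two listed subtrees are closed to the Google-Extended token.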
What it does and does not affect
- Affects: whether crawled content can be used to train Gemini models (formerly Bard), including those served through Vertex AI
- Does not affect: Google Search blue-link ranking, AI Overview citation eligibility, Google Discover, Knowledge Graph population
This separation is the entire point of the control. A site that blocks Google-Extended remains fully ranked in classic Search and AIO; it's only excluded from the long-term training corpus.
Privacy and content licensing implications
For publishers, Google-Extended is the practical lever for separating SEO economic value (Search traffic) from AI-training contribution (zero direct return). News publishers, paywalled content sites, and creator-owned platforms increasingly disallow Google-Extended while allowing Googlebot. The asymmetry is deliberate: crawls for search drive traffic that can be monetized, so they stay open, while crawls for training return nothing directly and, in this view, should not be free.
For most marketing sites, allowing Google-Extended is the default choice. Pre-training inclusion influences baseline Gemini knowledge of your brand, which compounds across model versions.
Common pitfalls
- Confusing Google-Extended with GPTBot. GPTBot is a real crawler with its own User-Agent and IP range. Google-Extended is a robots.txt policy token only. Blocking GPTBot does not affect Google AI; blocking Google-Extended does not affect OpenAI.
- Path-mismatch with Googlebot. If Googlebot can't reach a path, Google-Extended access to that path is moot — there's nothing to gate.
- No User-Agent in logs. Don't grep server logs for Google-Extended. There is no such request signature. Verification is robots.txt-only.
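Because verification is robots.txt-only, one way to audit a site is to read the file and classify each path by what it declares. The following is a minimal sketch, again using Python's `urllib.robotparser`. The function name `training_gate`, the returned labels, and the sample file are all hypothetical, and the result reflects only what the robots.txt text says, not Google's internal behavior. It also encodes the path-mismatch pitfall: if Googlebot is disallowed for a path, the Google-Extended setting has nothing to gate.

```python
from urllib.robotparser import RobotFileParser


def training_gate(robots_txt: str, path: str) -> str:
    """Classify a path by what the robots.txt text declares (illustrative labels)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If Googlebot cannot crawl the path, there is no content to gate.
    if not parser.can_fetch("Googlebot", path):
        return "not crawled: Google-Extended setting is moot"
    if not parser.can_fetch("Google-Extended", path):
        return "crawled for Search, opted out of AI training"
    return "crawled for Search, eligible for AI training"


# Hypothetical robots.txt: drafts are hidden from Googlebot, and the whole
# site is opted out of training via the Google-Extended token.
EXAMPLE = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: Google-Extended
Disallow: /
"""

for path in ("/drafts/post", "/articles/post"):
    print(path, "->", training_gate(EXAMPLE, path))

# An empty robots.txt declares nothing, so the default (allowed) applies.
print("/articles/post", "->", training_gate("", "/articles/post"))
```

A check like this belongs in a deploy-time test against the robots.txt file itself, since no server log entry will ever confirm the setting.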
See /ai-bot-crawlers for the full control matrix.
Frequently asked
Will blocking Google-Extended hurt my Google Search ranking?
No. Google-Extended only governs whether your content flows into Google's generative-AI training corpus. Google Search ranking and AI Overview citation continue to operate from the same Google Search index regardless of the Google-Extended setting. The two are explicitly decoupled by design.
Why isn't there a Google-Extended User-Agent in my logs?
Because Google-Extended is not a separate crawler — it's a robots.txt policy token. Googlebot does the fetching; Google-Extended controls whether the fetched content can flow into training pipelines after the crawl. There's no separate HTTP request to log.
Should publishers block Google-Extended?
It is a trade-off. Blocking keeps content out of Gemini training without affecting Search traffic. For publishers concerned about uncompensated AI training, blocking is the consistent choice. For brands optimizing for compounding AI knowledge of their products, allowing is the consistent choice. There is no single correct answer; the control exists specifically to let site owners pick.