Will blocking Google-Extended hurt my Google Search ranking?

No. Google-Extended only governs whether your content flows into Google's generative-AI training corpus. Google Search ranking and AI Overview citation continue to operate from the same Google Search index regardless of the Google-Extended setting. The two are explicitly decoupled by design.

Why isn't there a Google-Extended User-Agent in my logs?

Because Google-Extended is not a separate crawler — it's a robots.txt policy token. Googlebot does the fetching; Google-Extended controls whether the fetched content can flow into training pipelines after the crawl. There's no separate HTTP request to log.

Should publishers block Google-Extended?

It is a trade-off. Blocking removes content from Gemini training without affecting Search traffic. For publishers concerned about uncompensated AI training, blocking is the consistent choice. For brands that want Gemini's knowledge of their products to compound across model versions, allowing is the consistent choice. There is no single correct answer; the control exists specifically so site owners can choose.

Google-Extended — definition and meaning · Citare glossary

Definition

Google-Extended is not a separate web crawler — it's a robots.txt control token Google introduced in 2023 that lets site owners opt their content out of generative-AI training corpora (Gemini, Vertex AI Search) while keeping it available for Google Search ranking. There is no Google-Extended User-Agent in HTTP request logs; the control operates as a policy gate Googlebot applies internally when it decides whether crawled content can flow into training pipelines.

Why it matters

Google's content acquisition runs through a single pipeline. The same Googlebot crawl that builds the Google Search index also feeds Gemini training and AI Overview composition. Before Google-Extended, opting out of AI training meant blocking Googlebot entirely, and losing classic SEO traffic along with it. Google-Extended decouples the two: keep Search ranking, opt out of generative-AI training. Per Google's 2026-05-15 AI Optimization Guide, Google-Extended specifically governs training, not AI Overview (AIO) ranking; AIO continues to source from the Google Search index regardless of the Google-Extended setting.

How to use it

To opt out of AI training, add this to robots.txt:

User-Agent: Google-Extended
Disallow: /

To allow AI training (the default if no directive is present):

User-Agent: Google-Extended
Allow: /

Path-level scoping works the same as Googlebot directives — disallow specific subtrees, allow the rest.
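
Path-level scoping can be sanity-checked offline before a robots.txt change ships. The following is a minimal sketch using Python's standard-library urllib.robotparser; the /premium/ path, the example.com URLs, and the is_allowed helper are hypothetical names chosen for illustration. Note that Python's parser applies the first matching rule in a group, which only approximates Google's most-specific-match behavior, so treat the result as a rough check rather than a statement of what Google will do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep /premium/ out of AI training,
# leave every other path (and Googlebot itself) unrestricted.
ROBOTS_TXT = """\
User-Agent: Google-Extended
Disallow: /premium/
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if `token` is permitted for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    urls = (
        "https://example.com/premium/report",
        "https://example.com/blog/post",
    )
    for url in urls:
        print(
            url,
            "| Googlebot:", is_allowed(ROBOTS_TXT, "Googlebot", url),
            "| Google-Extended:", is_allowed(ROBOTS_TXT, "Google-Extended", url),
        )
```

Under these assumptions, Googlebot is allowed on both URLs, while Google-Extended is allowed only on the /blog/ URL.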

What it does and does not affect

Affects: content inclusion in Gemini fine-tuning, Vertex AI Search retrieval indexes, and the Gemini (formerly Bard) training corpus
Does not affect: Google Search blue-link ranking, AI Overview citation eligibility, Google Discover, or Knowledge Graph population

This separation is the entire point of the control. A site that blocks Google-Extended remains fully ranked in classic Search and AI Overviews; it's only excluded from the long-term training corpus.

Privacy + content licensing implications

For publishers, Google-Extended is the practical lever for separating SEO economic value (Search traffic) from AI-training contribution (zero direct return). News publishers, paywalled content sites, and creator-owned platforms increasingly disallow Google-Extended while allowing Googlebot. The stance reflects an asymmetry: crawls for search drive traffic that can be monetized, while crawls for training return nothing directly, so they should not be free.

For most marketing sites, allowing Google-Extended is the default choice. Pre-training inclusion influences baseline Gemini knowledge of your brand, which compounds across model versions.

Common pitfalls

Confusing Google-Extended with GPTBot. GPTBot is a real crawler with its own User-Agent and IP range. Google-Extended is a robots.txt policy token only. Blocking GPTBot does not affect Google AI; blocking Google-Extended does not affect OpenAI.
Path-mismatch with Googlebot. If Googlebot can't reach a path, Google-Extended access to that path is moot — there's nothing to gate.
No User-Agent in logs. Don't grep server logs for Google-Extended. There is no such request signature. Verification is robots.txt-only.
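
Since verification is robots.txt-only, the path-mismatch pitfall can be caught by reading the file rather than the logs. Below is a rough sketch, again assuming Python's standard-library urllib.robotparser; the audit_paths helper, the sample rules, and the paths are hypothetical. It labels each path by what the rules imply: not crawled at all (so any Google-Extended rule is moot), search only, or search plus training. Python's first-match rule ordering is an approximation of Google's matching, so the labels are indicative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with separate Googlebot and Google-Extended groups.
EXAMPLE_ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /internal/

User-Agent: Google-Extended
Disallow: /premium/
"""


def audit_paths(robots_txt: str, paths: list[str]) -> dict[str, str]:
    """Classify each path by what the robots.txt rules imply for Search and training."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    report: dict[str, str] = {}
    for path in paths:
        crawlable = parser.can_fetch("Googlebot", path)
        trainable = parser.can_fetch("Google-Extended", path)
        if not crawlable:
            # Googlebot never fetches this path, so there is nothing to gate.
            report[path] = "not crawled (Google-Extended rule is moot)"
        elif trainable:
            report[path] = "search + training"
        else:
            report[path] = "search only"
    return report


if __name__ == "__main__":
    sample_paths = ["/internal/notes", "/premium/report", "/blog/post"]
    for path, status in audit_paths(EXAMPLE_ROBOTS_TXT, sample_paths).items():
        print(f"{path}: {status}")
```

With the sample rules, /internal/notes is reported as not crawled, /premium/report as search only, and /blog/post as search + training.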

See /ai-bot-crawlers for the full control matrix.
