Structured data for AI search
Structured data is the single most under-deployed lever for AI search visibility. AI platforms read JSON-LD preferentially over body text when extracting facts for citation. The leverage gap is large; the cost to close it is small. This is the complete JSON-LD reference — every schema that materially affects AI citation, with production-ready code, validation tools, and the seven mistakes that produce broken or sparse schema in production.
Updated May 2026
TL;DR
1. AI platforms prefer JSON-LD over body text for fact extraction. A page with comprehensive JSON-LD is materially more citable than the same page without it.
2. Use JSON-LD (not Microdata, not RDFa). Google explicitly recommends it; AI platforms parse it most reliably; it's decoupled from visible markup.
3. Tier 1 (everyone deploys): Organization, Article, FAQPage, WebSite. FAQPage is the single highest individual citation lever; Organization + sameAs is the entity-recognition foundation.
4. Validate every priority page in Google Rich Results Test and the Schema Markup Validator before deploy. Schema that doesn't match visible content is now an active negative signal.
What JSON-LD is and why AI platforms prefer it
JSON-LD (JSON for Linked Data) is a JSON-based format for embedding structured data in web pages, standardized by the W3C. On the web it is almost always paired with schema.org, a vendor-neutral vocabulary jointly developed by Google, Microsoft, Yahoo, and Yandex.
JSON-LD lives inside <script type="application/ld+json"> tags in your HTML, typically in the <head>. It does not affect human-visible page rendering. It exists exclusively for machine consumption.
AI platforms — Google AI Overview, Gemini, ChatGPT, Claude, Perplexity — face an extraction problem when generating answers. From any page: what is the canonical brand name? The price? Who is the author? When was it last updated? Body text is ambiguous. JSON-LD is unambiguous. When a page provides both, AI platforms prefer JSON-LD for factual extraction. Citation context becomes more reliable when the model extracts structured claims rather than parsing prose.
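To make the "unambiguous" point concrete, here is a minimal sketch of how a crawler might pull JSON-LD out of a page using only Python's standard library. It is an illustration under our own assumptions, not how any particular AI platform parses pages; `JsonLdExtractor` and `extract_jsonld` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents are passed through raw, so the JSON text is untouched.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # Malformed JSON raises json.JSONDecodeError here.
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html_text: str) -> list:
    """Return every JSON-LD block on the page as parsed Python objects."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks
```

The contrast with body text is the point: `extract_jsonld(page)[0]["name"]` yields the brand name directly, while recovering the same fact from prose would require guessing which string on the page is the canonical name.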
Schema priority — what to deploy in order
Not all schemas have equal citation leverage. Deploy in tier order; measure lift before moving to the next tier.
Production-ready code samples
Copy-pasteable JSON-LD for the highest-leverage schemas. Replace example values; validate in Rich Results Test; ship.
Organization schema
Homepage · highest entity-recognition leverage
Tells AI platforms what your brand entity is. Powers Knowledge Graph, sameAs disambiguation, and canonical reference identity across every AI platform. The sameAs array is the single highest-leverage property — aim for 5-10 canonical references (LinkedIn, Crunchbase, Wikipedia where applicable, official social channels).
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Citare",
"alternateName": "Citare AI",
"url": "https://citare.ai",
"logo": "https://citare.ai/logo.png",
"description": "AI search visibility platform measuring brand presence across ChatGPT, Gemini, Claude, Perplexity, and Google AI Overview.",
"foundingDate": "2024",
"founders": [
{ "@type": "Person", "name": "Ravi RDP" }
],
"address": {
"@type": "PostalAddress",
"addressLocality": "Bangalore",
"addressRegion": "KA",
"addressCountry": "IN"
},
"contactPoint": {
"@type": "ContactPoint",
"contactType": "customer support",
"email": "support@citare.ai",
"availableLanguage": ["English"]
},
"areaServed": "Worldwide",
"sameAs": [
"https://www.linkedin.com/company/citare-ai",
"https://twitter.com/citare_ai",
"https://www.crunchbase.com/organization/citare",
"https://github.com/citare-ai"
]
}
Watch out: Common mistakes: sparse properties (only name + url); a logo URL pointing to a non-public CDN; an address without addressCountry; sameAs containing redirects or stale URLs.
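Some of these mistakes can be caught before deploy with a small lint pass. The sketch below is hypothetical (`lint_organization` and its thresholds are our assumptions, with the 5-10 sameAs target taken from the guidance above). It only checks structure offline; detecting redirecting or stale sameAs URLs, or a logo on a non-public CDN, would need real HTTP requests.

```python
from urllib.parse import urlparse


def lint_organization(org: dict) -> list:
    """Return human-readable warnings for common Organization schema gaps.

    Illustrative heuristics only; they do not replace Rich Results Test
    or the Schema Markup Validator.
    """
    warnings = []

    # Sparse schema: nothing beyond the bare minimum.
    if not set(org) - {"@context", "@type", "name", "url"}:
        warnings.append("sparse: only name and url are set")

    # An address without addressCountry is ambiguous across countries.
    address = org.get("address")
    if isinstance(address, dict) and "addressCountry" not in address:
        warnings.append("address is missing addressCountry")

    # sameAs: aim for 5-10 distinct, absolute https URLs.
    same_as = org.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    unique = set(same_as)
    if not 5 <= len(unique) <= 10:
        warnings.append(f"sameAs has {len(unique)} unique URLs; aim for 5-10")
    if len(unique) != len(same_as):
        warnings.append("sameAs contains duplicate URLs")
    for url in same_as:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            warnings.append(f"sameAs entry is not an absolute https URL: {url}")
    return warnings
```

Run it over the parsed Organization node as a CI step; an empty list means the structural checks passed, not that the schema is correct.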
Article schema
Every long-form page
Applies to blog posts, guides, research, and editorial content. The dateModified property is the freshness signal AIO uses for citation weighting: pages with a recent dateModified are cited at higher rates. Use Article as the default; use BlogPosting only if your CMS or audience expects that framing, and NewsArticle only for genuine news content.
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "How to Measure AI Search Visibility",
"description": "A complete framework for measuring AI search visibility — query design, persona dispatch, citation parsing, surface rate, competitor benchmarking.",
"image": ["https://citare.ai/guides/measure-ai-search-visibility/hero.png"],
"author": {
"@type": "Person",
"name": "Ravi RDP",
"url": "https://citare.ai/team/ravi"
},
"publisher": {
"@type": "Organization",
"name": "Citare",
"url": "https://citare.ai",
"logo": {
"@type": "ImageObject",
"url": "https://citare.ai/logo.png"
}
},
"datePublished": "2026-05-04",
"dateModified": "2026-05-20",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://citare.ai/guides/measure-ai-search-visibility"
},
"keywords": "AI search visibility measurement, surface rate, persona dispatch"
}
Watch out: Update dateModified whenever you make non-trivial revisions. Don't game it: Google's guidance warns against changing a page's date to make it look fresh when the content hasn't substantially changed.
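One way to keep dateModified honest is to derive it from the content itself. This is a minimal sketch under our own assumptions: the function names are hypothetical, and a hash comparison flags any text change (including a typo fix), so deciding what counts as "non-trivial" remains an editorial call layered on top.

```python
import hashlib
from datetime import date


def content_fingerprint(body_text: str) -> str:
    """Hash the visible text, ignoring whitespace-only differences."""
    normalized = " ".join(body_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_date_modified(old_body: str, new_body: str,
                       previous_modified: date, today: date) -> date:
    """Advance dateModified only when the page text actually changed."""
    if content_fingerprint(old_body) == content_fingerprint(new_body):
        return previous_modified
    return today


def build_article(headline: str, author_name: str, page_url: str,
                  published: date, modified: date) -> dict:
    """Build a minimal Article node shaped like the example above."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),  # ISO 8601, e.g. "2026-05-04"
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": {"@type": "WebPage", "@id": page_url},
    }
```

A publish hook would call `next_date_modified` with the stored and new body text, then pass the result to `build_article`, so re-saving a page without edits never moves the date.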
FAQPage schema
Highest single AIO citation lift
The single schema with the largest measurable lift on AIO citation rate. AI platforms extract question-answer pairs directly from FAQPage mainEntity into their generated answers. Effective FAQ content has three properties: questions are phrased like conversational queries (not SEO target lists), answers are direct and self-contained, and answers are factually dense with specific claims.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is Generative Engine Optimization (GEO)?",
"acceptedAnswer": {
"@type": "Answer",
"text": "GEO is the practice of optimizing your brand and content to be cited or recommended by AI-powered search platforms including Google AI Overview, ChatGPT, Gemini, Claude, and Perplexity."
}
},
{
"@type": "Question",
"name": "How is GEO different from SEO?",
"acceptedAnswer": {
"@type": "Answer",
"text": "SEO optimizes pages to rank in Google's link-based results. GEO optimizes content to be cited in AI-generated answers. Ranking logic, measurement methodology, and optimization tactics are structurally different."
}
}
]
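}
Watch out: FAQPage schema MUST match visible page content; marking up Q&A that readers can't see violates Google's structured data guidelines and can trigger a manual action. Aim for 8-15 questions per FAQPage on content-heavy guides. Front-load the answer; AIO typically extracts the first 1-3 sentences. Since August 2023, Google shows FAQ rich results only for well-known, authoritative government and health sites, so most sites won't get the visual rich result even though the markup is valid.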
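The simplest guard against schema drifting from visible content is to render both from one source. A minimal sketch, assuming your FAQ lives as (question, answer) pairs; `render_faq` and the `<h3>`/`<p>` markup are hypothetical choices, so substitute your own templates.

```python
import html


def render_faq(pairs):
    """Render visible FAQ HTML and FAQPage JSON-LD from the same (question, answer) list.

    Because both outputs come from one list, the schema cannot describe
    questions that readers don't see.
    """
    visible = "\n".join(
        f"<h3>{html.escape(q)}</h3>\n<p>{html.escape(a)}</p>" for q, a in pairs
    )
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }
    return visible, schema
```

The visible HTML is escaped for the page body; the schema dict keeps raw text and is serialized separately when embedded.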
Product schema
Ecommerce — every product page
Powers ecommerce AI citation in 'best of', 'top', and recommendation queries. Product identifiers help AI platforms disambiguate products: global identifiers (gtin, mpn, isbn) carry the most weight, with your own sku as a merchant-specific complement. aggregateRating + review are heavily cited as social proof signals in recommendation queries.
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Citare Pro Plan",
"description": "AI search visibility monitoring across ChatGPT, Gemini, Claude, Perplexity, and Google AI Overview with persona-anchored dispatch and competitor benchmarking.",
"image": ["https://citare.ai/products/pro/hero.png"],
"brand": { "@type": "Brand", "name": "Citare" },
"sku": "CITARE-PRO-MONTHLY",
"offers": {
"@type": "Offer",
"url": "https://citare.ai/pricing",
"priceCurrency": "USD",
"price": "119.00",
"priceValidUntil": "2026-12-31",
"availability": "https://schema.org/InStock",
"seller": { "@type": "Organization", "name": "Citare" }
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"reviewCount": "42"
}
}
Watch out: For multi-variant products (size/color/configuration), use ProductGroup with hasVariant arrays of individual Product items. Do not collapse variants into a single Product if they have different prices or SKUs.
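Here is a sketch of that ProductGroup pattern generated from variant data. The input shape (`sku`, `name`, `price`, `currency`, optional `availability`) and the function name are hypothetical; `productGroupID`, `variesBy`, and `hasVariant` are the schema.org properties the pattern relies on.

```python
def build_product_group(group_id, name, brand, variants, varies_by):
    """Build a ProductGroup whose hasVariant holds one Product per SKU.

    `variants` is a hypothetical list of dicts with sku, name, price,
    currency, and an optional availability URL.
    """
    skus = [v["sku"] for v in variants]
    if len(skus) != len(set(skus)):
        raise ValueError("each variant needs a distinct sku")
    return {
        "@context": "https://schema.org",
        "@type": "ProductGroup",
        "name": name,
        "productGroupID": group_id,
        "brand": {"@type": "Brand", "name": brand},
        # e.g. ["https://schema.org/size", "https://schema.org/color"]
        "variesBy": varies_by,
        "hasVariant": [
            {
                "@type": "Product",
                "sku": v["sku"],
                "name": v["name"],
                "offers": {
                    "@type": "Offer",
                    "price": f'{v["price"]:.2f}',  # two-decimal string, as in the example above
                    "priceCurrency": v["currency"],
                    "availability": v.get("availability", "https://schema.org/InStock"),
                },
            }
            for v in variants
        ],
    }
```

Each variant keeps its own price and SKU, which is exactly the information that collapsing variants into one Product would lose.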
LocalBusiness schema
Physical-location pages
Powers AIO geo-contextualization (city-level brand citation) and Gemini local query handling. Essential for any brand with a physical presence. Multi-location brands should deploy distinct LocalBusiness schema on each location's landing page; don't collapse all locations into the homepage Organization schema, because that dilutes geo-specific signals.
{
"@context": "https://schema.org",
"@type": "LocalBusiness",
"name": "Citare HQ",
"image": "https://citare.ai/locations/bangalore.jpg",
"address": {
"@type": "PostalAddress",
"streetAddress": "123 MG Road",
"addressLocality": "Bangalore",
"addressRegion": "KA",
"postalCode": "560001",
"addressCountry": "IN"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": 12.9716,
"longitude": 77.5946
},
"openingHoursSpecification": [
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "09:00",
"closes": "18:00"
}
],
"telephone": "+91-80-1234-5678",
"priceRange": "$$",
"url": "https://citare.ai/locations/bangalore",
"areaServed": "India"
}
Watch out: Include geo coordinates + openingHoursSpecification for full AIO eligibility on 'near me' and 'open now' queries. priceRange is free text: the common convention is $ to $$$$, though a numeric range such as "$10-15" also works.
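If each location's hours are stored per day, a small helper can collapse them into the grouped openingHoursSpecification shape used above. A sketch under stated assumptions: `opening_hours_spec` is hypothetical, hours are 24-hour "HH:MM" strings, and closed days are simply omitted (one common convention).

```python
def opening_hours_spec(weekly_hours):
    """Group days that share the same opens/closes into OpeningHoursSpecification nodes.

    `weekly_hours` maps day names to (opens, closes) tuples, or None when closed.
    """
    groups = {}  # (opens, closes) -> list of days; dicts preserve insertion order
    for day, hours in weekly_hours.items():
        if hours is None:
            continue  # closed days are left out of the specification
        groups.setdefault(hours, []).append(day)
    return [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": days,
            "opens": opens,
            "closes": closes,
        }
        for (opens, closes), days in groups.items()
    ]
```

For the Bangalore example, mapping Monday through Friday to ("09:00", "18:00") yields the single specification node shown in the JSON above.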
HowTo schema
Procedural content
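Highly citable structure for 'how to X' queries on AIO and Perplexity. Rewards explicit step structure with names, descriptions, and optional URLs anchoring each step to a section of the page. Express totalTime as an ISO 8601 duration (PT15M = 15 minutes). Google retired HowTo rich results in 2023, so don't expect a visual rich result; the markup remains valid schema.org, so check it with the Schema Markup Validator.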
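If step timings live in your CMS as whole minutes, a tiny helper can emit the totalTime string. This is a hypothetical sketch (`iso8601_duration` is our name) covering only hours and minutes, which is all most how-to content needs.

```python
def iso8601_duration(minutes: int) -> str:
    """Format a whole number of minutes as an ISO 8601 duration (hours and minutes only)."""
    if minutes < 0:
        raise ValueError("duration cannot be negative")
    hours, mins = divmod(minutes, 60)
    if hours and mins:
        return f"PT{hours}H{mins}M"
    if hours:
        return f"PT{hours}H"
    return f"PT{mins}M"
```

For example, 15 becomes "PT15M" (the value in the example below) and 90 becomes "PT1H30M".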
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "How to Configure robots.txt for AI Crawlers",
"description": "Step-by-step guide to allowing GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in robots.txt for AI search visibility.",
"totalTime": "PT15M",
"tool": [
{ "@type": "HowToTool", "name": "Text editor" },
{ "@type": "HowToTool", "name": "FTP or file system access to web root" }
],
"step": [
{
"@type": "HowToStep",
"name": "Open your robots.txt file",
"text": "Locate robots.txt at the root of your web server. If none exists, create one.",
"url": "https://citare.ai/ai-bot-crawlers#step-1"
},
{
"@type": "HowToStep",
"name": "Add the AI crawler allow list",
"text": "Add named-bot allow rules for Googlebot, Google-Extended, Bingbot, GPTBot, ClaudeBot, and PerplexityBot.",
"url": "https://citare.ai/ai-bot-crawlers#step-2"
}
]
}
Validation tools
Use both validators on every priority page before deploy. Google's tool is the practical one for rich-result eligibility; the schema.org validator catches issues Google's may miss.
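A local smoke test can catch broken JSON and obviously missing properties before you paste a page into either validator. The sketch below is hypothetical: `MIN_PROPERTIES` is our own illustrative floor, not Google's documented required-property list (which differs per feature and changes over time), so it complements the two validators rather than replacing them.

```python
import json

# Illustrative minimum property sets for a quick local check only.
MIN_PROPERTIES = {
    "Organization": {"name", "url"},
    "Article": {"headline", "datePublished"},
    "FAQPage": {"mainEntity"},
    "Product": {"name"},
    "LocalBusiness": {"name", "address"},
    "HowTo": {"name", "step"},
}


def precheck(raw_json: str) -> list:
    """Return problems found in one JSON-LD block; an empty list means nothing obvious."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if isinstance(data, dict):
        nodes = data.get("@graph", [data])
        top_context = data.get("@context")
    elif isinstance(data, list):
        nodes, top_context = data, None
    else:
        return ["top level must be an object or array"]

    problems = []
    for node in nodes:
        node_type = node.get("@type")
        # @type may be a single string or a list such as ["Organization", "LocalBusiness"].
        types = node_type if isinstance(node_type, list) else [node_type]
        label = "/".join(types) if node_type is not None else "node"
        if not (node.get("@context") or top_context):
            problems.append(f"{label}: missing @context")
        if node_type is None:
            problems.append("node without @type")
            continue
        required = set().union(*(MIN_PROPERTIES.get(t, set()) for t in types))
        for prop in sorted(required - set(node)):
            problems.append(f"{label}: missing {prop}")
    return problems
```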
Seven mistakes that produce broken or sparse schema
Implementation patterns
The same JSON-LD can be embedded several ways. Pick the one that matches your stack.
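For any server-rendered stack, one detail is easy to miss: a string value containing `</script>` can close the tag early. A minimal sketch (the function name is hypothetical) that serializes a schema dict and escapes `<` as `\u003c`, which JSON parsers decode back to `<`:

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize a schema dict into a <script type="application/ld+json"> tag.

    In JSON, "<" can only appear inside string values, so replacing it with
    the \\u003c escape keeps the JSON valid and stops "</script>" in a value
    from terminating the tag.
    """
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The same escaping applies whether the tag is emitted by a template engine, a CMS plugin, or a build step.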
Frequently asked questions
Why do AI platforms prefer JSON-LD over body text?
AI platforms face an extraction problem when generating answers. From any given page: what is the canonical brand name? What is the price? Who is the author? When was it last updated? Body text is ambiguous; JSON-LD is unambiguous. When a page provides both, AI platforms prefer JSON-LD for factual extraction — the citation context becomes more reliable when the model extracts structured claims rather than parsing prose.
Should I use JSON-LD, Microdata, or RDFa?
JSON-LD. Google explicitly recommends it; AI platforms parse it most reliably; it's decoupled from visible HTML structure, so design changes don't break your structured data. Microdata and RDFa still work (Google supports both), but they are not the recommended path in 2026.
Should I add schema to every page or only key pages?
Every page that serves a clear purpose should have schema appropriate to its content type. Homepage → Organization + WebSite. Article pages → Article. Product pages → Product. FAQ pages → FAQPage. Plain pages → WebPage. Schema-on-every-page is the goal state. There is no penalty for legitimate schema across many pages.
Is there a penalty for too much schema?
No penalty for legitimate schema across many pages. There IS a penalty for schema that doesn't match visible content, schema with placeholder values, or duplicate or conflicting schemas on the same page. Quality and accuracy matter, not quantity.
How long until Google picks up new schema?
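Recrawl timing varies by site, from a few days to a few weeks; requesting indexing through the URL Inspection tool in Search Console can speed up pickup for a changed page. The Rich Results Test fetches the live URL, so it reflects new schema as soon as it's deployed. AI Overview citation lift from new schema typically registers in 4-8 weeks as Google's relevance evaluation cycles.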
Can I use multiple schema types on one page?
Yes. A blog post might carry Article + Person (author) + Organization (publisher) + FAQPage (Q&A section) + BreadcrumbList — all in one HTML file. Use multiple <script type='application/ld+json'> blocks or combine into a single @graph array. Both work; @graph is preferred for pages with many types.
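A sketch of the @graph approach: merge separately built nodes under one shared @context. Nodes can reference each other by @id, for example an Article's publisher pointing at the Organization node; the `#org` fragment below is an arbitrary naming choice, and `combine_graph` is a hypothetical helper.

```python
SCHEMA_CONTEXTS = {"https://schema.org", "https://schema.org/", "http://schema.org"}


def combine_graph(*nodes: dict) -> dict:
    """Merge several schema.org nodes into one @graph document with a single @context."""
    graph = []
    for node in nodes:
        context = node.get("@context", "https://schema.org")
        if context not in SCHEMA_CONTEXTS:
            # A different vocabulary cannot safely inherit the shared context.
            raise ValueError(f"unexpected @context: {context!r}")
        # The shared top-level @context applies to every node, so drop per-node copies.
        graph.append({k: v for k, v in node.items() if k != "@context"})
    return {"@context": "https://schema.org", "@graph": graph}


org = {"@context": "https://schema.org", "@type": "Organization",
       "@id": "https://citare.ai/#org", "name": "Citare", "url": "https://citare.ai"}
article = {"@context": "https://schema.org", "@type": "Article",
           "headline": "How to Measure AI Search Visibility",
           "publisher": {"@id": "https://citare.ai/#org"}}
page_graph = combine_graph(org, article)
```

Referencing by @id avoids repeating the full Organization object inside every Article's publisher field.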
Do AI crawlers use the same schema as Google?
Effectively yes. AI platforms read the same schema.org JSON-LD that Google uses for rich results. Some platforms (notably Perplexity) are adding proprietary extensions, but the core schema.org vocabulary is universal. Deploy standard schema.org first; add proprietary extensions only where they unlock specific platform features.
What's the difference between schema.org and JSON-LD?
Schema.org is the vocabulary — the dictionary of types and properties (Organization, Article, Product, etc.). JSON-LD is the format used to express that vocabulary in JSON syntax embedded in HTML. Schema.org defines what 'Organization' means; JSON-LD specifies how to write it in your HTML. They are complementary, not alternatives.
Generate or inspect schema in 30 seconds
Citare's free tools cover the JSON-LD workflow end-to-end. Generate schema from a form, inspect what your competitors deploy, score your AI citation readiness, and check structured data coverage — all without a signup.
Related
GEO — the complete 2026 guide
The pillar page
Rank in Google AI Overview
FAQPage + Article are actions #3 + #6 of the AIO 9-step list
AI bot crawlers
Crawler access is the prerequisite to schema mattering
The four-index reality
Why schema works across all 5 AI surfaces
JSON-LD generator (free)
Form-driven builder for Organization, Article, FAQPage, Product
JSON-LD inspector (free)
Paste a URL, see every JSON-LD block on the page