
Guide 105
Structured Data and JSON-LD for AI Search: A Complete Reference
Complete JSON-LD reference for AI search visibility — Organization, Article, Product, LocalBusiness, HowTo schemas with full code samples and validation.
Last updated: May 2026
Structured data is the single most under-deployed lever for AI search visibility. Most brands have either zero JSON-LD or sparse JSON-LD copied from a generic SEO checklist a developer ran through five years ago. AI platforms read structured data preferentially over body text when extracting facts for citation. The leverage gap is huge — and the cost to close it is small.
This is the complete JSON-LD reference for AI search optimization. Every schema type that materially affects AI citation, with full code samples, validation patterns, and the common mistakes that produce sparse-or-broken schema in production. If you need the conceptual framing, Google AI Overview Optimization is the parent pillar. This is the developer's reference.
What JSON-LD Is and Why AI Platforms Prefer It
JSON-LD (JSON for Linked Data) is a JSON-based format for embedding structured data in web pages, standardized by the W3C. The vocabulary it usually carries is schema.org, a vendor-neutral set of types and properties founded by Google, Microsoft, Yahoo, and Yandex.
JSON-LD lives inside <script type="application/ld+json"> tags in your HTML, typically in the <head>. It does not affect human-visible page rendering. It exists exclusively for machine consumption.
Why AI platforms read JSON-LD before body text
AI platforms — Google AI Overview, Gemini, ChatGPT, Perplexity — face an extraction problem when generating answers. From any given page, what is the canonical brand name? What is the price? Who is the author? When was it last updated? Body text is ambiguous. JSON-LD is unambiguous.
When a page provides both, AI platforms prefer JSON-LD for factual extraction. The citation context becomes more reliable when the model can extract structured claims rather than parse prose.
This is why a page with comprehensive JSON-LD is more citable than the same page with the same body content but no JSON-LD. The body content alone leaves the AI with extraction uncertainty. The JSON-LD removes that uncertainty.
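To make the extraction argument concrete, here is a minimal sketch of how a consumer could read facts from a page's JSON-LD instead of guessing from prose. It is an illustration only, not a description of how any AI platform actually works; the function name `extractJsonLd` and the sample HTML are hypothetical.

```javascript
// Pull every JSON-LD block out of an HTML string and parse it.
// A regex is enough for a sketch; a real crawler would use an HTML parser.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const nodes = [];
  for (const match of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold a single node or an array of nodes.
      nodes.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch {
      // Invalid JSON gives a consumer nothing to read, so skip the block.
    }
  }
  return nodes;
}

// Hypothetical page: the heading is ambiguous, the JSON-LD is not.
const sampleHtml = `
<html><head>
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"Organization","name":"Citare","url":"https://citare.ai"}
</script>
</head><body><h1>Welcome to Citare, formerly something else</h1></body></html>`;

const [org] = extractJsonLd(sampleHtml);
console.log(org["@type"], org.name, org.url);
```

The brand name and URL come back as typed fields, with no need to decide which words in the heading are the name.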
Three formats: JSON-LD vs Microdata vs RDFa
Schema.org markup can be expressed in three formats:
- JSON-LD: JSON-based, embedded in `<script>` tags, decoupled from HTML structure
- Microdata: HTML attributes (`itemscope`, `itemtype`, `itemprop`) embedded inline in the rendered HTML
- RDFa: similar to Microdata but uses RDF attribute syntax
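JSON-LD is the recommended format. Google's structured data documentation recommends it over the other two. AI platforms parse it most reliably. It is decoupled from the visible markup, which means changes to your design do not break your structured data. Use JSON-LD for new work; Microdata and RDFa are still valid but harder to maintain, because they are woven into the rendered HTML.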
The Schema Priority List — What to Deploy in Order
Not all schemas have equal leverage. Deploy in this order:
Tier 1 — Everyone deploys these
- `Organization` on the homepage — the single highest-impact schema for entity recognition
- `Article` on every long-form content page — blog posts, guides, research, documentation
- `FAQPage` anywhere you have Q&A content — the highest-leverage AIO citation lever (sibling: FAQ Schema for AI Visibility)
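- `WebSite` with `SearchAction` on the homepage: describes your site's internal search to crawlers (Google retired its sitelinks search box rich result in late 2024, so do not expect a visible search box from it)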
Tier 2 — Most brands deploy these
- `WebPage` as the wrapper for non-Article pages
- `BreadcrumbList` for navigation hierarchy on every page beyond the homepage
- `Person` for author bylines (linked from `Article`'s `author` field)
- `Product` on every ecommerce product page
- `LocalBusiness` on every page representing a physical location
Tier 3 — Use-case specific
- `HowTo` for procedural / step-by-step content
- `Recipe` for recipe content
- `Event` for events with date/location/registration
- `JobPosting` for hiring pages
- `SoftwareApplication` for SaaS product pages
- `Service` for service offering pages
- `Review` and `AggregateRating` for review content
Tier 1 is mandatory. Tier 2 is strongly recommended for most brands. Tier 3 is deploy-when-relevant. Below is the implementation reference for the highest-leverage schemas.
Organization Schema — The Homepage Foundation
Organization is the schema that tells AI platforms what your brand entity is. It powers Knowledge Graph entity recognition, sameAs link resolution, and the canonical reference identity used across all AI platforms.
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Citare",
  "alternateName": "Citare AI",
  "url": "https://citare.ai",
  "logo": "https://citare.ai/logo.png",
  "description": "AI search visibility platform measuring brand presence across ChatGPT, Gemini, Perplexity, and Google AI Overview.",
  "foundingDate": "2024",
  "founder": [
    {
      "@type": "Person",
      "name": "Ravi RDP"
    }
  ],
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Bangalore",
    "addressRegion": "KA",
    "addressCountry": "IN"
  },
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@citare.ai",
    "availableLanguage": ["English"]
  },
  "areaServed": "Worldwide",
  "sameAs": [
    "https://www.linkedin.com/company/citare-ai",
    "https://twitter.com/citare_ai",
    "https://www.crunchbase.com/organization/citare",
    "https://github.com/citare-ai"
  ]
}
The sameAs array: highest-leverage single property
sameAs is an array of canonical entity references: URLs on other sites that identify the same organization. Typical entries are Wikipedia, LinkedIn, Crunchbase, Twitter, GitHub, official social channels, and industry directories.
sameAs is the property that resolves entity ambiguity for AI platforms. Two companies named "Acme" are disambiguated by their sameAs arrays. A complete sameAs array improves entity recognition across Knowledge Graph, AIO, and Gemini.
Aim for 5-10 entries. Add Wikipedia if applicable. Always include LinkedIn and Crunchbase. Never include broken or stale URLs.
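A small helper can keep a sameAs list tidy before it ships. The sketch below is one possible approach, not a standard tool: `normalizeSameAs` and `sameAsReport` are hypothetical names, and the choices to require https, drop query strings, and compare URLs case-insensitively are assumptions you may want to relax. It does not detect broken or redirecting URLs, which needs a network request.

```javascript
// Normalize a candidate sameAs list: keep only absolute https URLs,
// drop query strings and trailing slashes, and remove duplicates in order.
function normalizeSameAs(urls) {
  const seen = new Set();
  const result = [];
  for (const raw of urls) {
    let parsed;
    try {
      parsed = new URL(raw.trim());
    } catch {
      continue; // not an absolute URL
    }
    if (parsed.protocol !== "https:") continue; // assumption: https only
    const normalized = (parsed.origin + parsed.pathname).replace(/\/+$/, "");
    const key = normalized.toLowerCase(); // assumption: case-insensitive match
    if (!seen.has(key)) {
      seen.add(key);
      result.push(normalized);
    }
  }
  return result;
}

// Report the cleaned list and whether it meets the 5-10 entry target.
function sameAsReport(urls) {
  const entries = normalizeSameAs(urls);
  return { entries, withinTarget: entries.length >= 5 && entries.length <= 10 };
}

const report = sameAsReport([
  "https://www.linkedin.com/company/citare-ai/",
  "https://www.linkedin.com/company/citare-ai",
  "http://twitter.com/citare_ai",
  "not a url",
  "https://github.com/citare-ai",
]);
console.log(report);
```

Here only two entries survive (the LinkedIn duplicate, the http link, and the malformed string are dropped), so `withinTarget` is false.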
Common Organization schema mistakes
- Sparse properties: only `name` and `url`, missing description / logo / sameAs / foundingDate
- Logo URL pointing to a non-public CDN
- Email field with `mailto:` prefix (should be plain email)
- Address without `addressCountry`
- `sameAs` containing redirects rather than canonical URLs
- Founding date in inconsistent format (use ISO 8601: `"2024"` or `"2024-01"`)
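Several of the mistakes above can be caught offline with a simple lint pass. This is a minimal sketch under stated assumptions: `lintOrganization` is a hypothetical helper, the list of expected properties mirrors this guide rather than any official requirement, and the checks for non-public logos and redirecting sameAs URLs are left out because they need network access.

```javascript
// Return a list of human-readable problems for an Organization node.
function lintOrganization(org) {
  const problems = [];
  const expected = ["name", "url", "description", "logo", "sameAs", "foundingDate"];
  for (const key of expected) {
    if (org[key] === undefined || org[key] === "") problems.push(`missing ${key}`);
  }
  const email = org.contactPoint && org.contactPoint.email;
  if (typeof email === "string" && email.startsWith("mailto:")) {
    problems.push("contactPoint.email should be a plain address, not a mailto: link");
  }
  if (org.address && !org.address.addressCountry) {
    problems.push("address is missing addressCountry");
  }
  // Accept YYYY, YYYY-MM, or YYYY-MM-DD.
  if (org.foundingDate && !/^\d{4}(-\d{2}(-\d{2})?)?$/.test(org.foundingDate)) {
    problems.push("foundingDate is not in ISO 8601 form");
  }
  return problems;
}

const issues = lintOrganization({
  name: "Citare",
  url: "https://citare.ai",
  foundingDate: "Jan 2024",
  contactPoint: { email: "mailto:support@citare.ai" },
  address: { addressLocality: "Bangalore" },
});
console.log(issues);
```

Running a check like this in CI is one way to stop sparse schema from reaching production; the exact rule set is a design choice.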
Article Schema — Every Long-Form Page
Article schema applies to blog posts, guides, research, and any long-form editorial content. It tells AI platforms who wrote the page, when it was last updated, and what topic it covers.
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Measure AI Search Visibility: A Complete Framework",
  "description": "A complete framework for measuring AI search visibility: query design, persona dispatch, citation parsing, surface rate, competitor benchmarking.",
  "image": [
    "https://citare.ai/guides/measure-ai-search-visibility/hero.png"
  ],
  "author": {
    "@type": "Person",
    "name": "Ravi RDP",
    "url": "https://citare.ai/team/ravi"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Citare",
    "url": "https://citare.ai",
    "logo": {
      "@type": "ImageObject",
      "url": "https://citare.ai/logo.png"
    }
  },
  "datePublished": "2026-05-04",
  "dateModified": "2026-05-04",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://citare.ai/guides/measure-ai-search-visibility"
  },
  "keywords": "AI search visibility measurement, AI search monitoring tool, surface rate, persona dispatch"
}
Why dateModified is a citation signal
AI platforms — particularly Google AI Overview — use dateModified to evaluate freshness. Pages with recent dateModified get cited at higher rates than identical pages with stale dateModified. Update this field whenever you make non-trivial revisions. It is the cheapest freshness signal available and a direct citation lift mechanism. (See Google AI Overview Optimization for the freshness model.)
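One way to keep dateModified honest is to derive it from the CMS record rather than typing it by hand. The sketch below assumes a hypothetical post object with `title`, `authorName`, `publishedAt`, and an optional `updatedAt`; none of these field names come from a specific CMS. Dates are formatted in UTC.

```javascript
// Format a Date as an ISO 8601 calendar date (YYYY-MM-DD) in UTC.
function toIsoDate(date) {
  return date.toISOString().slice(0, 10);
}

// Build Article JSON-LD from a CMS record.
// dateModified is never earlier than datePublished.
function buildArticle(post) {
  const published = new Date(post.publishedAt);
  const updated = post.updatedAt ? new Date(post.updatedAt) : published;
  const modified = updated > published ? updated : published;
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    author: { "@type": "Person", name: post.authorName },
    datePublished: toIsoDate(published),
    dateModified: toIsoDate(modified),
  };
}

const article = buildArticle({
  title: "How to Measure AI Search Visibility",
  authorName: "Ravi RDP",
  publishedAt: "2026-05-04T10:00:00Z",
  updatedAt: "2026-06-01T08:00:00Z",
});
console.log(article.datePublished, article.dateModified);
```

The `updatedAt` value should only change on a real revision; bumping it without changing the page defeats the purpose of the signal.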
Article vs BlogPosting vs NewsArticle
- `Article` — generic; use for evergreen reference content like guides
- `BlogPosting` — for blog posts specifically
- `NewsArticle` — for news with strict date sensitivity
Use Article as the default. Use BlogPosting if your content management system or audience expects a blog framing. Use NewsArticle only for genuine news content — Google has specific rich-result eligibility for NewsArticle that does not apply to evergreen guides.
FAQPage Schema — The Highest-Leverage AIO Lever
FAQPage is the single schema with the largest measurable lift on AIO citation rate. It signals to AI platforms "this page contains explicit question-answer pairs ready for citation."
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization (GEO)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "GEO is the practice of optimizing your brand and content to be cited or recommended by AI-powered search platforms including Google AI Overview, ChatGPT, Gemini, and Perplexity."
      }
    },
    {
      "@type": "Question",
      "name": "How is GEO different from SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "SEO optimizes pages to rank in Google's link-based results. GEO optimizes content to be cited in AI-generated answers. The ranking logic, measurement, and optimization tactics are structurally different."
      }
    }
  ]
}
For the deep design guide on FAQ content for AI citation (question phrasing patterns, answer length, factual density, common mistakes), see the sibling cluster FAQ Schema for AI Visibility.
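If the visible FAQ and the FAQPage schema are generated from the same data, they cannot drift apart. The following is a minimal sketch of that idea; `buildFaqPage` and the `{ question, answer }` record shape are hypothetical, not part of any library.

```javascript
// Build FAQPage JSON-LD from the same question/answer records that
// render the visible FAQ section. Blank entries are skipped.
function buildFaqPage(pairs) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: pairs
      .filter(({ question, answer }) => question.trim() && answer.trim())
      .map(({ question, answer }) => ({
        "@type": "Question",
        name: question.trim(),
        acceptedAnswer: { "@type": "Answer", text: answer.trim() },
      })),
  };
}

const faq = buildFaqPage([
  { question: "What is Generative Engine Optimization (GEO)?", answer: "GEO is the practice of optimizing content to be cited by AI search platforms." },
  { question: "  ", answer: "An answer with no question is dropped." },
]);
console.log(JSON.stringify(faq, null, 2));
```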
Product Schema — Ecommerce
Product schema is the schema that powers ecommerce AI citation. Every product page should have it.
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Citare Pro Plan",
  "description": "Citare Pro: AI search visibility monitoring across ChatGPT, Gemini, Perplexity, and Google AI Overview, with persona-anchored dispatch and competitor benchmarking.",
  "image": [
    "https://citare.ai/products/pro/hero.png"
  ],
  "brand": {
    "@type": "Brand",
    "name": "Citare"
  },
  "sku": "CITARE-PRO-MONTHLY",
  "offers": {
    "@type": "Offer",
    "url": "https://citare.ai/pricing",
    "priceCurrency": "USD",
    "price": "299.00",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "seller": {
      "@type": "Organization",
      "name": "Citare"
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "42"
  }
}
Properties that materially affect AI citation
- `name`, `description`, `image`: required basics
- `brand`: links the product to the brand entity (Organization schema cross-reference)
- `sku`, `gtin`, `mpn`, `isbn`: universal identifiers; AI platforms use these for product disambiguation
- `offers`: current price and availability; freshness matters here too
- `aggregateRating` and `review`: social proof signals; cited heavily by AI in recommendation queries
Multi-variant products
For products with size/color/configuration variants, use ProductGroup with hasVariant arrays of individual Product items. Do not collapse variants into a single Product if they have different prices or SKUs.
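As an illustration of the variant pattern, the sketch below builds a ProductGroup whose hasVariant array holds one Product per SKU, each with its own price. The product, SKUs, and prices are invented for the example, and `buildProductGroup` is a hypothetical helper; `productGroupID`, `variesBy`, and `hasVariant` are schema.org properties of ProductGroup.

```javascript
// Build a ProductGroup with one Product per variant, so that
// differing prices and SKUs are not collapsed into a single Product.
function buildProductGroup(group) {
  return {
    "@context": "https://schema.org",
    "@type": "ProductGroup",
    name: group.name,
    productGroupID: group.id,
    variesBy: group.variesBy,
    hasVariant: group.variants.map((variant) => ({
      "@type": "Product",
      name: `${group.name} (${variant.label})`,
      sku: variant.sku,
      offers: {
        "@type": "Offer",
        priceCurrency: group.currency,
        price: variant.price.toFixed(2),
        availability: variant.inStock
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
      },
    })),
  };
}

// Hypothetical catalog entry, for illustration only.
const hoodie = buildProductGroup({
  id: "HOODIE-01",
  name: "Example Hoodie",
  currency: "USD",
  variesBy: ["https://schema.org/size"],
  variants: [
    { label: "Medium", sku: "HOODIE-01-M", price: 59, inStock: true },
    { label: "XXL", sku: "HOODIE-01-XXL", price: 64.5, inStock: false },
  ],
});
console.log(JSON.stringify(hoodie, null, 2));
```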
LocalBusiness Schema — Physical Presence
For any brand with physical locations, LocalBusiness schema is mandatory. It powers AIO geo-contextualization (covered in P5) and Gemini's local query handling.
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Citare HQ",
  "image": "https://citare.ai/locations/bangalore.jpg",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 MG Road",
    "addressLocality": "Bangalore",
    "addressRegion": "KA",
    "postalCode": "560001",
    "addressCountry": "IN"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 12.9716,
    "longitude": 77.5946
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "09:00",
      "closes": "18:00"
    }
  ],
  "telephone": "+91-80-1234-5678",
  "priceRange": "$$",
  "url": "https://citare.ai/locations/bangalore",
  "areaServed": "India"
}
Multi-location strategy
For brands with multiple physical locations, deploy distinct LocalBusiness schema on each location's landing page. Do not collapse all locations into the homepage Organization schema — that dilutes geo-specific signals and reduces city-level AIO citation.
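A per-location generator is one way to implement this. The sketch below is an assumption-laden example: `buildLocationSchemas`, the `/locations/<slug>` URL pattern, and the second location's data are all hypothetical. It links each location back to the brand with `parentOrganization`, a schema.org property available on LocalBusiness.

```javascript
// Produce one LocalBusiness node per location, keyed by the landing
// page URL it belongs on.
function buildLocationSchemas(brand, locations) {
  const byUrl = new Map();
  for (const location of locations) {
    const url = `${brand.url}/locations/${location.slug}`;
    byUrl.set(url, {
      "@context": "https://schema.org",
      "@type": "LocalBusiness",
      "@id": `${url}#localbusiness`,
      name: `${brand.name} ${location.city}`,
      url,
      telephone: location.telephone,
      address: {
        "@type": "PostalAddress",
        streetAddress: location.street,
        addressLocality: location.city,
        addressRegion: location.region,
        postalCode: location.postalCode,
        addressCountry: location.country,
      },
      parentOrganization: { "@type": "Organization", name: brand.name, url: brand.url },
    });
  }
  return byUrl;
}

const schemas = buildLocationSchemas({ name: "Citare", url: "https://citare.ai" }, [
  { slug: "bangalore", city: "Bangalore", region: "KA", country: "IN",
    street: "123 MG Road", postalCode: "560001", telephone: "+91-80-1234-5678" },
  // Placeholder second location, for illustration only.
  { slug: "example-city", city: "Example City", region: "EX", country: "IN",
    street: "1 Example Street", postalCode: "000000", telephone: "+91-00-0000-0000" },
]);
console.log([...schemas.keys()]);
```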
HowTo Schema — Procedural Content
For step-by-step content, HowTo schema produces highly citable structure for "how to X" queries on AIO and Perplexity.
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Configure robots.txt for AI Crawlers",
  "description": "Step-by-step guide to allowing GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in robots.txt for AI search visibility.",
  "totalTime": "PT15M",
  "tool": [
    { "@type": "HowToTool", "name": "Text editor" },
    { "@type": "HowToTool", "name": "FTP or file system access to web root" }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "name": "Open your robots.txt file",
      "text": "Locate robots.txt at the root of your web server. If none exists, create one.",
      "url": "https://citare.ai/guides/ai-crawler-access-guide#step-1"
    },
    {
      "@type": "HowToStep",
      "name": "Add the AI crawler allow list",
      "text": "Add named-bot allow rules for Googlebot, Google-Extended, Bingbot, GPTBot, ClaudeBot, and PerplexityBot.",
      "url": "https://citare.ai/guides/ai-crawler-access-guide#step-2"
    }
  ]
}
HowTo rewards explicit step structure with names, descriptions, and optional URLs anchoring each step to a section of the page.
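Two details of HowTo markup are easy to get wrong by hand: the ISO 8601 duration in totalTime and the per-step anchor URLs. A sketch of generating both follows; `toIsoDuration` and `buildHowTo` are hypothetical helpers, and the `#step-N` anchor scheme is an assumption that must match the ids in your page's HTML.

```javascript
// Convert whole minutes to an ISO 8601 duration such as "PT15M" or "PT1H30M".
function toIsoDuration(totalMinutes) {
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return "PT" + (hours ? `${hours}H` : "") + (minutes || !hours ? `${minutes}M` : "");
}

// Build HowTo JSON-LD, numbering steps and anchoring each to the page.
function buildHowTo({ name, pageUrl, minutes, steps }) {
  return {
    "@context": "https://schema.org",
    "@type": "HowTo",
    name,
    totalTime: toIsoDuration(minutes),
    step: steps.map((step, index) => ({
      "@type": "HowToStep",
      position: index + 1,
      name: step.name,
      text: step.text,
      url: `${pageUrl}#step-${index + 1}`,
    })),
  };
}

const howTo = buildHowTo({
  name: "How to Configure robots.txt for AI Crawlers",
  pageUrl: "https://citare.ai/guides/ai-crawler-access-guide",
  minutes: 15,
  steps: [
    { name: "Open your robots.txt file", text: "Locate robots.txt at the root of your web server." },
    { name: "Add the AI crawler allow list", text: "Add named-bot allow rules." },
  ],
});
console.log(howTo.totalTime, howTo.step.map((s) => s.url));
```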
Validation, Testing, and Common Mistakes
Validation tools
- Google Rich Results Test (search.google.com/test/rich-results) — tests for Google rich-result eligibility specifically. Validates schema and shows preview.
- Schema Markup Validator (validator.schema.org) — vendor-neutral validation against schema.org standards. More lenient than Google's tool.
- Google Search Console (search.google.com/search-console) — once your site is indexed, GSC reports schema errors and warnings across all pages.
Use both validators on every priority page. Google's tool is the practical one; the schema.org validator catches issues Google's may miss.
Common mistakes that produce broken or sparse schema
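1. Schema doesn't match visible content. Google's structured data guidelines prohibit marking up content that users cannot see, and violations can cost the page its rich-result eligibility. If your FAQPage schema lists 10 questions but the visible page only shows 3, the markup breaks those guidelines even though automated validators will pass it. Schema and visible content must agree.
2. Missing required properties. Schema.org itself marks nothing as required; the required and recommended properties come from each consumer's documentation, chiefly Google's rich-result guidelines. As a practical minimum, give Organization a name and url, and give Article a headline, image, author, datePublished, and publisher. Google's Rich Results Test flags missing properties for the types it supports.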
3. Wrong nesting. Article.author should nest a Person object, not a string. Product.offers.seller should nest an Organization. Flat string values where objects are expected break validation.
4. Stale `datePublished` and `dateModified`. Update dateModified whenever the page changes. Stale dates depress citation.
5. Generic placeholder values. "Brand Name" or "Lorem Ipsum" left in production schema is the most common embarrassment. Audit before deploy.
6. Multiple conflicting schemas. A page with three different Article schema blocks tells AI platforms nothing. Consolidate into one block per type per page.
7. Schema in `<body>` instead of `<head>`. JSON-LD works in either, but <head> is the convention and slightly preferred. Be consistent.
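Some of the mistakes in this list can be screened for automatically before deploy. The sketch below covers three of them: placeholder values, duplicate blocks of the same type, and FAQ questions missing from the visible page. `auditPage`, `walkStrings`, and the placeholder list are hypothetical, the visible-text check is a plain substring match, and the sketch assumes each node's `@type` is a single string.

```javascript
const PLACEHOLDERS = ["brand name", "lorem ipsum", "your company"];

// Visit every string value in a JSON-LD node, reporting its property path.
function walkStrings(node, visit, path = "") {
  if (typeof node === "string") {
    visit(path, node);
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => walkStrings(item, visit, `${path}[${i}]`));
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      walkStrings(value, visit, path ? `${path}.${key}` : key);
    }
  }
}

// Audit the parsed JSON-LD nodes of one page against its visible text.
function auditPage(nodes, visibleText) {
  const problems = [];
  const typeCounts = {};
  for (const node of nodes) {
    const type = node["@type"];
    typeCounts[type] = (typeCounts[type] || 0) + 1;
    walkStrings(node, (path, value) => {
      const lower = value.toLowerCase();
      if (PLACEHOLDERS.some((p) => lower.includes(p))) {
        problems.push(`${type}: placeholder value at ${path}`);
      }
    });
    if (type === "FAQPage") {
      for (const question of node.mainEntity || []) {
        if (!visibleText.includes(question.name)) {
          problems.push(`FAQPage: question not visible on page: "${question.name}"`);
        }
      }
    }
  }
  for (const [type, count] of Object.entries(typeCounts)) {
    if (count > 1) problems.push(`${type}: ${count} blocks on one page`);
  }
  return problems;
}

const findings = auditPage(
  [
    { "@type": "Article", headline: "Lorem ipsum dolor" },
    { "@type": "Article", headline: "A real headline" },
    { "@type": "FAQPage", mainEntity: [
      { "@type": "Question", name: "What is GEO?" },
      { "@type": "Question", name: "A question only in the schema?" },
    ] },
  ],
  "Frequently asked: What is GEO? It is the practice of ..."
);
console.log(findings);
```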
Implementation Patterns
Static HTML
Embed JSON-LD directly in the HTML <head>:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Brand Name"
}
</script>
Next.js / React (dynamic)
Render JSON-LD via dangerouslySetInnerHTML in a layout or page component:
<script
  type="application/ld+json"
  dangerouslySetInnerHTML={{
    // Escape "<" so a field containing "</script>" cannot end the tag early.
    __html: JSON.stringify({
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": post.title,
      "datePublished": post.publishedAt,
      "dateModified": post.updatedAt,
    }).replace(/</g, "\\u003c"),
  }}
/>
CMS-driven sites
Major CMSes have schema plugins:
- WordPress — Yoast SEO, Rank Math, Schema Pro
- Sanity — custom Studio plugins or rendered server-side at build
- Contentful — render via your application layer
- Webflow — embed via the custom-code field
Plugin-driven schema is convenient but check the output. Plugins frequently produce minimal schema (just type and name); you may need to add properties manually.
Frequently Asked Questions
Should I add schema only to my homepage or every page?
Every page that serves a clear purpose should have schema appropriate to its content type. Homepage gets Organization. Article pages get Article. Product pages get Product. FAQ pages get FAQPage. Plain content pages get WebPage. Schema-on-every-page is the goal state.
Is there a penalty for too much schema?
There is no penalty for legitimate schema across many pages. There is a penalty for schema that doesn't match visible content, schema with placeholder values, or duplicate / conflicting schemas on the same page. Quality matters, not quantity.
How long until Google picks up new schema?
Googlebot typically needs 24-72 hours to recrawl and notice, though this varies with how often your site is crawled. The Rich Results Test fetches the live page on demand, so it reflects a deploy immediately. AI Overview citation lift from new schema typically shows up in 4-8 weeks, as Google re-evaluates the page.
Can I use multiple schema types on one page?
Yes. A blog post might have Article + Person (author) + Organization (publisher) + FAQPage (Q&A section) all in one HTML file. Use multiple <script> blocks or combine into a single @graph array. Both work.
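The @graph option can be sketched as follows. Each node gets an `@id`, and other nodes point to it with `{ "@id": ... }` instead of repeating its properties. The function name `buildGraph`, the `#organization` / `#author-…` / `#article` fragment scheme, and the page record fields are assumptions for illustration, not a required convention.

```javascript
// Combine Organization, Person, and Article into a single @graph,
// linking them by @id rather than nesting full copies.
function buildGraph(siteUrl, pageUrl, page) {
  const orgId = `${siteUrl}/#organization`;
  const authorId = `${siteUrl}/#author-${page.authorSlug}`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "Organization", "@id": orgId, name: page.publisherName, url: siteUrl },
      { "@type": "Person", "@id": authorId, name: page.authorName },
      {
        "@type": "Article",
        "@id": `${pageUrl}#article`,
        headline: page.title,
        author: { "@id": authorId },
        publisher: { "@id": orgId },
      },
    ],
  };
}

const graph = buildGraph(
  "https://citare.ai",
  "https://citare.ai/guides/measure-ai-search-visibility",
  {
    title: "How to Measure AI Search Visibility",
    authorName: "Ravi RDP",
    authorSlug: "ravi",
    publisherName: "Citare",
  }
);
console.log(JSON.stringify(graph, null, 2));
```

A single graph keeps one definition per entity, which also guards against the conflicting-blocks mistake described earlier.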
Should I include schema even if AI platforms haven't cited me yet?
Yes, immediately. Schema is the prerequisite for AI citation eligibility, not a result of it. Pages without schema have a much harder time clearing AI platforms' citation thresholds. Deploy schema first; expect citation lift in the following weeks.
What's the difference between schema.org and JSON-LD?
Schema.org is the vocabulary — the dictionary of types and properties. JSON-LD is the format used to express that vocabulary in JSON syntax. Schema.org defines what Organization and Article mean. JSON-LD specifies the syntax for embedding them in HTML. They are complementary, not alternatives.
Do AI crawlers use the same schema as Google?
Effectively yes. AI platforms read the same schema.org JSON-LD that Google uses for rich results. Some platforms are starting to add proprietary extensions, but the core schema.org vocabulary is universal.
Deploy Schema, Then Measure the Lift
Schema deployment is one of the highest-leverage AI search investments. Citare measures the impact — running surface rate audits before and after schema deployment, isolating which interventions produce which lift.
Run your free AI visibility audit → [citare.ai/audit]