Structured data: essential for indexing by LLMs
Description
Structured data is a standardized format for providing information about a web page and classifying its content. It helps search engines understand the meaning and context of the information on your site. Using the Schema.org vocabulary, you can mark up specific elements of your content (e.g., an article, a product, a recipe, a local business) in a way that makes it easier for search engines to interpret.
This improved understanding lets search engines display rich results. It also matters for generative AI engines, which can draw on this data to produce accurate, contextual answers.
Why is this important for AI search?
Structured data provides LLMs with a rich semantic context that significantly improves content understanding. It allows models to accurately identify entities, their relationships, and their attributes, resulting in more accurate and contextualized citations.
This structuring also facilitates the integration of content into knowledge graphs used by AI systems to generate coherent answers.
Technical details
- Structured Data Formats
- Recommended Schema Types
- Validating Structured Data
- Content Alignment and Consistency
- Multilingual Consistency
1. Structured Data Formats
Schema.org markup can be embedded in an HTML page in three main formats: JSON-LD, Microdata, and RDFa. For generative engine optimization (GEO), JSON-LD is generally preferred for its ease of implementation and clarity.
JSON-LD (JavaScript Object Notation for Linked Data) is the format recommended by Google. It is a block of JSON data, not executable JavaScript, placed inside a <script type="application/ld+json"> element in the <head> or <body> of your HTML page. It is easy to generate, read, and maintain because it sits apart from the visible HTML markup. Example:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your company name",
  "url": "https://www.yourdomain.com",
  "logo": "https://www.yourdomain.com/images/logo.png"
}
</script>
Use JSON-LD by default to implement your structured data. It is more flexible and less intrusive than formats that modify the HTML markup.
Microdata: This format embeds the Schema.org vocabulary directly in the existing HTML using attributes (itemscope, itemtype, itemprop). Although it ties the markup to the visible content, it can make the HTML heavier and less readable. Example:
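When the JSON-LD block is produced by a template or a build script rather than written by hand, it can be generated from ordinary data. The following is a minimal Python sketch under that assumption; the helper name render_json_ld is hypothetical, and the values are the placeholders from the example above.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialize a Schema.org dict into a JSON-LD <script> block."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value such as "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged once parsed.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your company name",
    "url": "https://www.yourdomain.com",
    "logo": "https://www.yourdomain.com/images/logo.png",
}

print(render_json_ld(organization))
```

Generating the block from one data source keeps the markup in step with the values the page template displays.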
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Product Name</h1>
  <img itemprop="image" src="product.jpg" alt="Product Image">
  <p itemprop="description">Product Description.</p>
</div>
Only use Microdata if you have specific technical constraints or are working on an existing system that already uses it. Otherwise, prefer JSON-LD.
RDFa (Resource Description Framework in Attributes): Like Microdata, RDFa is integrated directly into the HTML via attributes (vocab, typeof, property). It is more complex to use than JSON-LD and is less commonly adopted for SEO. Example:
<div vocab="https://schema.org/" typeof="Product">
  <h1 property="name">Product Name</h1>
  <img property="image" src="product.jpg" alt="Product Image">
  <p property="description">Product Description.</p>
</div>
Avoid RDFa unless you have a specific reason to use it (e.g., compatibility with existing systems that require it).
2. Recommended Schema Types
Using relevant and varied schema types gives search engines a more complete understanding of your content. Here are some of the most commonly recommended schema types for GEO:
- WebSite: Represents your website as a whole. It can include properties like name, url, and potentialAction (for internal site search).
- Organization: Describes your business or organization, including its name, logo, contact information, social media profiles, and address. Important for brand authority and for knowledge graph entries.
- LocalBusiness: For businesses with a physical presence, this schema provides specific details like address, opening hours, phone number, and customer reviews.
- Article: For blog posts, news, or any editorial content. Includes properties like headline, image, datePublished, author, and publisher.
- Product: For e-commerce product pages. Allows you to specify the product name, description, price, availability, reviews, and offers.
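- FAQPage: For pages containing a list of questions and answers. Each question and its associated answer can be marked up. Since 2023 Google has limited FAQ rich results to a small set of authoritative government and health sites, but the markup still makes the question-and-answer structure explicit to any system that reads it.
- HowTo: For content that describes a series of steps to complete a task. Google stopped showing How-to rich results in 2023, so treat this type as a way to describe the steps rather than as a source of rich snippets.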
- VideoObject: For videos embedded on your site. Allows you to specify the video's title, description, thumbnail, duration, and publication date.
Identify the main content types on your site and implement the most relevant Schema.org schemas. Feel free to combine multiple schema types on the same page if it's semantically appropriate (e.g., an Article that contains a VideoObject).
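As an illustration of combining types, the Python sketch below builds an Article whose video property holds a nested VideoObject. All values are placeholders, and the set of properties shown is an assumption chosen for brevity, not a list of what any search engine requires.

```python
import json

# Nested type: it has no "@context" of its own because it inherits the parent's.
video = {
    "@type": "VideoObject",
    "name": "Example video title",
    "description": "Short description of the video.",
    "thumbnailUrl": "https://www.yourdomain.com/images/video-thumb.jpg",
    "uploadDate": "2024-01-15",
    "duration": "PT2M30S",  # ISO 8601 duration: 2 minutes 30 seconds
}

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example article headline",
    "image": "https://www.yourdomain.com/images/article.jpg",
    "datePublished": "2024-01-15",
    "author": {"@type": "Person", "name": "Author Name"},
    "publisher": {"@type": "Organization", "name": "Your company name"},
    "video": video,  # the article contains the video
}

print(json.dumps(article, indent=2))
```

Nesting the video inside the article states the relationship between the two entities explicitly, which two unrelated blocks on the same page would not do.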
3. Validating Structured Data
Once structured data is implemented, it is imperative to validate it to ensure it is correctly formatted and error-free. Errors can prevent search engines from understanding your data, thus negating all implementation efforts.
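The primary tool for validating your structured data is the Google Rich Results Test. It tests a URL or a code snippet, shows which types of rich results Google can generate from your data, and reports errors and warnings. Because it only covers types that are eligible for Google rich results, the Schema Markup Validator at validator.schema.org is a useful complement for checking any other Schema.org markup.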
Systematically test all pages where structured data is implemented. Fix any reported errors and carefully review any warnings. A valid schema is the first step in ensuring your data is used by search engines.
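Before submitting pages to an online validator, a quick local check can catch the most basic mistakes. The sketch below uses only the Python standard library to extract JSON-LD blocks from an HTML string and confirm that each one parses and declares @context and @type. The names JsonLdExtractor and check_json_ld are hypothetical, and this check does not replace the Rich Results Test: it knows nothing about required properties.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def check_json_ld(html: str) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.blocks:
        problems.append("no JSON-LD block found")
    for index, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as error:
            problems.append(f"block {index}: invalid JSON ({error.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                problems.append(f"block {index}: top-level item is not an object")
                continue
            if "@context" not in item:
                problems.append(f"block {index}: missing @context")
            if "@type" not in item and "@graph" not in item:
                problems.append(f"block {index}: missing @type")
    return problems


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Your company name"}
</script>
</head><body></body></html>"""
print(check_json_ld(page))  # an empty list means the basic checks passed
```

A check like this fits in a build or deployment step, so that a stray trailing comma is caught before the page is published.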
4. Content Alignment and Consistency
Structured data must accurately reflect the visible content of your page. It is crucial that the marked-up information matches what the user sees and reads on the page. Any inconsistency can be perceived as an attempt at manipulation and lead to penalties or your data being ignored by search engines.
Make sure that each marked-up property in your structured data has a visible and relevant equivalent on the page. For example, if you mark up a price, that price should be clearly displayed on the page. Avoid marking up information that is not present or is misleading.
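One way to approximate this check automatically is to compare each marked-up value against the text a visitor sees. The Python sketch below assumes the visible text has already been extracted from the page; the function names are hypothetical, and the comparison is a plain substring match, so results need human review (a date or currency code may be displayed in a different form than it is marked up).

```python
def iter_text_values(node, path=""):
    """Yield (path, value) for every non-URL leaf, skipping JSON-LD keywords such as @type."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from iter_text_values(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_text_values(value, f"{path}[{index}]")
    elif node is not None:
        text = str(node)
        # URLs (images, availability enums) are rarely printed on the page.
        if not text.startswith(("http://", "https://")):
            yield path, text


def values_missing_from_page(data, visible_text):
    """Return the paths of marked-up values that do not appear in the visible text."""
    page = " ".join(visible_text.lower().split())
    return [
        path
        for path, text in iter_text_values(data)
        if " ".join(text.lower().split()) not in page
    ]


product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Product Name",
    "description": "Product Description.",
    "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "EUR"},
}
# The page shows a different price than the markup declares.
visible_text = "Product Name  Product Description.  Price: 24.99 EUR"

print(values_missing_from_page(product, visible_text))  # → ['offers.price']
```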
It is possible for schema conflicts or duplicates to appear, especially on complex sites or when using plugins or themes that automatically generate structured data. These issues can prevent search engines from correctly understanding your data.
Perform regular audits of your structured data using Google's Rich Results Test and Google Search Console. If duplicates or conflicts are detected (for example, two Organization blocks describing the same entity), identify the source and remove the redundant or conflicting information. Ensure there is only one instance of each primary schema type per entity on a page.
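A duplicate check of this kind can be sketched in a few lines of Python. The example assumes the JSON-LD blocks of a page have already been parsed into dictionaries, and it treats two items as the same entity when they share a type and the same @id, url, or name; that identity rule is an assumption made for illustration, and the function name is hypothetical.

```python
from collections import Counter


def duplicated_entities(blocks):
    """Return the (type, identity) pairs described more than once on a page."""
    counts = Counter()
    for block in blocks:
        # A block may be a single object, a list of objects, or an "@graph" wrapper.
        items = block if isinstance(block, list) else block.get("@graph", [block])
        for item in items:
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            identity = item.get("@id") or item.get("url") or item.get("name")
            for type_name in types:
                counts[(type_name, identity)] += 1
    return {key: count for key, count in counts.items() if count > 1}


blocks = [
    # Hypothetical output of a theme
    {"@context": "https://schema.org", "@type": "Organization",
     "name": "Your company name", "url": "https://www.yourdomain.com"},
    # Hypothetical output of a plugin, describing the same entity with another name
    {"@context": "https://schema.org", "@type": "Organization",
     "name": "Your Company Name Inc.", "url": "https://www.yourdomain.com"},
    {"@context": "https://schema.org", "@type": "WebSite",
     "name": "Your company name", "url": "https://www.yourdomain.com"},
]

print(duplicated_entities(blocks))
# → {('Organization', 'https://www.yourdomain.com'): 2}
```

A repeated pair is a prompt to look at the page, not proof of a conflict: the two blocks may agree with each other, in which case one of them is simply redundant.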
5. Multilingual Consistency
For multilingual sites, consistency of structured data across different language versions is paramount. Structured data should reflect the language and region targeted by each version of the page.
Ensure that text properties in your structured data are translated into the language of the corresponding page. Use the inLanguage property where needed to specify the language of the content. If you use hreflang tags, ensure that canonical URLs and the URLs in your structured data are consistent with the hreflang configuration.
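The consistency rules above can be checked mechanically. The Python sketch below assumes you already have, for one piece of content, a map from hreflang code to page URL and the parsed JSON-LD of each language version; both structures and the function name are hypothetical. It compares codes by exact equality, which is a simplification (a page tagged fr-FR in hreflang and fr in inLanguage would be reported).

```python
def multilingual_issues(hreflang, markup):
    """Compare each language version's structured data with the hreflang map."""
    issues = []
    for lang, page_url in hreflang.items():
        data = markup.get(lang)
        if data is None:
            issues.append(f"{lang}: no structured data")
            continue
        if data.get("inLanguage") != lang:
            issues.append(f"{lang}: inLanguage is {data.get('inLanguage')!r}")
        if data.get("url") != page_url:
            issues.append(f"{lang}: url does not match the hreflang URL")
    return issues


# Hypothetical hreflang map: language code -> URL of that language version
hreflang = {
    "en": "https://www.yourdomain.com/en/guide",
    "fr": "https://www.yourdomain.com/fr/guide",
}

# Hypothetical parsed JSON-LD for each version
markup = {
    "en": {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Structured data guide",
        "inLanguage": "en",
        "url": "https://www.yourdomain.com/en/guide",
    },
    "fr": {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Guide des données structurées",
        "inLanguage": "fr",
        # The English URL was copied over by mistake.
        "url": "https://www.yourdomain.com/en/guide",
    },
}

print(multilingual_issues(hreflang, markup))
# → ['fr: url does not match the hreflang URL']
```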
