What 68.9 million AI crawler visits tell us about AI search visibility

Published: April 22, 2026
Author: Florian Chapelier

AI search visibility is starting to look less mysterious. Across 858,457 Duda-hosted sites, 59% received at least one AI crawler visit in February 2026, adding up to 68.9 million visits. The sites that attracted more crawling were not using a hidden trick. They were easier to verify, easier to parse, and richer in usable information.

That matters because most of this crawling is no longer classic indexing. It is retrieval for live answers. In the dataset, 56.9% of crawler activity came from user fetch, 28.8% from training, and 14.3% from discovery. If you want better odds in AI search, the takeaway is clear: structured business data, stronger external signals, and deeper content make a site easier for AI systems to trust and reuse.
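If you want to see how that split looks in your own logs, the categories map roughly onto crawler user agents. Below is a minimal Python sketch that tallies hits per category from a combined-format access log. The user-agent tokens are the publicly documented ones for each vendor's crawler families, but the mapping onto the report's three categories is our reading of each bot's stated purpose, not a mapping taken from this dataset.

```python
import re
from collections import Counter

# Publicly documented AI crawler user-agent tokens, mapped onto the three
# activity categories used in the report. The token list is partial, and the
# category assignment is our interpretation of each bot's stated purpose.
UA_CATEGORIES = {
    "ChatGPT-User": "user fetch",    # OpenAI: fetches pages on behalf of a user
    "Claude-User": "user fetch",     # Anthropic: same idea
    "GPTBot": "training",            # OpenAI: training-data crawler
    "ClaudeBot": "training",         # Anthropic: training-data crawler
    "OAI-SearchBot": "discovery",    # OpenAI: search indexing
    "Claude-SearchBot": "discovery", # Anthropic: search indexing
    "PerplexityBot": "discovery",    # Perplexity: search index crawler
}

UA_FIELD = re.compile(r'"([^"]*)"\s*$')  # last quoted field in a combined log line


def classify(user_agent: str) -> str | None:
    """Return the activity category for a user-agent string, if any token matches."""
    for token, category in UA_CATEGORIES.items():
        if token in user_agent:
            return category
    return None


def tally(log_path: str) -> Counter:
    """Count AI crawler hits per category in a combined-format access log."""
    counts: Counter = Counter()
    with open(log_path) as fh:
        for line in fh:
            match = UA_FIELD.search(line)
            if match:
                category = classify(match.group(1))
                if category:
                    counts[category] += 1
    return counts


if __name__ == "__main__":
    print(tally("access.log"))  # e.g. Counter({'user fetch': ..., 'training': ...})
```

Even a rough tally like this will tell you whether your own traffic mix leans toward live retrieval or training, which is the distinction the rest of this analysis turns on.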

What did the dataset actually show?

The scale is the first big story. More than half of the analyzed sites were touched by at least one AI crawler in a single month, which means AI retrieval is no longer limited to a tiny set of publisher sites or major brands. It is already part of how the web is being accessed.

The second story is concentration. OpenAI accounted for 55.8 million visits, or 81.0% of all AI crawler activity in the dataset. Anthropic followed with 11.5 million visits, or 16.6%. Perplexity represented 1.3 million visits, while Google was at 380,000. For most brands, that means ChatGPT-related retrieval is still the center of gravity.

The third story is growth in referral traffic from AI systems. Total LLM referrals rose from 93,484 to 161,469 year over year, a 72.7% increase. ChatGPT referrals grew from 81,652 to 136,095, Claude jumped from 106 to 2,488, Copilot rose from 22 to 9,560, and Perplexity increased from 11,533 to 13,157. The growth rates differ, but the direction is consistent: AI-generated discovery is becoming a meaningful traffic source.
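Referral tracking is the simpler counterpart: these assistants send clicks with ordinary referrer headers, so you can group sessions by referrer domain. Here is a minimal sketch under that assumption; the domains listed are the assistants' public web domains, and your analytics platform may already label some of these for you.

```python
from collections import Counter
from urllib.parse import urlparse

# Public web domains for each assistant. This list is illustrative, not
# exhaustive, and analytics tools may normalize these differently.
LLM_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",  # older ChatGPT referrer domain
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
    "perplexity.ai": "Perplexity",
}


def count_llm_referrals(referrer_urls: list[str]) -> Counter:
    """Group session referrer URLs by the assistant that sent them."""
    counts: Counter = Counter()
    for ref in referrer_urls:
        host = urlparse(ref).hostname or ""
        for domain, name in LLM_REFERRERS.items():
            if host == domain or host.endswith("." + domain):
                counts[name] += 1
                break
    return counts


# Example:
# count_llm_referrals(["https://chatgpt.com/", "https://www.perplexity.ai/search"])
# -> Counter({'ChatGPT': 1, 'Perplexity': 1})
```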

Why are AI systems revisiting some sites more often?

The clearest pattern is that AI systems revisit sites that already look trustworthy and usable. This does not prove causation, and that nuance matters. But the correlation is strong enough to show what these systems seem to prefer.

Sites that allowed AI crawling averaged 527.7 human sessions, compared with 164.9 for sites that were not crawled. They also averaged 4.17 form completions versus 1.57, and 8.62 click-to-call actions versus 3.46. Among sites with more than 10,000 sessions, the crawl rate reached 90.5%.

That pattern suggests AI systems are not rescuing weak sites from obscurity. They are more often revisiting sites that already show signs of demand, activity, and legitimacy. In practice, AI visibility seems to follow real-world usefulness more than it creates it from scratch.

Which signals correlated most with more crawling?

The research grouped the strongest correlations into three buckets: external integrations, structured business data, and content depth. Each one gives AI systems a different kind of confidence signal.

1. External integrations

External integrations are connections to third-party systems that help validate a business. They act like machine-readable trust signals because they make identity and reputation easier to cross-check.

  • Sites with a Yext integration had a 97.1% crawl rate, versus roughly 58% for sites without it.
  • Sites with review integrations reached an 89.8% crawl rate, versus 58.8% without, and averaged 376.9 crawler visits.

A simple example is a local business whose site is synced with review and listing platforms. That site gives AI systems more ways to confirm who the business is, what it offers, and whether the information is consistent.

2. Structured site features and business data

Structured business data is information formatted in ways machines can interpret reliably. It reduces guesswork. That matters when a model needs to ground an answer quickly.

  • Sites with Google Business Profile sync had a 92.8% crawl rate, versus 58.9% without, and averaged 415.6 crawler visits.
  • Sites using local schema had a 72.3% crawl rate, versus 55.2% without.
  • Sites with dynamic pages reached a 69.4% crawl rate, versus 58.2% without.
  • Ecommerce sites were slightly lower at a 54.2% crawl rate, versus 59.2% for sites without ecommerce.

That last point is a useful reminder that not every site feature helps equally. AI systems seem to value clarity and verifiability more than catalog size alone. A large product inventory is not the same thing as a site that is easy to interpret.

3. Content depth

Content depth is the amount of usable information a site offers across pages. In plain English, more relevant material gives AI systems more to retrieve, compare, and cite.

  • Sites with 50 or more blog posts averaged 1,373.7 crawler visits.
  • Sites with no blog averaged 41.6 crawler visits.

That is roughly a 33x difference. It does not mean every brand should publish content for the sake of volume. It does mean that when a site has a broader supply of useful, structured information, AI systems have more reasons to come back.

What does local schema completeness change?

Local business schema is structured data that tells machines key facts about a business, such as its name, phone number, address, hours, and social profiles. In this dataset, completeness mattered almost as much as presence.

  • Sites with no local schema fields had a 55.2% crawl rate.
  • Sites with 10 to 11 completed local schema fields had an 82.0% crawl rate.
  • That is a 26.8 percentage point lift.

This is one of the most actionable findings in the whole analysis. A half-finished schema setup sends an incomplete signal. A fully defined business profile gives AI systems a cleaner map of who you are and whether your information can be trusted.
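For concreteness, here is what a reasonably complete LocalBusiness block can look like. This is a generic schema.org example with placeholder values, not markup taken from the study; the short Python sketch below assembles the JSON-LD and prints it ready to paste into a page's head. Adapt the fields to your business and validate the output before shipping.

```python
import json

# A hypothetical LocalBusiness record covering the core fields the study
# calls out: name, phone, address, hours, and social profiles.
# All values here are placeholders.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "url": "https://www.example.com",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
        "addressCountry": "US",
    },
    "openingHours": "Mo-Fr 08:00-17:00",
    "sameAs": [
        "https://www.facebook.com/examplebakery",
        "https://www.instagram.com/examplebakery",
    ],
}

# Emit a JSON-LD script block ready for the page <head>.
print('<script type="application/ld+json">')
print(json.dumps(local_business, indent=2))
print("</script>")
```

The point is not the specific values but the coverage: every field above is one fewer fact an AI system has to infer from your page copy.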

For local SEO teams, this is especially important. If your business information is fragmented across your site, your schema, your profile listings, and your review sources, you make retrieval harder. If those signals line up cleanly, you lower the effort required for an AI system to use your site in an answer.

BotRank's Take

The most important insight here is not that crawlers are active. It is that crawl frequency and answer visibility are not the same thing. A site can be fetched often and still fail to earn mentions, citations, or favorable brand framing in AI answers. That is the gap many teams miss.

This is where BotRank's AI Visibility feature becomes useful. It lets teams run reusable prompts across major LLMs, track how often their brand appears, compare visibility against competitors, and inspect which sources and pages show up behind the answers. In the context of this dataset, that matters because crawling is only the input layer. The real business question is what those systems actually say once they have visited your site. If you only measure bot access, you are still blind to the outcome that matters most: whether your brand is visible, accurate, and competitive inside AI-generated responses.

What should SEO and brand teams do now?

The data points to a practical GEO playbook. If you want to improve your chances in AI search, focus less on hacks and more on making your site easier to verify, easier to understand, and more useful to retrieve.

  • Strengthen business identity signals. Sync business data where possible, keep contact details consistent, and make core company facts obvious.
  • Complete your local schema. Name, phone, address, hours, and social profiles should not be missing or contradictory.
  • Use structured features that reduce ambiguity. Dynamic pages, well-labeled sections, and machine-readable business data help systems interpret the site faster.
  • Publish enough useful content to be worth revisiting. More pages only help when they add usable information. Thin content will not create the same effect.
  • Build real audience demand. The strongest crawl patterns appeared on sites that already attracted human traffic. Brand demand and AI visibility appear to reinforce each other.

The bigger mindset shift is this: AI search favors sites that are operationally legible. If a model can identify your business, confirm your details, and pull a clean answer from your pages, you are easier to reuse. That is not just an SEO principle anymore. It is a GEO principle too.

FAQ

Does more AI crawling guarantee better AI search visibility?

No. More crawling suggests a site is easier for AI systems to access and revisit, but it does not guarantee mentions, citations, or positive brand positioning in generated answers.

What is the strongest signal in this dataset?

No single signal explains everything, but external integrations, Google Business Profile sync, complete local schema, and deeper content all showed strong correlations with higher crawl rates and more visits.

Should every brand publish more content now?

Only if that content adds real usable information. The data favors depth and utility, not empty volume.

Why does local schema matter so much?

Because it makes business facts easier for machines to verify. When core fields are complete and consistent, AI systems need less guesswork to trust and reuse the site.

What is the practical takeaway for GEO teams?

Measure outcomes, not just access. Make your site technically legible, keep business data consistent, and track whether those improvements actually increase brand visibility inside AI answers.

AI search visibility is becoming more measurable. The brands that win will not be the ones chasing novelty. They will be the ones building sites that AI systems can verify quickly, interpret confidently, and reuse without friction. If you want to know whether that is happening for your brand, BotRank is the next step.