ChatGPT Search is citing fewer domains. That changes AI visibility

Published: May 15, 2026

ChatGPT Search is getting more selective, not more generous. New reverse-engineering research suggests that after the default switch to GPT-5.3 Instant on March 4, the average answer cited about 15 unique domains instead of 19, while unique URLs fell from 24 to 19. For brands, that means AI visibility just got tighter: the answer box still looks rich, but fewer sites are being chosen to fill it.

The deeper shift is technical. ChatGPT Search now appears to rely on a tool called web.run, split one prompt into multiple fan-out queries, and send the ChatGPT-User crawler to fetch the pages it actually wants to read. If your site is not part of that retrieval path, your content can be invisible even when it is technically crawlable.

What changed inside ChatGPT Search?

The big change is concentration. The research tracked 400 daily prompts over 14 weeks and found that once ChatGPT moved from GPT-4o/5.2 to GPT-5.3 Instant, the average number of cited domains dropped by more than 20%.

That matters because the visible citation area did not suddenly disappear. It just started being shared by fewer publishers. Think of it as the same shelf space with fewer products on it.

A simple example makes the shift clear. If an answer previously cited 19 domains and now cites 15, four publishers lose exposure even though the user still sees a fully formed response. For content teams, that is not a cosmetic update. It is a market-share change.

Why are fewer domains winning now?

Because ChatGPT Search seems to be narrowing its source selection rather than reducing crawl depth per site. The research says the URL-to-domain ratio stayed stable at 1.26, which suggests the system is not suddenly reading shallower within a chosen site. It is choosing fewer sites in the first place.
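The ratio argument is plain arithmetic on the rounded averages the research reports (19 to 15 domains, 24 to 19 URLs). A quick check in Python, using only those figures:

```python
# Rounded averages reported by the research, before and after the GPT-5.3 Instant switch.
before = {"domains": 19, "urls": 24}
after = {"domains": 15, "urls": 19}

def pct_drop(old: float, new: float) -> float:
    """Percentage decrease from old to new."""
    return (old - new) / old * 100

def urls_per_domain(snapshot: dict) -> float:
    """Unique URLs cited per unique domain cited."""
    return snapshot["urls"] / snapshot["domains"]

domain_drop = pct_drop(before["domains"], after["domains"])
url_drop = pct_drop(before["urls"], after["urls"])
ratio_before = urls_per_domain(before)
ratio_after = urls_per_domain(after)

print(f"domains: -{domain_drop:.1f}%  urls: -{url_drop:.1f}%")
print(f"URLs per domain: {ratio_before:.2f} -> {ratio_after:.2f}")
```

Both counts fall by roughly 21%, so URLs per domain barely moves (about 1.26 before and 1.27 after on these rounded inputs). That is the pattern you would expect if whole sites are being dropped rather than each site being read less deeply.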

The study calls this the "Bigfoot Effect," borrowing the old search pattern where a small number of domains dominated the page. In ChatGPT Search, the same logic shows up differently: fewer websites get a seat at the table, while the winners can still earn multiple retrieved URLs.

There is also a model-behavior angle. GPT-5.4 Thinking reportedly uses site: operators to focus searches on trusted domains and often spreads retrieval across more than 10 fan-out queries for a single response. GPT-5.3 Instant, by contrast, typically runs only two or three search rounds.
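To make "fan-out with site: operators" concrete, here is a hypothetical sketch of what such a query plan could look like. The facets, the domain list, and the query cap are illustrative assumptions, not values observed in ChatGPT's traffic.

```python
from itertools import product

def build_fanout(topic: str, facets: list[str], trusted_domains: list[str],
                 max_queries: int = 12) -> list[str]:
    """Expand one topic into narrower, site-scoped search queries.

    Hypothetical illustration: every facet is paired with every trusted
    domain using the standard `site:` search operator, then capped.
    """
    queries = [f"{topic} {facet} site:{domain}"
               for facet, domain in product(facets, trusted_domains)]
    return queries[:max_queries]

plan = build_fanout(
    topic="project management software",
    facets=["pricing", "reviews", "integrations"],
    trusted_domains=["reviews.example.com", "news.example.org"],  # placeholder domains
)
for q in plan:
    print(q)
```

Setting `max_queries` to 2 or 3 instead of 10 or more is one simple way to picture the gap between a lighter retrieval path and a deeper one: fewer queries means fewer chances for any given site to be pulled in.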

The research also argues that product design plays a role. More than 90% of weekly users are on the free plan, and the default experience appears to trigger fewer web searches, fewer queries, and fewer citations. In practice, that means visibility can shrink not only because your page is weaker, but also because the user is routed through a lighter retrieval path.

What does web.run actually do?

web.run appears to be the internal web orchestration layer behind ChatGPT Search. It is the mechanism that searches, opens pages, finds text, clicks into results, and pulls back source material for the model to summarize.

According to the research, this system changed format after GPT-5.3. Earlier versions used compact pipe-separated commands. Newer versions use structured JSON with typed parameters. That sounds like an implementation detail, but it signals something bigger: the model is no longer issuing simple search instructions. It is coordinating a more explicit retrieval workflow.

The documented operations now include actions such as search_query, open, find, click, screenshot, and product_query, along with specialized widgets for areas like sports, finance, and weather. In other words, ChatGPT Search is not one search call followed by one summary. It is a chain of small decisions.

A concrete example from the research shows how this works for shopping prompts. For a query like "best 3D printer to buy in 2026," ChatGPT first appears to run a rewrite fan-out to build a candidate set, then launches a separate shopping fan-out for each product to collect specs, reviews, and pricing. Before GPT-5.3, those product searches were bundled more tightly. Now each candidate can trigger its own retrieval path.
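The two-stage pattern described above can be sketched as a simple query planner. Everything here is hypothetical: the query templates are invented, and the candidate products are passed in directly instead of being extracted from stage-one results.

```python
def rewrite_fanout(prompt: str) -> list[str]:
    """Stage 1 (hypothetical): rewrite the prompt into queries that surface candidates."""
    return [prompt, f"{prompt} reviews", f"{prompt} comparison"]

def shopping_fanout(product_name: str) -> list[str]:
    """Stage 2 (hypothetical): a separate retrieval path for one candidate product."""
    return [f"{product_name} specs", f"{product_name} reviews", f"{product_name} price"]

def plan_retrieval(prompt: str, candidates: list[str]) -> dict[str, list[str]]:
    # In a real system, candidates would come from stage-1 results;
    # here they are supplied directly to keep the sketch self-contained.
    plan = {"rewrite": rewrite_fanout(prompt)}
    for name in candidates:
        plan[name] = shopping_fanout(name)
    return plan

plan = plan_retrieval("best 3D printer to buy in 2026", ["Printer A", "Printer B"])
total = sum(len(queries) for queries in plan.values())
print(total)  # 3 rewrite queries + 3 per candidate x 2 candidates = 9
```

In this sketch, the number of searches grows with the number of candidates, so each product page gets its own shot at being fetched rather than competing inside one bundled search.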

Which crawler actually reads your pages?

The key retrieval agent appears to be ChatGPT-User. In the study's honeypot experiment, that crawler fetched page content during conversational browsing, while OAI-SearchBot was described as the agent used to build ChatGPT's search index.

That distinction matters. Indexing and retrieval are not the same thing. A page might be known to the system at one layer, but still fail to become the page that gets opened and summarized inside a live answer.

For brands, this creates a practical audit question: when ChatGPT searches your topic, can it actually fetch the page you care about, extract the main message, and reflect it accurately? The research suggests you can test this directly by asking ChatGPT to run a site-specific search, open the returned URLs, and summarize what it found.

Step 1: search your domain for a target topic
Step 2: open the top returned pages
Step 3: ask for the title, main topic, and key points from each page
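If you run this audit regularly, keeping the wording fixed makes results comparable from one week to the next. A minimal sketch that templates the three steps; the helper name and the exact phrasing are hypothetical suggestions, not a required syntax.

```python
def audit_prompts(domain: str, topic: str, top_n: int = 3) -> list[str]:
    """Build the three audit prompts, in order, to paste into ChatGPT.

    Hypothetical helper: the wording is a suggestion and can be adapted.
    """
    return [
        f"Search site:{domain} for {topic} and list the URLs you find.",
        f"Open the top {top_n} URLs you just returned.",
        "For each page you opened, give its title, main topic, and key points.",
    ]

for step, prompt in enumerate(audit_prompts("example.com", "pricing plans"), start=1):
    print(f"Step {step}: {prompt}")
```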

If the output is garbled, incomplete, or misses your primary message, your problem is not just ranking. It is retrievability.
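A rough first pass at "does the summary reflect my primary message?" is to check which key phrases never show up in ChatGPT's summary. This is a deliberately crude heuristic assumed for illustration: an accurate paraphrase will also be flagged, so review flagged items by hand. The brand and messages below are hypothetical.

```python
import re

def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def missing_messages(summary: str, key_messages: list[str]) -> list[str]:
    """Return the key messages that do not appear (case-insensitively) in the summary."""
    text = _normalize(summary)
    return [m for m in key_messages if _normalize(m) not in text]

# Hypothetical summary returned by the audit, and the messages you expect to see.
summary = "Acme Analytics offers a free tier and integrates with Slack."
keys = ["free tier", "SOC 2 certified", "integrates with Slack"]
print(missing_messages(summary, keys))  # ['SOC 2 certified']
```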

BotRank's Take

This is exactly why AI visibility cannot be treated like a one-time ranking check. If the same prompt can cite different sources across GPT-5.3 Instant, GPT-5.4 Thinking, and lighter free-plan experiences, then "am I visible in ChatGPT?" is the wrong question. The real question is: where, how often, and in which model conditions does my brand actually appear?

BotRank's AI Visibility feature is built for that operational reality. Teams can create reusable prompts, run them across multiple LLMs, track visibility over time, compare models, and inspect the entities, sentiment, keywords, and cited sources that show up in answers. That matters here because a brand can look strong in one model and weak in another without changing a single page on its site. The useful move is not guessing. It is measuring the spread, spotting the drop, and fixing the pages and narratives most likely to affect retrieval.

Why is model-by-model testing now mandatory?

Because "GPT-5" is not one stable citation environment. The research says GPT-5.2, 5.3, and 5.4 all share the same August 2025 knowledge cutoff and belong to the same model family, yet the same prompt can still produce different fan-out queries, source lists, and quoted passages.

That difference likely comes from post-training and inference behavior, not just from pretraining. The article points to factors like reward shaping, fine-tuning, system prompt configuration, and compute budget. GPT-5.4 Pro, for example, is described as getting more compute to think harder, which can change what it cites.

Here is the practical implication: a brand might appear in GPT-5.4 Thinking for a competitive category prompt, then disappear in GPT-5.3 Instant on the exact same question. If you only test one model once, you are not measuring visibility. You are measuring a single snapshot of one retrieval path.
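Comparing models on the same prompt can be reduced to two numbers: how much their cited sources overlap and whether your domain is among them. The sketch uses Jaccard similarity as the overlap measure (a common choice, not one the research prescribes); the model labels are descriptive keys rather than API model IDs, and the domains are placeholders.

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of cited domains: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical cited domains for the same prompt, one run per model.
runs = {
    "gpt-5.3-instant": {"wiki.example", "news.example", "rival.example"},
    "gpt-5.4-thinking": {"wiki.example", "news.example", "brand.example", "blog.example"},
}
brand = "brand.example"  # placeholder for your own domain

for (m1, s1), (m2, s2) in combinations(runs.items(), 2):
    print(f"{m1} vs {m2}: overlap {jaccard(s1, s2):.2f}")

presence = {model: brand in cited for model, cited in runs.items()}
print(presence)
```

Here the two runs share only two of five cited domains, and the brand appears in one model but not the other: exactly the kind of split a single-model check would miss.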

What are parametric and dynamic visibility?

The research separates AI visibility into two layers: parametric visibility and dynamic visibility. That distinction is one of the most useful ideas in the whole analysis.

Parametric visibility is what the model already knows without searching the web. It is the brand memory built from training data, shaped by signals such as press coverage, Wikipedia presence, and mentions on high-authority sites.

Dynamic visibility is what the model retrieves live when search is enabled. It is more volatile, more model-dependent, and more exposed to sudden changes in tool behavior or source selection logic.

The link between the two is the uncomfortable part. The model appears to formulate web queries around sources and entities it already recognizes. So if your brand is weak in parametric memory, you may never become a likely candidate in dynamic retrieval. You can think of it like this: if the model does not already know your name, it may never bother looking for your page.
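One way to put the two layers into practice is to run each prompt twice: once without web search (a proxy for parametric visibility) and once with it (dynamic visibility). Then sort each result into a quadrant. The search-off/search-on proxy and the bucket labels are assumptions of this sketch, not a method from the research.

```python
from collections import Counter

def bucket(known_without_search: bool, cited_with_search: bool) -> str:
    """Place one prompt result into a parametric/dynamic quadrant."""
    if known_without_search and cited_with_search:
        return "strong in both"
    if known_without_search:
        return "known but not retrieved"
    if cited_with_search:
        return "retrieved but not remembered"
    return "invisible"

# Hypothetical results: (brand mentioned with search off, brand cited with search on)
results = [(True, True), (False, False), (True, False), (False, False)]
quadrants = Counter(bucket(p, d) for p, d in results)
print(quadrants)
```

If the research's reading holds, a large "invisible" bucket is the hardest to fix from your own site alone, because the model may never form queries that lead to your pages.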

The research even frames knowledge cutoff updates as the "Google Dance" of LLMs. They can reshuffle parametric visibility in bulk, but they happen slowly because retraining is expensive. Dynamic visibility, by contrast, can shift overnight.

What should brands do next?

Brands need to treat ChatGPT Search as a monitored system, not a mystery. The winning play is not chasing one universal "AI ranking factor." It is building a repeatable observation loop.

  • Test prompts across models. A result in GPT-5.4 does not guarantee the same visibility in GPT-5.3 Instant or a lighter free-tier experience.
  • Audit retrievability, not just crawlability. Ask ChatGPT to search your site, open pages, and summarize them. See what it can actually extract.
  • Watch citation concentration. If fewer domains are winning, competitive benchmarking matters more because losses will be sharper.
  • Strengthen brand authority outside your site. Parametric visibility is shaped by the broader corpus, not just your own pages.
  • Track changes over time. A model update can wipe out visibility gains that looked stable a week earlier.
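"Watch citation concentration" can be made measurable. One option, borrowed from economics rather than from the research, is the Herfindahl-Hirschman Index (HHI): the sum of each domain's squared share of citations. A simpler top-k share works alongside it. Both metrics and the sample data are illustrative.

```python
from collections import Counter

def citation_shares(cited_domains: list[str]) -> dict[str, float]:
    """Fraction of all citations that each domain received."""
    counts = Counter(cited_domains)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}

def hhi(shares: dict[str, float]) -> float:
    """Sum of squared shares: 1/n for n equal domains, 1.0 for a single domain."""
    return sum(s * s for s in shares.values())

def top_k_share(shares: dict[str, float], k: int = 3) -> float:
    """Combined share of the k most-cited domains."""
    return sum(sorted(shares.values(), reverse=True)[:k])

# Hypothetical citations pooled across a week of tracked answers.
week = ["a.example"] * 4 + ["b.example"] * 3 + ["c.example"] * 2 + ["d.example"]
shares = citation_shares(week)
print(round(hhi(shares), 2), round(top_k_share(shares), 2))  # 0.3 0.9
```

A rising HHI or top-3 share over successive weeks would indicate the kind of concentration the research describes.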

This approach works well for diagnosing live AI search performance, but it has limits. It will not give you a single permanent score, because the system itself is moving. That is exactly why continuous monitoring matters more than point-in-time reporting.
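Continuous monitoring mostly comes down to comparing each snapshot with the previous one and flagging meaningful drops. A minimal sketch, assuming visibility is recorded per model as the share of tracked prompts in which the brand appears; the 10-point threshold and all numbers are arbitrary placeholders.

```python
def flag_drops(previous: dict[str, float], current: dict[str, float],
               threshold: float = 0.10) -> dict[str, float]:
    """Return models whose visibility rate fell by at least `threshold` (absolute)."""
    drops = {}
    for model, old_rate in previous.items():
        new_rate = current.get(model, 0.0)  # a model missing from the new run counts as zero
        if old_rate - new_rate >= threshold:
            drops[model] = round(old_rate - new_rate, 2)
    return drops

# Hypothetical weekly visibility rates per model.
last_week = {"gpt-5.3-instant": 0.45, "gpt-5.4-thinking": 0.60}
this_week = {"gpt-5.3-instant": 0.30, "gpt-5.4-thinking": 0.58}
print(flag_drops(last_week, this_week))  # {'gpt-5.3-instant': 0.15}
```

Treating a missing model as zero is a conservative choice: if a model silently drops out of a run, it surfaces as a drop instead of disappearing from the report.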

FAQ: what marketers should know about ChatGPT Search visibility

Does fewer citations mean ChatGPT Search is worse?

Not necessarily. It means the system appears to be concentrating trust and retrieval on a smaller set of sources. That can improve consistency for users while making visibility harder for everyone else.

What is a fan-out query in plain English?

A fan-out query is when one user prompt gets broken into several narrower searches. Instead of running one broad search, the model spreads retrieval across multiple targeted searches to gather and compare sources.

Is being crawlable by OAI-SearchBot enough?

No. The research suggests that ChatGPT-User is the crawler fetching page content during live conversational browsing. A page can be known to the system but still fail to become the page ChatGPT actually reads.
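Crawl rules are still worth checking per agent, since robots.txt can treat OAI-SearchBot and ChatGPT-User differently. Python's standard urllib.robotparser can evaluate one robots.txt against each token. The file below is a made-up example. Whether a given agent honors these rules for user-initiated fetches is governed by OpenAI's own crawler documentation, so read the result as what your file requests, not as a guarantee of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the search indexer may crawl everything,
# while the conversational fetcher is blocked from /docs/.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Disallow: /docs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/docs/pricing"
for agent in ("OAI-SearchBot", "ChatGPT-User"):
    print(agent, parser.can_fetch(agent, url))
```

In this example, the page is open to indexing but blocked for live retrieval, which is exactly the gap between "known to the system" and "actually read".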

Why can the same prompt cite different sources across GPT-5 models?

Because retrieval behavior changes after pretraining. Different versions can use different fan-out strategies, compute budgets, and source-selection logic even when they share the same knowledge cutoff.

What should I track first if I care about AI visibility?

Start with a fixed set of business-critical prompts, test them across models, and record whether your brand appears, how it is described, and which sources are cited. From there, compare your visibility against competitors and identify the pages or narratives that need work.
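A minimal sketch of that tracking loop, assuming each run is stored as one record holding the prompt, the model, and the brands mentioned in the answer. The record shape and brand names are hypothetical; in practice you would also store how each brand is described and which sources were cited.

```python
from collections import defaultdict

# Hypothetical tracking records: one row per (prompt, model) run.
records = [
    {"prompt": "best crm for startups", "model": "gpt-5.3-instant", "brands": ["RivalCRM", "OtherCRM"]},
    {"prompt": "best crm for startups", "model": "gpt-5.4-thinking", "brands": ["OurCRM", "RivalCRM"]},
    {"prompt": "crm with email sync", "model": "gpt-5.3-instant", "brands": ["OurCRM"]},
    {"prompt": "crm with email sync", "model": "gpt-5.4-thinking", "brands": ["OurCRM"]},
]

def appearance_rate(records: list[dict]) -> dict[str, float]:
    """Share of runs in which each brand is mentioned at least once."""
    seen = defaultdict(int)
    for r in records:
        for brand in set(r["brands"]):  # count a brand once per run
            seen[brand] += 1
    return {b: n / len(records) for b, n in seen.items()}

rates = appearance_rate(records)
print(rates)
```

Here OurCRM appears in 3 of 4 runs (0.75) against 0.5 for RivalCRM. Filtering `records` by model before calling the function gives the same comparison per model.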

ChatGPT Search is no longer just a black box you hope to be included in. It is a retrieval system with observable behavior, shrinking citation real estate, and clear model differences. If your brand depends on discovery in AI answers, now is the time to measure that visibility like a real channel. BotRank can help you do that with a lot more discipline than guesswork.