How AI answer engines seem to pick sources

The first mistake I made with GEO (generative engine optimization) was thinking about it as “SEO, but for ChatGPT.”

That is not completely wrong, but it is too lazy. The overlap is real. Good content, clear structure, authority, citations, and topic coverage still matter. But the final selection process is different enough that the old mental model starts to leak.

In normal SEO, the user searches, sees a list of links, and chooses where to click. In AI search, the engine searches, reads, compresses, and answers. Google’s Search Central docs now talk about AI features as part of the search surface, and OpenAI’s ChatGPT Search docs describe answers with linked sources rather than a separate blue-link results page. Your page may influence the answer even if the user never visits it. It may also rank well in Google and still never appear in the generated answer.

A rough answer-engine path

01 User question: "Best LLM observability tool for a small AI team"
02 Query fan-out: the engine searches around pricing, docs, integrations, alternatives, and use cases.
03 Source selection: owned pages, docs, review sites, community threads, and trusted category pages compete.
04 Synthesized answer: the buyer sees a shortlist. Sometimes there is a click. Often the impression happens inside the answer.

That last point, ranking well in Google and still never appearing in the generated answer, is the one worth sitting with.

Search Engine Journal reported on Ahrefs data showing that the overlap between Google AI Overview citations and the top 10 organic results fell hard. In July 2025, 76 percent of cited pages were also in the organic top 10. In the 2026 analysis, that number was 38 percent. A large share of citations came from pages ranking outside the top 10, or even outside the top 100.

You can argue about the exact number. You should. This space changes too fast to worship any one metric. But the direction is clear enough: being a strong search result and being a useful answer source are related, not identical.
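You can also check the overlap on your own prompts instead of trusting one headline figure. The snippet below is a minimal Python sketch. It assumes you have already collected two lists for each prompt: the URLs an AI answer cited, and the top 10 organic URLs for the same query. The function names and the URL normalization rule are illustrative assumptions. This is not a reconstruction of how Ahrefs measured it.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude URL normalization (an assumption): lowercase the host and drop the
    scheme, a leading 'www.', the query string, the fragment, and any trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def citation_overlap(cited: list[str], organic_top10: list[str]) -> float:
    """Share of distinct cited pages that also appear in the organic top 10."""
    cited_set = {normalize(u) for u in cited}
    if not cited_set:
        return 0.0
    organic_set = {normalize(u) for u in organic_top10}
    return len(cited_set & organic_set) / len(cited_set)


# Hypothetical example data for a single prompt.
cited = [
    "https://www.example-docs.com/pricing/",
    "https://reviews.example.org/llm-observability",
    "https://forum.example.net/t/123",
    "https://example-blog.io/compare",
]
organic = [
    "https://example-docs.com/pricing",
    "https://example-blog.io/compare",
    # ... remaining top-10 URLs would go here
]
print(f"{citation_overlap(cited, organic):.0%}")  # 50%
```

Averaging this across your prompts gives a rough, personal version of the overlap number. How you normalize URLs changes the result, which is one more reason not to worship a single figure.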

One reason is query fan-out.

When a user asks one question, the engine may turn it into several hidden searches. Google says AI Mode uses query fan-out, breaking a question into subtopics and issuing multiple searches. OpenAI says ChatGPT Search can also rewrite a prompt into one or more targeted queries before sending them to search providers. A simple prompt like “best vector database for a small AI startup” can become searches about pricing, deployment, Python SDKs, production scale, open source options, latency, benchmarks, integrations, and migration paths. The model then pulls from those results to assemble an answer.
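Neither Google nor OpenAI publishes its fan-out logic, so the snippet below is only a toy model of the idea. It is useful for planning content, not for predicting what an engine will do. It assumes a hypothetical, hand-written list of facets taken from the vector database example above, and it expands one buyer prompt into a set of sub-queries you can check your pages against.

```python
# Hypothetical facets, borrowed from the vector database example above.
# Real engines appear to generate sub-queries with a language model;
# this fixed template expansion is only a planning aid.
FACETS = (
    "pricing",
    "deployment",
    "Python SDK",
    "production scale",
    "open source options",
    "latency",
    "benchmarks",
    "integrations",
    "migration paths",
)


def fan_out(topic: str, facets: tuple[str, ...] = FACETS) -> list[str]:
    """Expand one topic into the original query plus one sub-query per facet."""
    return [topic] + [f"{topic} {facet}" for facet in facets]


for query in fan_out("vector database for a small AI startup"):
    print(query)
```

The templates are deliberately dumb. What matters is the list itself. Each line is a question your content either answers with specifics or does not.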

This changes the content game. You are no longer only fighting over one keyword. You are trying to become a good source across the cluster of questions that sit around the buying decision.

That is why thin comparison pages feel weak here. A page that says “we are fast, scalable, secure” gives the engine almost nothing to use. A page that compares latency numbers, deployment tradeoffs, supported frameworks, pricing limits, migration steps, and known failure cases gives the engine more material.
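One crude way to tell a thin page from a useful one is to go facet by facet. For each facet, check whether the page mentions it at all, and whether it backs the mention with a number. The sketch below assumes a hypothetical facet-to-keyword mapping and treats "a sentence containing a digit" as a proxy for specificity. It is a heuristic for spotting thin pages, not a model of how any engine scores content.

```python
import re

# Hypothetical facet -> keyword mapping; tune it per category.
FACET_KEYWORDS = {
    "pricing": ["price", "pricing", "per month", "free tier"],
    "latency": ["latency", "p95", "p99", "ms"],
    "deployment": ["self-host", "kubernetes", "docker", "managed"],
    "migration": ["migrate", "migration", "import"],
}

DIGIT = re.compile(r"\d")


def mentions(sentence: str, keywords: list[str]) -> bool:
    """True if any keyword appears as a whole word or phrase in the sentence."""
    return any(re.search(rf"\b{re.escape(k)}\b", sentence) for k in keywords)


def facet_coverage(page_text: str) -> dict[str, str]:
    """Label each facet 'missing', 'mentioned', or 'specific'.

    'specific' means at least one sentence mentioning the facet also contains
    a digit, a rough proxy for "gives the engine a number to use".
    """
    sentences = re.split(r"(?<=[.!?])\s+", page_text.lower())
    result = {}
    for facet, keywords in FACET_KEYWORDS.items():
        hits = [s for s in sentences if mentions(s, keywords)]
        if not hits:
            result[facet] = "missing"
        elif any(DIGIT.search(s) for s in hits):
            result[facet] = "specific"
        else:
            result[facet] = "mentioned"
    return result


thin = "We are fast, scalable, and secure. Pricing is flexible."
rich = (
    "Pricing starts at $49 per month. p95 latency is 12 ms under load. "
    "You can self-host with Docker. Migration from pgvector takes one import script."
)
print(facet_coverage(thin))
print(facet_coverage(rich))
```

The "fast, scalable, secure" page comes back mostly missing. The second page has numbers for pricing and latency and only mentions for deployment and migration, which points at what to write next.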

Another reason is source bias.

A 2026 paper, “Answer Bubbles: Information Exposure in AI-Mediated Search,” looked at 11,000 queries and 33,000 generated answers across several systems. The researchers found that generative systems do not pull evenly from the web. They overuse some source types, such as Wikipedia and longer sources, and their preferences change by topic. Entertainment leans toward IMDb. Sports leans toward ESPN. Music leans toward Spotify and Genius.

That matters because it makes vertical, category-by-category GEO more believable than generic GEO. If each category has its own trusted nodes, making your page better is only part of the work. You also need to figure out which sources this category’s answers already trust, then earn a presence there.
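Once you have a pile of cited URLs for one category, finding its trusted nodes is mostly counting. The sketch below assumes two inputs: citations logged as plain URLs, and a hypothetical, hand-maintained set of domains where your brand is already described well. It ranks the category's most-cited domains and flags the ones where you have no presence. All domains in the example are made up, apart from GitHub.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Lowercased hostname without a leading 'www.'; other subdomains are kept."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


def presence_gaps(
    cited_urls: list[str], present_on: set[str], top_n: int = 10
) -> list[tuple[str, int, bool]]:
    """Most-cited domains as (domain, citation_count, brand_has_presence)."""
    counts = Counter(domain(u) for u in cited_urls)
    return [(d, n, d in present_on) for d, n in counts.most_common(top_n)]


# Hypothetical citations logged across prompts in one category.
cited = [
    "https://forum.example.org/t/llm-observability/101",
    "https://www.reviews.example.com/categories/llm-observability",
    "https://github.com/example-org/tool-b",
    "https://forum.example.org/t/tracing-costs/202",
    "https://reviews.example.com/products/tool-a",
    "https://forum.example.org/t/alternatives/303",
]
for d, n, present in presence_gaps(cited, present_on={"github.com"}):
    print(f"{d:<22} {n}  {'present' if present else 'GAP'}")
```

The output is a to-do list. The forum and the review site are where this category's answers keep looking, and the brand is absent from both.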

The source graph to inspect

Owned: homepage, docs, comparison pages, benchmarks.
External: review sites, GitHub, community threads, industry blogs.
Both feed the AI answer: a shortlist, an explanation, citations, and sometimes no click.

This also explains why one universal AI visibility score feels suspicious to me.

Different engines retrieve differently. ChatGPT, Perplexity, Gemini, Claude, Copilot, and Google do not expose one shared ranking system. A page can be visible in one engine and absent in another. Perplexity may cite the current web more aggressively. ChatGPT may answer from a mix of live search and what the model learned in training, depending on the product mode. Google AI Overviews sit inside the search ecosystem but do not map cleanly to organic rankings.

If a tool gives you one tidy number, I want to know what it hides.

The practical takeaway is not that SEO is dead. I do not buy that. The takeaway is that answer visibility has a different set of failure modes.

The failures I would check first:

The brand is clear to humans, but the category is vague to machines.
The site has product pages, but not enough comparison or decision content.
The site has claims, but not enough numbers, sources, or specific examples.
The brand has documentation, but no third-party pages describe it well.
The brand ranks for head terms, but does not cover the follow-up questions created by query fan-out.
The brand appears in its own content, but competitors appear in trusted external sources.

Most teams will not notice this from normal analytics. If AI search sends tiny referral traffic, the channel looks irrelevant. But the real loss may be upstream: the buyer asked an answer engine for a shortlist, saw three competitors, and never searched your brand at all.

That is hard to measure. It is also plausible enough to take seriously.

I would not start with tricks. I would start with a small map.

Pick one category. Pick 20 to 30 prompts that a buyer would actually ask. Run them across the engines you care about. Record whether you were mentioned, who was recommended, which URLs were cited, which domains appear repeatedly, and what angle the answer used to describe the category.
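A spreadsheet is enough for this. If you prefer code, the sketch below shows the record I would keep for each prompt and engine, plus a summary that answers the questions above: how often you were mentioned in each engine, who got recommended, and which domains keep getting cited. The field names, engine labels, and brand names are hypothetical. Collecting the answers, by hand or through whatever API an engine offers, is left out on purpose.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class PromptRun:
    """One prompt run in one engine (all field names are illustrative)."""
    prompt: str
    engine: str
    mentioned: bool  # was our brand named in the answer?
    # Brands the answer shortlisted, in the order given.
    recommended: list[str] = field(default_factory=list)
    cited_urls: list[str] = field(default_factory=list)
    framing: str = ""  # how the answer described the category


def summarize(runs: list[PromptRun]) -> dict:
    """Mention rate per engine, most-recommended brands, most-cited domains."""
    by_engine = defaultdict(list)
    for run in runs:
        by_engine[run.engine].append(run.mentioned)
    mention_rate = {e: sum(m) / len(m) for e, m in by_engine.items()}

    brands = Counter(b for run in runs for b in run.recommended)
    domains = Counter(
        (urlsplit(u).hostname or "").removeprefix("www.")
        for run in runs
        for u in run.cited_urls
    )
    return {
        "mention_rate": mention_rate,  # kept per engine, never blended
        "top_brands": brands.most_common(5),
        "top_domains": domains.most_common(5),
    }


runs = [
    PromptRun("best LLM observability tool for a small AI team", "engine_a",
              mentioned=False, recommended=["ToolX", "ToolY", "ToolZ"],
              cited_urls=["https://reviews.example.com/llm-observability"]),
    PromptRun("best LLM observability tool for a small AI team", "engine_b",
              mentioned=True, recommended=["OurTool", "ToolX"],
              cited_urls=["https://www.reviews.example.com/toolx",
                          "https://docs.ourtool.example/pricing"]),
]
print(summarize(runs))
```

Keeping the mention rate per engine is deliberate. A brand named every time in one engine and never in another averages to a respectable-looking 50 percent, which says nothing useful about either engine.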

After doing that a few times, the work becomes less mystical.

You start to see the same domains. You see the same comparison frames. You see where the model has old information. You see where your product is missing from the category vocabulary entirely. You also see where the engine is not wrong, which is useful and mildly painful.

That is the part I like about GEO when stripped of the hype. It forces a company to look at how the market describes it when the company is not in the room.

Sometimes the answer is “we need schema markup.”

More often, I suspect the answer is “we have not given the web enough clear, specific, externally reinforced material to describe us.”

That is less exciting than a hack. It is also more useful.

Sources and further reading