Measure GEO without lying to yourself
A practical measurement framework for AI answer visibility, with prompt panels, repeated sampling, mention scores, citation tracking, and honest reporting.
GEO measurement is a mess.
That does not mean it is useless. It means you should stop pretending one screenshot proves anything.
The hard part is that AI answers are probabilistic, product behavior changes, engines differ, and the prompts you test may not match what real buyers ask. The 2026 paper “Don’t Measure Once” is useful here because it treats AI search visibility as a stochastic measurement problem, not a screenshot contest. Many tools do not have real user prompt data. They approximate. Sometimes that is fine. Sometimes it creates a very expensive astrology dashboard.
I would measure GEO as a baseline and trend problem, not as a precise attribution channel.
separate visibility from traffic
Start with a basic distinction:
Traffic is what arrives on your site.
Visibility is whether the brand appears, gets recommended, or gets cited inside the answer.
AI referral traffic is currently small for most sites. SE Ranking’s 2026 study put it at 0.32 percent of total traffic across its dataset. That does not mean AI visibility is irrelevant. It may mean the important moment happened before the click.
If a buyer asks an answer engine for a shortlist and never sees your brand, there may be no referral to measure.
So I would report both:
- referral traffic from AI platforms
- answer visibility in high-intent prompts
Do not collapse them into one number. They describe different things.
use a fixed prompt panel
A prompt panel is just a saved set of prompts you test repeatedly.
Example prompts by bucket:
- Category: "Best {category} tools in 2026", "Best {category} for {use case}"
- Comparison: "{product} vs {competitor}", "{product} alternatives"
- Decision: "How do I choose a {category}?", "{category} pricing comparison"
- Implementation: "How to implement {use case} with {stack}?", "Best {category} that integrates with {stack}"
- Reputation: "Is {product} reliable?", "{category} tools recommended on Reddit"
For each category, I would maintain 20 to 50 prompts across:
- category selection
- direct comparison
- alternatives
- implementation
- pricing
- reputation
- migration
Each prompt should have:
- category
- buyer stage
- intent level
- target persona
- target region or market
- engines tested
- sampling count
The prompt panel should change slowly. If you rewrite it every month, you cannot tell whether visibility changed or the test changed.
You can add new prompts, but keep the old ones as a core baseline.
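Here is a minimal sketch of how a panel entry could be stored, assuming Python and a hypothetical `PanelPrompt` record. The field names mirror the metadata list above. The sample values (category, persona, engine names) are placeholders for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelPrompt:
    """One saved prompt plus its metadata (hypothetical schema)."""
    text: str
    bucket: str               # e.g. "category", "comparison", "pricing"
    buyer_stage: str
    intent_level: str
    persona: str
    region: str
    engines: tuple
    samples_per_engine: int = 3
    core: bool = True         # core prompts are the baseline and stay unchanged


def planned_runs(panel):
    """Total number of answers to collect for one pass over the panel."""
    return sum(len(p.engines) * p.samples_per_engine for p in panel)


# Placeholder values for illustration only.
panel = [
    PanelPrompt(
        text="Best {category} for {use_case}".format(
            category="LLM observability tools", use_case="cost tracking"
        ),
        bucket="category",
        buyer_stage="shortlist",
        intent_level="high",
        persona="platform engineer",
        region="US",
        engines=("engine_a", "engine_b"),
    ),
    PanelPrompt(
        text="{product} alternatives".format(product="ExampleProduct"),
        bucket="alternatives",
        buyer_stage="evaluation",
        intent_level="high",
        persona="platform engineer",
        region="US",
        engines=("engine_a", "engine_b"),
        samples_per_engine=5,
        core=False,  # added later, so report it apart from the core baseline
    ),
]

print(planned_runs(panel))
```

Freezing the record is a deliberate choice in this sketch: a prompt that cannot be edited in place makes it harder to silently change the test between months.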
sample repeatedly
For a lightweight process, sample each prompt three times per engine. For important prompts, sample more.
Record:
- response date
- engine
- prompt
- sample number
- brand mentioned
- brand recommended
- rank or position
- cited URLs
- cited domains
- competitors mentioned
- answer accuracy
- notes
This is tedious. It is also where the truth lives.
If a brand appears in one out of three samples, that is different from appearing in three out of three. A single screenshot hides that difference.
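The difference between one in three and three in three can be made explicit with a small tally. This is a sketch under assumptions: the `Sample` record is hypothetical and keeps only a few of the fields listed above, and the engine and prompt values are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One recorded answer (hypothetical, trimmed-down record)."""
    date: str
    engine: str
    prompt: str
    sample_number: int
    brand_mentioned: bool
    brand_recommended: bool
    cited_domains: tuple = ()


def appearance_counts(samples):
    """Return {(engine, prompt): (mentions, total_samples)}."""
    counts = defaultdict(lambda: [0, 0])
    for s in samples:
        key = (s.engine, s.prompt)
        counts[key][1] += 1
        if s.brand_mentioned:
            counts[key][0] += 1
    return {key: (hits, total) for key, (hits, total) in counts.items()}


# Placeholder data: the brand shows up in only one of three samples.
samples = [
    Sample("2026-01-05", "engine_a", "Best X for Y", 1, True, False),
    Sample("2026-01-05", "engine_a", "Best X for Y", 2, False, False),
    Sample("2026-01-05", "engine_a", "Best X for Y", 3, False, False),
]

for (engine, prompt), (hits, total) in appearance_counts(samples).items():
    print(f"{engine} | {prompt}: mentioned in {hits} of {total} samples")
```

Reporting the pair (hits, total) instead of a bare percentage keeps the sample size visible to whoever reads the number.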
score simply
I would start with a simple score:
- 0: absent
- 1: mentioned
- 2: recommended
- 3: recommended with a useful cited source or clear supporting reason
Then calculate:
- mention rate
- recommendation rate
- average score
- cited source count
- competitor share of recommendations
- accuracy issue count
Do this by prompt bucket and by engine.
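A minimal sketch of the 0 to 3 score and the per-bucket, per-engine rollup. It assumes each sample is a plain dict with hypothetical keys `bucket`, `engine`, `mentioned`, `recommended`, and `supported`, and that a recommendation always counts as a mention.

```python
from collections import defaultdict
from statistics import mean


def score(mentioned, recommended, supported):
    """Map one sample to the 0-3 scale described above."""
    if recommended and supported:
        return 3  # recommended with a cited source or clear reason
    if recommended:
        return 2
    if mentioned:
        return 1
    return 0


def summarize(rows):
    """Group samples by (bucket, engine) and compute the basic rates."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["bucket"], row["engine"])].append(
            score(row["mentioned"], row["recommended"], row["supported"])
        )
    return {
        key: {
            "samples": len(scores),
            "mention_rate": sum(s >= 1 for s in scores) / len(scores),
            "recommendation_rate": sum(s >= 2 for s in scores) / len(scores),
            "average_score": mean(scores),
        }
        for key, scores in groups.items()
    }


# Placeholder rows for illustration only.
rows = [
    {"bucket": "category", "engine": "engine_a", "mentioned": False, "recommended": False, "supported": False},
    {"bucket": "category", "engine": "engine_a", "mentioned": True, "recommended": False, "supported": False},
    {"bucket": "category", "engine": "engine_a", "mentioned": True, "recommended": True, "supported": False},
    {"bucket": "category", "engine": "engine_a", "mentioned": True, "recommended": True, "supported": True},
    {"bucket": "brand", "engine": "engine_a", "mentioned": True, "recommended": True, "supported": True},
]

summary = summarize(rows)
```

Keeping the bucket in the grouping key is what lets a strong brand-name bucket and a weak category bucket show up as two separate numbers instead of one flattering average.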
The bucket view matters because not all prompts are equal. If you win brand-name prompts but lose “best X for Y” prompts, you are visible only to people who already know you. That is not the same as category visibility.
track source drift
AI answers can change sources over time. Some vendor reporting in the GEO market claims very high citation churn, sometimes up to 90 percent in certain contexts. I would not build a worldview around one vendor number, but source drift is clearly real enough to track. If the same prompt can return different citations across samples, the source map is part of the measurement, not a footnote.
So track domains along with brand mentions.
For each month:
- top cited domains
- new domains
- disappeared domains
- competitor-owned domains
- third-party domains
- your own domains
This tells you where the answer layer is getting its facts.
If a new comparison site starts appearing in several prompts and you are absent from it, that is an action item. If your docs are cited more often after a documentation rewrite, that is useful evidence, even if referral traffic barely moved.
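The monthly source comparison is mostly set arithmetic. The sketch below assumes each month's citations are available as a flat list of domains, one entry per citation. The function names and the `.example` domains are made up for illustration.

```python
from collections import Counter


def source_drift(previous, current, top_n=3):
    """Compare cited domains between two periods."""
    prev_set, cur_set = set(previous), set(current)
    return {
        "top": [domain for domain, _ in Counter(current).most_common(top_n)],
        "new": sorted(cur_set - prev_set),
        "disappeared": sorted(prev_set - cur_set),
    }


def ownership(domain, own, competitors):
    """Label a domain as own, competitor, or third-party."""
    if domain in own:
        return "own"
    if domain in competitors:
        return "competitor"
    return "third-party"


# Placeholder citation lists for two consecutive months.
last_month = [
    "docs.brand.example", "reviews.example", "reviews.example", "blog.rival.example",
]
this_month = [
    "reviews.example", "reviews.example", "reviews.example",
    "compare.example", "compare.example", "blog.rival.example",
]

drift = source_drift(last_month, this_month)
labels = {
    d: ownership(d, own={"docs.brand.example"}, competitors={"blog.rival.example"})
    for d in drift["top"]
}
```

In this made-up data, `compare.example` is a new third-party source and the brand's own docs have dropped out, which are exactly the two situations the paragraph above treats as action items.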
report uncertainty in plain language
The reporting style matters. If the market is noisy, the report should admit it.
I would include a measurement note like this:
This report is based on a fixed prompt panel sampled three times per engine between {date} and {date}. AI answers vary by session, account, location, product mode, and retrieval behavior. Treat the numbers as a visibility baseline and trend indicator, not exact market share.
This may sound less impressive than a confident dashboard. Good. I would trust it more.
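One optional way to put a number on the noise is an interval around each rate. The sketch below uses the Wilson score interval, which is my choice for illustration and not something the measurement note requires. It assumes samples are independent, which repeated runs of the same prompt may not fully satisfy, so treat the output as a rough guide.

```python
from math import sqrt


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a rate of hits out of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(1, 15)
print(f"1 of 15 samples: plausible rate roughly {low:.0%} to {high:.0%}")
```

For a brand recommended in 1 of 15 samples, this gives roughly 1 to 30 percent. A range that wide is a useful reminder that a small panel supports a baseline and a trend, not a precise figure.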
connect metrics to actions
A GEO report that ends at “your score is 42” is not useful.
Every metric should point to one of a few action types:
- publish or improve owned content
- update docs
- create a comparison page
- add facts, citations, or benchmarks
- fix outdated public information
- earn third-party mentions
- respond to reputation gaps
- adjust positioning
Example:
Finding: In "best LLM observability tools" prompts, the brand was recommended in 1 of 15 samples. Competitor A was recommended in 11 of 15. The most cited source was {domain}, where Competitor A appears and the brand does not.
Action: Try to earn inclusion on {domain}, then publish a stronger owned comparison page covering tracing, evals, cost tracking, integrations, and deployment options.
That is much better than a chart.
do not fake attribution
The temptation will be to turn AI visibility into revenue attribution too early.
Be careful.
Some AI traffic shows up as referral traffic. Some does not. Some influence happens through brand search later. Some happens inside a sales call when a buyer says, “I saw you recommended somewhere.” Some is impossible to separate from normal SEO and content work.
I would use softer attribution until there is enough data:
- AI referral sessions
- assisted conversions from known AI referrers
- branded search lift after visibility campaigns
- sales call mentions
- self-reported attribution
- high-intent prompt visibility over time
- competitor displacement in recommendation prompts
None of these is perfect. Together, they are better than pretending the channel is as clean as paid search.
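Counting AI referral sessions usually comes down to matching referrer hostnames against a list you maintain. This is a sketch only: the hostname set below is an illustrative, incomplete assumption and needs checking against your own analytics data, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Illustrative and incomplete; verify against your own referrer logs.
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}


def is_ai_referrer(referrer_url, ai_hosts=AI_REFERRER_HOSTS):
    """True if a full referrer URL points at a listed AI platform host."""
    host = urlparse(referrer_url).hostname or ""
    host = host.removeprefix("www.")  # requires Python 3.9+
    return host in ai_hosts


def ai_referral_share(referrers):
    """Share of sessions whose referrer is a listed AI platform."""
    if not referrers:
        return 0.0
    return sum(is_ai_referrer(r) for r in referrers) / len(referrers)


# Placeholder session referrers; empty string means no referrer was sent.
session_referrers = [
    "https://chatgpt.com/",
    "https://www.google.com/search?q=example",
    "",
    "https://news.example/story",
]

print(f"AI referral share: {ai_referral_share(session_referrers):.2%}")
```

Sessions with no referrer are counted as non-AI here, which likely undercounts: as noted above, some AI-driven visits never show up as referral traffic at all.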
the monthly report I would use
- Summary: mention rate, recommendation rate, average score, main competitor gap.
- Source map: top cited domains, new sources, disappeared sources, and missing inclusion targets.
- Actions: owned content, third-party source work, docs cleanup, and measurement changes.
# AI answer visibility report
Period: {month}
Category: {category}
Prompt panel: {count} prompts
Engines: {engines}
Samples: {samples}
## summary
Mention rate: {x}
Recommendation rate: {y}
Average score: {z}
Main competitor gap: {competitor}
Largest movement: {movement}
## wins
- {specific prompt bucket improved}
- {new source started citing us}
- {old inaccurate description disappeared}
## losses
- {competitor gained in category prompts}
- {important source stopped appearing}
- {engine has outdated information}
## source map
Top cited domains:
1. {domain}
2. {domain}
3. {domain}
Missing source opportunities:
1. {domain}
2. {domain}
## actions for next month
1. {owned content action}
2. {third-party action}
3. {measurement action}
## measurement note
This is a sampled baseline, not exact attribution.
If that report feels boring, it is probably on the right track.
Measurement in this market should be boring. The work is already uncertain enough. The report does not need extra drama.