Run a small AI visibility diagnostic before buying a tool

Before buying an AI visibility tool, I would run a small diagnostic by hand.

Not because manual work is noble. It is not. Manual work is annoying. But in a market where every tool can give you a different answer, doing the first pass yourself helps you understand what the dashboard is supposed to mean.

The goal is simple: find out whether a brand appears in the AI answers that a real buyer might use during research.

I would start with one narrow category. Not “developer tools.” Too broad. Use something like “LLM observability tools,” “vector databases for RAG,” “authentication APIs for B2B SaaS,” or “feature flag tools for AI products.”

Then pick one target brand and three to five competitors.

You are going to test prompts across a few answer engines. I would only include modes that can currently produce web-grounded answers or citations, because testing a model that answers from memory is a different exercise from testing a search product.

ChatGPT
Perplexity
Gemini
Claude
Copilot
Google AI Overviews or AI Mode, if you can trigger them

The product boundaries matter. ChatGPT Search, Google’s AI features in Search, and Perplexity (which runs its own crawler) do not share the same retrieval path. That is exactly why I would keep the raw rows instead of trusting a blended score too early.

Do not overbuild the first version. A spreadsheet is enough.

Figure: A small diagnostic prompt panel, with example prompts in four buckets (Comparison, Decision, Implementation, Reputation) and a 0 to 3 score: 0 absent, 1 mentioned, 2 recommended, 3 recommended with support.

build the prompt set

I like 25 prompts because it is small enough to run by hand, but large enough to show patterns.

Split the prompts into five buckets.

Category selection:

What are the best {category} tools in 2026?
What is the best {category} for {use case}?
Top {category} platforms for startups
Best open source {category} alternatives
What {category} should I use if I am building {use case}?

Direct comparison:

{product} vs {competitor}: which is better?
{competitor A} vs {competitor B} vs {product}
Is {product} better than {competitor} for {use case}?
{product} alternatives
Why choose {product} over {competitor}?

Decision criteria:

How do I choose a {category} tool?
What should I look for in a {category}?
{category} pricing comparison
Is {product} worth it?
What are the pros and cons of {product}?

Implementation:

How do I implement {use case} with {tech stack}?
Best {category} that integrates with {tech stack}
{category} for SOC 2 or enterprise needs
How do I migrate from {competitor} to a better {category}?
Which {category} works best for a small team?

Reputation:

Is {product} reliable?
{product} reviews
Common problems with {competitor}
{category} tools recommended on Reddit or Hacker News
What {category} tools do fast-growing AI teams use?

These are not sacred prompts. Replace them with the language your buyers actually use. If you sell to developers, include framework names. If you sell to marketing teams, include job titles and budgets. If you sell to finance, be careful and expect more conservative sources.
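Filling the templates by hand is fine for 25 prompts. A tiny script, though, keeps the wording identical across engines and reruns. Below is a minimal sketch in Python. It assumes the placeholders are renamed to identifiers (`{use_case}` instead of `{use case}`, `{tech_stack}` instead of `{tech stack}`). The `BUCKETS` subset, the `expand` helper, and the example brand names are all hypothetical.

```python
# Hypothetical subset of the prompt templates, grouped by bucket.
# Placeholders use identifier-style names so str.format can fill them.
BUCKETS = {
    "category_selection": [
        "What is the best {category} for {use_case}?",
        "Best open source {category} alternatives",
    ],
    "direct_comparison": [
        "{product} vs {competitor}: which is better?",
        "{product} alternatives",
    ],
    "implementation": [
        "Best {category} that integrates with {tech_stack}",
    ],
}


def expand(buckets, values):
    """Return (bucket, prompt) pairs with every placeholder filled in.

    str.format raises KeyError if a template uses a placeholder that is
    missing from `values`, so a half-filled prompt never reaches an engine.
    Extra keys in `values` are ignored.
    """
    rows = []
    for bucket, templates in buckets.items():
        for template in templates:
            rows.append((bucket, template.format(**values)))
    return rows


if __name__ == "__main__":
    values = {
        "category": "vector database",
        "use_case": "RAG",
        "product": "Acme",  # hypothetical target brand
        "competitor": "Rival",  # hypothetical competitor
        "tech_stack": "Python",
    }
    for bucket, prompt in expand(BUCKETS, values):
        print(f"{bucket}\t{prompt}")
```

Keeping the bucket label on every generated prompt matters because the scores are later averaged by bucket and engine.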

record more than mentions

For each prompt and engine, record:

Was the brand mentioned?
Was the brand recommended?
What position did it appear in?
Which competitors appeared?
Which URLs or domains were cited?
Was the answer accurate?
Did the answer use old information?
What category language did the engine use?

The cited domains matter as much as the brand mention. If three engines keep citing the same review site, comparison page, GitHub repo, documentation page, or community thread, that is not trivia. That is part of the source graph for the category.
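Spotting repeated domains by eye works for a handful of answers but gets tedious across a full panel. Below is a minimal sketch that assumes each citation is logged as an `(engine, url)` pair. `repeated_domains` is a hypothetical helper. It treats `www.example.com` and `example.com` as the same site while keeping other subdomains (such as `docs.example.org`) separate, which is a simplifying assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def repeated_domains(citations, min_engines=2):
    """Return {domain: sorted engine names} for domains cited by at least
    `min_engines` different engines.

    `citations` is an iterable of (engine, url) pairs.
    """
    engines_by_domain = defaultdict(set)
    for engine, url in citations:
        host = urlsplit(url).hostname or ""  # hostname comes back lowercased
        if host.startswith("www."):
            host = host[len("www."):]  # simplification: fold www. into the bare domain
        if host:
            engines_by_domain[host].add(engine)
    return {
        domain: sorted(engines)
        for domain, engines in sorted(engines_by_domain.items())
        if len(engines) >= min_engines
    }
```

Setting `min_engines=3` turns the "three engines keep citing the same site" check into a filter.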

I would also add one free text column: “What would I fix first?”

This stops the spreadsheet from becoming fake science. Sometimes the answer is obvious. The product is absent from every “best X” prompt because there is no good comparison page, no third-party mention, and the documentation never says who the product is for.

No metric can improve on that.
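If you outgrow the spreadsheet, the same columns map directly onto a record type. Below is a minimal sketch in Python. The `ObservationRow` field names and the `to_csv` helper are hypothetical. List-valued columns are flattened with `; ` so each observation stays on one spreadsheet row.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields
from typing import List, Optional


@dataclass
class ObservationRow:
    """One row per (prompt, engine, sample). Field names are illustrative."""
    prompt: str
    bucket: str
    engine: str
    sample: int
    mentioned: bool
    recommended: bool
    position: Optional[int]  # 1-based rank in the answer, None if absent
    competitors: List[str] = field(default_factory=list)
    cited_urls: List[str] = field(default_factory=list)
    accurate: bool = True
    outdated: bool = False  # "Did the answer use old information?"
    category_language: str = ""  # how the engine named the category
    fix_first: str = ""  # free text: "What would I fix first?"


def to_csv(rows):
    """Serialize rows to CSV text that opens cleanly in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ObservationRow)])
    writer.writeheader()
    for row in rows:
        record = asdict(row)
        # Flatten list columns so each observation stays on one line.
        record["competitors"] = "; ".join(row.competitors)
        record["cited_urls"] = "; ".join(row.cited_urls)
        writer.writerow(record)
    return buf.getvalue()
```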

sample more than once

AI answers vary. Do not run one prompt once and treat the screenshot as evidence.

For a tiny manual diagnostic, run each prompt twice per engine, ideally in separate sessions. If you have time, run it three times. You do not need statistical perfection. You need to avoid fooling yourself with one lucky answer. The 2026 paper “Don’t Measure Once” makes the same basic point in a more formal way: AI search visibility measurements are sensitive to model, prompt, domain, and repeated sampling choices.
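Scheduling the repeats is the fiddly part. Below is a minimal sketch that assumes every prompt runs `samples` times per engine and that repeats of the same prompt should be kept apart. `run_plan` is a hypothetical helper. Finishing one full round before starting the next is an assumption of this sketch, not something the text requires.

```python
import random


def run_plan(prompts, engines, samples=2, seed=0):
    """Return a checklist of (sample_number, engine, prompt) tasks.

    Every prompt is scheduled `samples` times per engine. All of round 1 comes
    before any of round 2, so the repeats of a prompt are separated by a full
    pass and can be run in separate sessions. Order within a round is shuffled
    with a fixed seed so the checklist is reproducible.
    """
    rng = random.Random(seed)
    plan = []
    for n in range(1, samples + 1):
        round_tasks = [(n, engine, prompt) for engine in engines for prompt in prompts]
        rng.shuffle(round_tasks)
        plan.extend(round_tasks)
    return plan
```

With 25 prompts and the six engines listed earlier, two samples come to 25 × 6 × 2 = 300 manual runs. Three samples would be 450.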

Use a simple scoring system:

0: not mentioned
1: mentioned, but not recommended
2: recommended or included in a shortlist
3: recommended with a cited source or clear supporting reason

Then calculate the average by prompt bucket and engine.
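The scale and the averaging step are small enough to sketch directly. Below is a minimal sketch in Python that assumes each observation is logged with three flags. `score`, `average_by`, and the flag names are hypothetical. Here "supported" stands for "cited source or clear supporting reason" from the scale above. The averages are computed from the raw rows, which are left untouched.

```python
from collections import defaultdict


def score(mentioned, recommended, supported):
    """Map one observation to the 0-3 scale.

    A recommendation implies a mention, so `recommended` is checked first.
    """
    if recommended:
        return 3 if supported else 2
    return 1 if mentioned else 0


def average_by(rows):
    """Return {(bucket, engine): mean score} from (bucket, engine, score) rows."""
    totals = defaultdict(lambda: [0, 0])  # (bucket, engine) -> [sum, count]
    for bucket, engine, value in rows:
        totals[(bucket, engine)][0] += value
        totals[(bucket, engine)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```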

Do not hide the raw rows. The raw rows are where the useful work lives.

read the failures like a product person

After the first pass, I would look for patterns:

Are competitors recommended because they have better third-party pages?
Are you absent from category prompts but present in direct brand prompts?
Does the model know your old positioning but not the new one?
Do cited sources describe the category in a way your site does not?
Are implementation prompts citing docs, tutorials, GitHub, or blog posts?
Are “alternatives” prompts dominated by listicles you are not in?

This is where GEO (generative engine optimization) stops being abstract.

If the brand appears only when the prompt contains the brand name, it has an awareness problem inside AI answers.

If the brand appears in category prompts but is not recommended, it may have a positioning or proof problem.

If competitors are cited from third-party pages and you are only cited from your own homepage, you may have a trust graph problem.

If the engine says something wrong, you may have an information freshness problem.
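Those four readings can be written down as explicit rules, which makes it harder to quietly change them between runs. Below is a minimal sketch. `diagnose` and its inputs are hypothetical summaries you would compute from the raw rows. The labels it returns are hypotheses to investigate, matching the hedged "may" above, not verdicts.

```python
def diagnose(*, in_brand_prompts, in_category_prompts, recommended_in_category,
             competitor_third_party_cites, own_third_party_cites, has_wrong_answers):
    """Return the candidate problem labels suggested by the patterns above."""
    problems = []
    # Appears only when the prompt names the brand.
    if in_brand_prompts and not in_category_prompts:
        problems.append("awareness")
    # Shows up in category answers but is never recommended.
    if in_category_prompts and not recommended_in_category:
        problems.append("positioning or proof")
    # Competitors earn third-party citations; the brand is only cited from its own pages.
    if competitor_third_party_cites > 0 and own_third_party_cites == 0:
        problems.append("trust graph")
    # The engine states something wrong about the brand.
    if has_wrong_answers:
        problems.append("information freshness")
    return problems
```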

turn it into a one-page report

The first useful output is not a 40-page deck. It is a short report someone can react to.

Use this shape:

# {brand} AI visibility diagnostic

Date: {date}
Category: {category}
Engines tested: {engines}
Prompt count: {number}

## three-line summary

The brand was mentioned in {x}/{n} high-intent prompts and recommended in {y}/{n}.
The strongest competitor was {competitor}, recommended in {z}/{n}.
The largest gap is {specific gap}.

## where the brand appears

Table by prompt bucket and engine.

## who AI answers cite

Repeated domains:
1. {domain}
2. {domain}
3. {domain}

Competitor sources we do not appear in:
1. {url}
2. {url}

## what to fix first

1. Publish or improve {specific page}.
2. Earn a mention in {specific third-party source type}.
3. Update {outdated description or docs}.
4. Add proof for {claim the engines do not support}.

## measurement notes

Each prompt was sampled {k} times. AI answers vary, so this is a baseline, not a final truth.

That last line matters. It sounds defensive, but it actually builds trust. Anyone promising exact measurement in this market is probably selling harder than they should.
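The first two summary lines are pure counting, so they can come straight from the rows. The third, the largest gap, is a judgment call and stays manual. Below is a minimal sketch that assumes one entry per high-intent prompt holding the sets of brands mentioned and recommended. It also assumes repeated samples are already merged however you chose (for example, counting a brand only if it appeared in most samples). `summary_lines` is hypothetical, and ties between competitors go to whichever is listed first.

```python
def summary_lines(runs, brand, competitors):
    """Build the first two lines of the three-line summary.

    `runs` is a list of (mentioned_brands, recommended_brands) set pairs,
    one per high-intent prompt. A recommended brand also counts as mentioned.
    """
    n = len(runs)
    mentioned = sum(1 for m, r in runs if brand in m or brand in r)
    recommended = sum(1 for _, r in runs if brand in r)
    rec_counts = {c: sum(1 for _, r in runs if c in r) for c in competitors}
    top = max(rec_counts, key=rec_counts.get)  # first listed wins ties
    return [
        f"The brand was mentioned in {mentioned}/{n} high-intent prompts "
        f"and recommended in {recommended}/{n}.",
        f"The strongest competitor was {top}, recommended in {rec_counts[top]}/{n}.",
    ]
```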

The point of the diagnostic is not to prove GEO ROI in one afternoon. It is to find the shape of the problem.

And if the diagnostic finds nothing? Good. You learned cheaply.

But if it shows that competitors are repeatedly recommended in high-intent prompts while the target brand is absent, that is not a dashboard problem. That is a market visibility problem.
