How to Test if AI Systems Are Citing Your Website

Start with the simplest possible test

You don’t need tools. You don’t need analytics. You don’t need a tracking dashboard. The simplest test for whether AI systems cite your website is to ask them and see what happens.

Open ChatGPT, Claude, or Perplexity. Ask a question that your business should be the answer to. Read what comes back. Look for your name, your business, your website, or a clear paraphrase of something you’ve written. That’s the test.

It’s almost insultingly simple. It’s also more informative than most of the GEO measurement content currently being sold. Manual prompting is the most reliable, lowest-cost way to know whether your work is having an effect — and it’s available to anyone who has five spare minutes.

The rest of this lesson is about doing this test in a way that produces useful, comparable results over time.

The three kinds of prompts to test

There are three distinct prompt types worth running, and they tell you different things.

Branded prompts. Ask the AI directly about you or your business. “Tell me about Warren Groom.” “What is warrengroom.com?” “Who is the freelance WordPress developer Warren Groom in Toronto?”

These prompts test whether the AI can describe you accurately when asked about you by name. The answers reveal what AI systems “know” about you: what your entity looks like in their internal representation. If the answers are accurate, your entity is clear. If they’re vague, generic, or contain errors, your entity needs more work.

Category prompts. Ask the AI a question your business should be the answer to, without naming yourself. “Who’s a good freelance WordPress developer in Toronto?” “What’s a good white-label WordPress partner for agencies?” “Who teaches GEO for website owners?”

These prompts are the real test. They show whether AI systems associate you with the topics, places, and audiences you want to be associated with — without prompting them to look for you specifically. If you appear in the answer, your knowledge-graph relationships are doing their job. If you don’t, the connections covered in Module 3 aren’t yet strong enough to surface you.

Comparison prompts. Ask the AI to compare your business to others, or to recommend among options. “Compare Warren Groom to other Toronto-based WordPress developers.” “Who are the leading freelance WordPress developers for agencies?”

These prompts test how you’re positioned relative to competitors. They reveal what AI systems think distinguishes you, what they think your strengths are, and where the gap to other named businesses sits. Useful both for self-assessment and for spotting positioning opportunities.

Running all three types gives you a much clearer picture than running just one.
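
If it helps to keep the prompts in one place, the three types can be written down as plain data. This is an optional sketch, not a required tool: the names PROMPTS and all_prompts are made up for illustration, and the prompt texts are examples from this lesson. Nothing here contacts an AI system. It only produces the list you then paste in by hand.

```python
# Illustrative only: the three prompt types from this lesson, kept as plain data.
# PROMPTS and all_prompts are hypothetical names, not part of any tool.
PROMPTS = {
    "branded": [
        "Tell me about Warren Groom.",
        "What is warrengroom.com?",
    ],
    "category": [
        "Who teaches GEO for website owners?",
    ],
    "comparison": [
        "Compare Warren Groom to other Toronto-based WordPress developers.",
        "Who are the leading freelance WordPress developers for agencies?",
    ],
}


def all_prompts(prompts):
    """Flatten the dictionary into (prompt_type, prompt_text) pairs."""
    return [(kind, text) for kind, texts in prompts.items() for text in texts]


if __name__ == "__main__":
    # Print a checklist to work through in each AI system.
    for kind, text in all_prompts(PROMPTS):
        print(f"[{kind}] {text}")
```

Keeping the wording fixed like this makes later runs comparable, because you ask exactly the same questions each time.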

Running the test across multiple AI systems

Different AI systems will give different answers. That’s not a bug — it’s part of the information.

A useful baseline test runs the same prompt across at least three systems: ChatGPT, Claude, and Perplexity. These three together represent most of the AI-driven traffic and citation activity for a typical business. Adding Google’s AI Overviews (visible at the top of many Google search results) gives a fourth data point that captures search-integrated AI behaviour.

If you appear in some but not others, that’s useful information. It usually means your entity is recognised by some systems and not yet by others — which is the normal state of affairs in a field this new. Comparing across systems also helps you spot inaccuracies. If one AI describes you correctly and another gets the basics wrong, you’ve found a specific gap that’s worth investigating.

There’s no need to test every AI tool that exists. The three or four mentioned here cover the vast majority of citation activity. Smaller tools can be added if they matter specifically to your audience.

How often to test

A reasonable rhythm: a baseline test now, then quarterly check-ins, plus one-off tests when you’ve made a meaningful change to your site.

The baseline test is the important one. Spend an hour running the three prompt types across three or four AI systems, and write down what you find. This becomes the reference point everything else compares against.

The quarterly check-ins are lighter. Run the same prompts again, compare to the baseline, note what’s changed. You’re looking for trends — entities becoming clearer over time, new topics being correctly associated, accuracy improving — rather than for individual events.

The one-off tests happen when you’ve changed something specific. Rewritten your About page? Re-run the branded prompt and see if the description has improved. Added Person schema? Look for whether more accurate biographical detail appears. The one-off tests give you a direct line between the work you’ve done and the effect it’s having.

You don’t need a spreadsheet or a formal tracking system, though either helps if you’re testing across many businesses. A short document with the prompts, the dates, and what came back is enough for most people.
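
For anyone who does want something a little more structured than a document, the same record (prompt, date, what came back) fits in a CSV file. The following is a minimal sketch under stated assumptions: the column names, the yes/no "mentioned" field, and the function name log_result are illustrative choices, not a standard format.

```python
import csv
from datetime import date
from pathlib import Path

# Assumed column layout; adjust to whatever you actually want to track.
FIELDS = ["date", "system", "prompt_type", "prompt", "mentioned", "notes"]


def log_result(path, system, prompt_type, prompt, mentioned, notes="", when=None):
    """Append one manual test result to a CSV file.

    The header row is written only when the file does not exist yet.
    `when` defaults to today's date.
    """
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (when or date.today()).isoformat(),
            "system": system,
            "prompt_type": prompt_type,
            "prompt": prompt,
            "mentioned": "yes" if mentioned else "no",
            "notes": notes,
        })
```

After each manual test, a call such as log_result("geo-tests.csv", "ChatGPT", "branded", "Tell me about Warren Groom.", True, "accurate, city correct") adds one row. The file opens in any spreadsheet program when it is time for the quarterly comparison.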

How to interpret the results

A few principles that help avoid common interpretation errors.

Volume isn’t the goal. You’re not trying to be mentioned the most. You’re trying to be mentioned accurately, by the right systems, in response to the right kinds of questions. A single accurate citation in response to a high-intent question is worth dozens of generic mentions.

Inaccuracy is a signal, not a failure. If an AI describes your business incorrectly, you’ve found a specific gap in how clearly your site communicates the truth. Fix the source — usually your About page, your Person schema, or your service pages — and the description tends to improve over time as the AI systems update their understanding.

Absence is a signal too. If you don’t appear in a category prompt where you’d expect to, that’s data. It usually means the relationships covered in Module 3 aren’t strong enough yet. Decide whether the topic is one you actually want to be known for, and if so, work on the connections.

Don’t read too much into a single test. AI systems are probabilistic. The same prompt run twice can return slightly different answers. Look for patterns across multiple tests rather than treating one result as definitive.

Don’t read too little either. A pattern that holds across multiple AI systems, over multiple tests, with multiple phrasings of the same prompt — that’s real. The signal is in the consistency, not in any single result.
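
The idea of looking for consistency rather than single results can be made concrete with a small tally. This is a rough sketch, assuming you have recorded each manual run as a pair of system name and whether you were mentioned. The function names, the 0.6 threshold, and the two-system minimum are arbitrary illustrative values, not established benchmarks.

```python
from collections import defaultdict


def mention_rates(results):
    """Return {system: fraction of runs in which the business was mentioned}.

    `results` is a list of (system, mentioned) pairs, one per manual test run.
    """
    counts = defaultdict(lambda: [0, 0])  # system -> [mentions, total runs]
    for system, mentioned in results:
        counts[system][1] += 1
        if mentioned:
            counts[system][0] += 1
    return {system: hits / total for system, (hits, total) in counts.items()}


def is_consistent(rates, threshold=0.6, min_systems=2):
    """True when at least `min_systems` systems reach the mention threshold.

    Both defaults are assumptions chosen for illustration.
    """
    return sum(rate >= threshold for rate in rates.values()) >= min_systems


if __name__ == "__main__":
    runs = [
        ("ChatGPT", True), ("ChatGPT", True), ("ChatGPT", False),
        ("Claude", True), ("Claude", True),
        ("Perplexity", False),
    ]
    rates = mention_rates(runs)
    print(rates)
    print("Pattern holds across systems:", is_consistent(rates))
```

A handful of runs per system is a small sample, so treat the rates as a rough guide to where patterns are forming rather than as precise measurements.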

The honest framing: manual testing is approximate. It’s also more reliable than the formal tools that currently exist, and far cheaper. Most businesses don’t need anything more sophisticated than this for at least their first year of GEO work.

A useful mindset

The fastest way to know whether AI systems can find you is to ask them. The fastest way to know whether they describe you accurately is to read what they say. Both take minutes and cost nothing.

If you do this regularly, you’ll know more about your GEO performance than most businesses spending money on measurement tools. The simplicity is the strength.

Coming up in the next lesson: Tools that are starting to appear. The GEO tooling landscape is new and evolving fast. We’ll look honestly at what’s available, what’s useful, and what to wait on — without pretending the field is more mature than it is.