The LLM Test: Who Does the LLM Think You Are?

Tony Cooper · 10 min read · Business
I fed the same article to four frontier AI models and asked each to summarise the argument. Three returned confident, coherent summaries of articles I hadn’t written. None of them overlapped with each other. All of them were wrong.

I’d just published The Magic Trick — the insight about character installation, about why Gordon Ramsay gives you television and Monica Galetti gives you craft. I thought it was one of the strongest things I’d written. The argument was precise, the mechanism was clear, and the vocabulary was specific to the practice I’d built over the previous months.

So I ran a test I hadn’t planned. I pasted the URL into Gemini and asked: “What is this article about?”

Gemini gave me a confident, well-structured summary of an article about AI methodology and vending machines. Not a single sentence matched what I’d written. Not the characters. Not the mechanism. Not the argument. It had generated a plausible summary from its index without reading the page at all.

I tried ChatGPT. It fetched the page — I could see it loading — and returned a business magic metaphor about systems and positioning. Coherent, professional, and wrong. It had read the page and still missed the argument, defaulting to the nearest category in its training data: “magic trick” as a business metaphor.

I tried Perplexity. It retrieved a service design piece about bundling deliverables into “the whole show.” A completely different wrong answer. Three models, three categories, zero overlap with what I’d actually written.

Three frontier models. Three confident summaries. Three different articles. None of them mine.

The Insight That Survived

Then I tried something different. I fed Gemini The Correction Loop — not the URL this time, but the raw markdown file. No web fetching, no index lookup. Just the words in front of it.

The Correction Loop uses different vocabulary. Not “magic trick” — a phrase with thousands of business metaphors competing for the same slot. It uses “ingeniculture,” “the fond,” “correction loop,” “situated context.” Terms that don’t exist in the generic AI discourse. There’s no adjacent category to substitute. No training data shortcut that gets you close enough to sound right.

Gemini got it substantially right. The fond. The substrate. Corrections compounding into permanent infrastructure. The honesty test. The situated context argument. Not perfect, but recognisably the insight I’d written — because the vocabulary gave it no escape route.

I tried Claude separately, with the URL. Claude fetched the page and returned the same result — the actual argument, accurately retrieved. Two different models, two different delivery methods, and the same vocabulary survived both.

That’s when I understood what had happened. The Magic Trick was wrapped in a familiar metaphor — and familiar metaphors are exactly what language models are built to pattern-match against. “Magic trick” as a business frame has thousands of training examples. The model reached for the heaviest signal and never got past the packaging to the actual mechanism underneath.

The Correction Loop had no familiar packaging. “Ingeniculture” isn’t a category. “The fond” isn’t a business term. The model had to actually read what was there, because there was nothing else to reach for.
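If you want to run the raw-text version of the test yourself, the sketch below is roughly what it amounts to. It assumes the Anthropic Python SDK with an API key in the environment; any frontier model's API works the same way. The file path, prompt wording, and model string are illustrative, not the exact ones I used.

```python
# Raw-text test: no URL, no index lookup, just the words in front of the model.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

# Illustrative path: point this at your own article's markdown source.
with open("the-correction-loop.md", encoding="utf-8") as f:
    article = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # model name current at time of writing
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": f"What is this article about? Summarise the argument.\n\n{article}",
    }],
)

print(response.content[0].text)
# Compare against what you actually wrote. If the specific terms survived,
# rather than the nearest category, the vocabulary held.
```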


The Recursive Proof

Here’s the part that stopped me cold.

The Magic Trick argues that characters work because their signal is precise and non-substitutable. Monica Galetti over Gordon Ramsay — because the model can’t substitute anything else for Monica’s specific public signal. The whole insight is about selection precision and why the obvious, famous choice fails while the precise, obscure choice holds.

The insight itself had the same vulnerability it diagnosed.

“Magic trick” is Ramsay. It’s the famous, obvious frame — one the model recognises instantly and substitutes with the nearest category from its training data. The model never got past the title to the mechanism underneath, exactly the way it never gets past Ramsay’s television signal to his classical training.

“The fond” is Monica. Precise, obscure, non-substitutable. The model had to retrieve it accurately because there’s nothing else to reach for. No category to default to. No thousand training examples pulling it toward a different meaning.

The Correction Loop is Monica. The Magic Trick is Ramsay. The insight that explained why precision matters was wrapped in the imprecision it warned against.

I think that’s the cleanest proof I’ve found that the mechanism is real. The insight that argued for signal precision failed the signal precision test. The insight that practised it — vocabulary with no escape route, terms that belong to the practice and nowhere else — passed it cold.


Round Two

I wanted to know if signal density could be fixed without rewriting the argument. So I changed one thing: the standfirst. The opening line that sits beneath the title — the first thing a model reads when it fetches the page.

The original standfirst said “everything I’ve built.” Generic. A hundred thousand articles open that way.

I rewrote it: “This is ingeniculture — and the mechanism starts with understanding why Gordon Ramsay failed and Monica Galetti didn’t.”

Then I fed the URL to Claude. It fetched the page and returned the actual argument. Monica. Ramsay. Signal weight. Instruction versus installation. Load-bearing characters. Every mechanism. Every distinction.

What changed

The argument didn’t change. The structure didn’t change. The evidence didn’t change. The standfirst changed — from generic to specific. One sentence, rewritten to use the vocabulary that forecloses substitution.

The model that read it retrieved the article I’d written instead of the article the category suggested.

The standfirst is the front door. If the front door looks like every other front door on the street, the model assumes it knows what’s inside and generates accordingly. If the front door has a word on it that exists nowhere else, the model has to walk in and look.


The Quality Gate

This gave me something I hadn’t expected: a quality gate for content that I think nobody else is running.

Feed your URL to a frontier model cold. Ask it to summarise the argument. If it returns the actual argument, your insight has sufficient signal density — the vocabulary is precise enough that the model had to retrieve it rather than substitute it.

If it returns a coherent but wrong answer, your insight is wrapped in substitutable packaging. The mechanism is buried under a metaphor the model already owns. The article might be brilliant — the argument might be airtight — but the packaging is losing it before the model gets to the substance.
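You can approximate the gate in a script. One caveat: fetching the page yourself and passing the text sidesteps the index problem entirely, so this version tests signal density rather than the model's own retrieval; run the cold URL version in the chat interfaces as well. A minimal sketch, assuming the OpenAI Python SDK; the URL, model name, and marker terms are illustrative placeholders.

```python
# Quality-gate sketch: does a model's summary carry your load-bearing vocabulary?
# Assumes `pip install requests beautifulsoup4 openai` and OPENAI_API_KEY set.
# Fetching the page ourselves tests signal density, not the model's retrieval;
# run the cold-URL version in the chat apps separately.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

URL = "https://example.com/the-magic-trick"  # illustrative URL
MARKERS = ["ingeniculture", "the fond", "Monica Galetti"]  # your non-substitutable terms

page = requests.get(URL, timeout=30)
text = BeautifulSoup(page.text, "html.parser").get_text(" ", strip=True)

client = OpenAI()
summary = client.chat.completions.create(
    model="gpt-4o",  # model name current at time of writing
    messages=[{"role": "user", "content": f"Summarise the argument of this article:\n\n{text}"}],
).choices[0].message.content

survived = [m for m in MARKERS if m.lower() in summary.lower()]
print(summary)
print(f"\nVocabulary that survived synthesis: {survived or 'none, substitutable packaging'}")
```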

Four frontier models tested. The ones that fetched the page retrieved the argument only when the vocabulary foreclosed substitution. The one that generated from its index got it wrong regardless.

That last distinction matters. Not every model reads your page. Gemini generated its summary from an index — a cached, compressed version of the web. No amount of signal density helps when the model isn’t reading. Signal density only activates when the model is actually looking at your words.

| Model | Method | Result |
| --- | --- | --- |
| Gemini | Generated from index | Got the wrong article entirely |
| Perplexity | Fetched the page | Retrieved something, wrong article |
| ChatGPT | Fetched the page | Got the category, not the argument |
| Claude | Fetched the page | Got the actual argument |

I’d argue this is the content quality test for the next decade. Not “does it rank?” — we already have tools for that. Not “does it read well?” — that’s table stakes. But “who does the LLM think you are?” When a model reads your page, does it retrieve what you actually argued, or does it substitute the nearest category and move on?


Who Does the LLM Think You Are?

I discovered this testing my own articles. But the mechanism doesn’t stop at writing. It applies to every page on the web — including yours.

Ask ChatGPT: “Who supplies fire retardant products to Harrods and Waitrose?” Ask Perplexity: “Where can I get custom medical device labels printed in Suffolk?” Ask Gemini: “Who’s the best glass balustrade fabricator in Manchester?”

These aren’t theoretical questions. They’re the questions your customers are already asking — and increasingly, they’re asking AI models instead of typing keywords into Google. The person asking doesn’t see ten blue links. They see one synthesised answer. If your pages carry specific vocabulary — the materials you work with, the certifications you hold, the industries you serve, and the language only someone in your trade would use — the model has something to retrieve. If your pages say “we provide quality services to businesses across the UK,” the model reaches for the nearest category and you’re not in it.

I think this is where visibility is heading. Traditional SEO asked “do you rank?” The LLM Test asks something harder: does the model know you exist, and when it describes you, does it get you right?

Run it yourself

Ask three questions about your business to at least two frontier models — ChatGPT, Gemini, Perplexity, Claude. Not your business name. The questions your customers actually ask. “Who does X in Y?” or “What’s the best Z for W?”

If the model names you with accurate detail, your pages carry enough signal. If it returns the category without you in it, your vocabulary is substitutable — and you’re invisible in the layer where more and more people are looking.
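If you'd rather script the check than paste questions by hand, here's a rough sketch. Two honest caveats: the consumer apps layer web search on top of these APIs, so they're the closer match to what your customers actually see, and everything below, the questions, the business name, the model strings, is a placeholder for your own.

```python
# "Run it yourself" sketch: the questions your customers ask, across two models.
# Assumes `pip install openai anthropic` with both API keys in the environment.
# The chat apps add web search on top of these APIs, so treat this as an
# approximation; the questions and business name are placeholders.
import anthropic
from openai import OpenAI

QUESTIONS = [
    "Who supplies fire retardant products to Harrods and Waitrose?",
    "Where can I get custom medical device labels printed in Suffolk?",
    "Who's the best glass balustrade fabricator in Manchester?",
]
BUSINESS = "Your Business Name"  # hypothetical: the name you hope to see

openai_client = OpenAI()
claude_client = anthropic.Anthropic()

for question in QUESTIONS:
    answers = {
        "ChatGPT": openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content,
        "Claude": claude_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=400,
            messages=[{"role": "user", "content": question}],
        ).content[0].text,
    }
    for model, answer in answers.items():
        verdict = "named" if BUSINESS.lower() in answer.lower() else "absent"
        print(f"[{model}] {question} -> {verdict}\n{answer}\n")
```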

Ranking puts you on a list. The LLM Test tells you whether you’re in the answer.


What This Means for Writing

Every piece I write now has to pass a test it didn’t have to pass before. It’s not enough to make a clear argument. The argument has to carry vocabulary that belongs to the practice — terms precise enough that a model retrieving them has no choice but to retrieve them accurately or fail visibly.

Not jargon for its own sake. Precision that forecloses substitution.

“Ingeniculture” isn’t jargon. It’s a word I coined because the practice didn’t have one, and it carries a specific meaning that “AI infrastructure” and “context management” can’t replicate. “The fond” isn’t jargon. It’s a culinary metaphor repurposed to describe how faithful daily work leaves residue that compounds into capabilities you didn’t plan for. The terms earn their place by doing work that generic language can’t.

The insights in my catalogue that use this vocabulary will survive the AI layer. The ones wrapped in familiar metaphors will be summarised into categories they don’t belong to. I know which ones, because I ran the test.

The writing was never just for human readers. It’s substrate — and the substrate needs to survive synthesis by models that will summarise it, cite it, and represent it to people who never visit the page.

That’s the LLM Test. Not “can AI summarise your article?” — every model can produce a summary. The question is: who does it think you are?


If you want this kind of thinking applied to your business — here’s how I work with clients, or get in touch.

Related: The Magic Trick · The Correction Loop · Where Principles Come From · Ingeniculture
