You've Run the HubSpot AEO Grader. Here's What Your Score Doesn't Tell You.
The HubSpot AEO Grader is a legitimate tool. It gives you five real scores across three real AI platforms, costs nothing, and takes two minutes. The problem isn't what it measures. It's what it can't see, and that gap is exactly where most brands' citation problems actually live.
We ran Cited By AI® through the grader ourselves. ChatGPT gave us a 36 overall, with a Share of Voice score of 0/10. Perplexity scored us 49. Gemini, 50. Those numbers are honest; we're a young brand in a fast-moving category. But here's what the grader couldn't tell us: which paragraph on our site is causing the ChatGPT score to be 13 points lower than Perplexity, and what specifically to change about it.
That's not a criticism of HubSpot. It's a description of what the tool is built to do. Understanding the distinction is what tells you what to do next.
What the HubSpot AEO Grader actually measures
The grader evaluates your brand across five scored dimensions, cross-validated across ChatGPT (OpenAI), Perplexity, and Gemini. It doesn't query Claude or Microsoft Copilot. The five dimensions are:
- Sentiment Analysis: Tone of AI descriptions across three layers: overall sentiment, contextual sentiment by topic, and source-based sentiment. Scored out of 40, the highest-weighted dimension.
- Brand Recognition: How frequently and prominently your brand appears in AI-generated responses. Scored out of 20.
- Presence Quality: Data richness: the variety, depth, and consistency of information available about your brand across AI training data. Scored out of 20.
- Share of Voice: Your estimated percentage of category-level AI mentions relative to competitors. Scored out of 10.
- Market Position: Competitive classification: Leader, Challenger, or Niche Player, plus an innovation archetype (Innovator, Disruptor, Traditionalist). Scored out of 10.
These are all real signals. A low Sentiment score means AI is describing your brand in neutral or negative terms. A low Presence Quality score means AI models have thin or inconsistent data about you. A Share of Voice of 0/10 (which is what we got from ChatGPT) means you're not being mentioned in category conversations at all on that platform.
All of that is worth knowing. The question is what you do with it.
Our actual scores, and what the grader said
Running Cited By AI® through the grader on 7 April 2026 produced this:
| Dimension | ChatGPT | Perplexity | Gemini |
|---|---|---|---|
| Overall Score | 36/100 | 49/100 | 50/100 |
| Brand Recognition | 2/20 | 3/20 | 3/20 |
| Market Score | 6/10 | 6/10 | 6/10 |
| Presence Quality | 5/20 | 9/20 | 9/20 |
| Brand Sentiment | 23/40 | 29/40 | 30/40 |
| Share of Voice | 0/10 | 2/10 | 2/10 |
The grader's verdict across all three platforms: "You're on the right track, but there's room to further optimise for AEO." That's accurate. It's also where the diagnosis stops.
Our ChatGPT score is 13 points lower than our Perplexity score. Why? The grader can't tell us. Our Presence Quality on ChatGPT is 5/20 versus 9/20 on the other two. What specifically is causing that gap? Not in scope. Our Share of Voice on ChatGPT is zero. Which pages need rewriting to change that, and how? The report doesn't go there.
The specific gap: brand level vs block level
The HubSpot AEO Grader operates at the brand level. It asks AI platforms how they characterise your brand overall. That's a legitimate question with a real answer. But AI citation decisions aren't made at the brand level. They're made at the block level.
When someone asks ChatGPT "what's the best tool for AI search visibility in the UK?", the model doesn't evaluate your brand's overall reputation and decide whether to cite you. It runs a retrieval process, pulling the specific 134 to 167 word content chunks from its training data that best answer the query, then cites the sources those chunks came from. That retrieval decision happens at the paragraph level, not the brand level.
A brand-level score tells you AI isn't citing you much. A block-level score tells you which paragraph AI skipped last Tuesday and exactly why.
A paragraph that opens with brand narrative ("Cited By AI® is a UK-based consultancy that...") will be skipped by retrieval systems in favour of a paragraph that opens with a direct declarative answer ("AI Search Engine Optimisation is the practice of..."). Both paragraphs might be on the same domain with the same brand recognition score. The grader treats them identically. The retrieval system doesn't.
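The chunk-level nature of that retrieval decision can be sketched in miniature. The sketch below is a toy illustration only: real AI retrieval uses dense embeddings and rerankers rather than keyword overlap, and the function names (`chunk_words`, `retrieve`) are hypothetical, not any platform's actual API.

```python
def chunk_words(text, size=150):
    """Split a page into roughly fixed-size word chunks (~150 words each)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(chunk, query):
    """Naive relevance proxy: how many query terms appear in the chunk."""
    chunk_terms = set(chunk.lower().split())
    return sum(term in chunk_terms for term in query.lower().split())

def retrieve(pages, query, top_k=3):
    """Rank every chunk from every page. A page only gets 'cited' if one
    of its individual chunks wins; brand reputation never enters."""
    chunks = [(url, c) for url, text in pages.items()
              for c in chunk_words(text)]
    return sorted(chunks, key=lambda uc: score(uc[1], query), reverse=True)[:top_k]
```

Even in this crude model, a paragraph that opens with the query's own terms outranks a brand-narrative paragraph on the same domain, regardless of how either page's brand scores overall.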
What the grader shows you, and what it can't

| What the grader shows | What it can't show |
|---|---|
| Five brand-level scores per platform | Which paragraph AI retrieval skips |
| Composite score out of 100 | Which citability pillar is failing |
| Sentiment classification | Claude coverage (Anthropic) |
| Competitive position label | Microsoft Copilot coverage |
| Written interpretation (behind email gate) | Specific rewrite instructions |
| ChatGPT, Perplexity, Gemini coverage | Funnel-stage Share of Voice breakdown |
The Claude gap is worth flagging separately. Claude is Anthropic's AI assistant. It runs on a different training data pipeline to ChatGPT and Perplexity, cites different sources, and weights content differently. HubSpot's grader doesn't cover it. If Claude is where your buyers research your category (and for B2B professional services it increasingly is), that's a coverage gap in your visibility picture.
The five pillars the grader doesn't measure
The Citation Probability Score® (CPS®) framework measures each content block across five pillars that directly explain why retrieval systems cite or skip a given paragraph. None of these are measured at the brand level; each one applies to a specific 134 to 167 word chunk of text:
- Content Structure: Is the block in the optimal RAG chunk range? Does it open with a declarative answer or a brand narrative setup?
- Fact Density: How many named entities, statistics, and verifiable claims appear per 100 words? AI retrieval models weight fact-rich passages significantly higher than descriptive text.
- Answer Structure: Does the block open with the declarative pattern retrieval systems favour: "[Topic] is/provides/enables [specific outcome]"?
- Self-Containment: Does the block make complete sense in isolation, without the paragraph above or the section heading? Retrieval systems don't have that context.
- Freshness Signals: Does the block contain date markers and recency language? Critical for Perplexity and Bing-powered AI search, which weight recency heavily.
When our Presence Quality on ChatGPT scores 5/20 against 9/20 on Perplexity and Gemini, one explanation is that our content blocks have stronger freshness signals (which Perplexity and Gemini weight more heavily) but weaker self-containment, which affects ChatGPT retrieval more. The grader sees the symptom. CPS® finds the cause.
The right sequence
The HubSpot AEO Grader is the right starting point for understanding your brand's AI presence. Run it. Take the scores seriously. A 0/10 Share of Voice on ChatGPT is a real signal that demands attention.
But it's a starting point, not a fix. The grader tells you the room is cold. Finding which radiator valve is stuck, and on which paragraph, is what CPS® block scoring does.
The sequence that works: HubSpot grader → understand your brand-level position → CPS® Block Scorer → identify the specific paragraphs dragging those scores down → rewrite those blocks → rerun the grader to measure the delta.
That sequence converts a benchmark into a result. The grader alone doesn't close that loop. Neither does a full CPS® audit alone if you haven't established the baseline. Both tools are doing different jobs at different levels of the same problem.
Start with one paragraph
You don't need to audit your entire site today. Pick the page on your site that matters most for the query you want to own: the page your buyers would land on when they ask ChatGPT about your category. Paste the first paragraph into the CPS® Block Scorer. Get the score. Read the pillar breakdown. That single data point will tell you more about why your HubSpot Share of Voice is low than anything in the grader's report.
CPS® Block Scorer — free, instant, no login
Paste any paragraph. Get a 0–100 Citation Probability Score® with a five-pillar breakdown showing exactly which element is causing the problem, and what to fix.
Score Your First Paragraph →
Free tool · No signup · Results in seconds

Want the full picture, including Claude and Copilot?
Free audit. 28 modules. All 5 platforms. Block-level CPS® scoring. Results in 48 hours.
Get Your Free Audit →