You've Run the HubSpot AEO Grader. Here's What Your Score Doesn't Tell You.
The HubSpot AEO Grader is a legitimate tool. It gives you five real scores across three real AI platforms, costs nothing, and takes two minutes. The problem isn't what it measures. It's what it can't see, and that gap is exactly where most brands' citation problems actually live.
We ran Cited By AI® through the grader ourselves. ChatGPT gave us a 36 overall, with a Share of Voice score of 0/10. Perplexity scored us 49. Gemini, 50. Those numbers are honest; we're a young brand in a fast-moving category. But here's what the grader couldn't tell us: which paragraph on our site is causing the ChatGPT score to be 13 points lower than Perplexity, and what specifically to change about it.
That's not a criticism of HubSpot. It's a description of what the tool is built to do. Understanding the distinction is what tells you what to do next.
What the HubSpot AEO Grader actually measures
The grader evaluates your brand across five scored dimensions, cross-validated across ChatGPT (OpenAI), Perplexity, and Gemini. It doesn't query Claude or Microsoft Copilot. The five dimensions are:
- Sentiment Analysis: Tone of AI descriptions across three layers: overall sentiment, contextual sentiment by topic, and source-based sentiment. Scored out of 40, the highest-weighted dimension.
- Brand Recognition: How frequently and prominently your brand appears in AI-generated responses. Scored out of 20.
- Presence Quality: Data richness: the variety, depth, and consistency of information available about your brand across AI training data. Scored out of 20.
- Share of Voice: Your estimated percentage of category-level AI mentions relative to competitors. Scored out of 10.
- Market Position: Competitive classification: Leader, Challenger, or Niche Player, plus an innovation archetype (Innovator, Disruptor, Traditionalist). Scored out of 10.
These are all real signals. A low Sentiment score means AI is describing your brand in neutral or negative terms. A low Presence Quality score means AI models have thin or inconsistent data about you. A Share of Voice of 0/10 (which is what we got from ChatGPT) means you're not being mentioned in category conversations at all on that platform.
All of that is worth knowing. The question is what you do with it.
Our actual scores, and what the grader said
Running Cited By AI® through the grader on 7 April 2026 produced this:
| Dimension | ChatGPT | Perplexity | Gemini |
|---|---|---|---|
| Overall Score | 36/100 | 49/100 | 50/100 |
| Brand Recognition | 2/20 | 3/20 | 3/20 |
| Market Score | 6/10 | 6/10 | 6/10 |
| Presence Quality | 5/20 | 9/20 | 9/20 |
| Brand Sentiment | 23/40 | 29/40 | 30/40 |
| Share of Voice | 0/10 | 2/10 | 2/10 |
The grader's verdict across all three platforms: "You're on the right track, but there's room to further optimise for AEO." That's accurate. It's also where the diagnosis stops.
Our ChatGPT score is 13 points lower than our Perplexity score. Why? The grader can't tell us. Our Presence Quality on ChatGPT is 5/20 versus 9/20 on the other two. What specifically is causing that gap? Not in scope. Our Share of Voice on ChatGPT is zero. Which pages need rewriting to change that, and how? The report doesn't go there.
The specific gap: brand level vs block level
The HubSpot AEO Grader operates at the brand level. It asks AI platforms how they characterise your brand overall. That's a legitimate question with a real answer. But AI citation decisions aren't made at the brand level. They're made at the block level.
When someone asks ChatGPT "what's the best tool for AI search visibility in the UK?", the model doesn't evaluate your brand's overall reputation and decide whether to cite you. It runs a retrieval process, pulling the specific 134 to 167 word content chunks from its training data that best answer the query, then cites the sources those chunks came from. That retrieval decision happens at the paragraph level, not the brand level.
A brand-level score tells you AI isn't citing you much. A block-level score tells you which paragraph AI skipped last Tuesday and exactly why.
A paragraph that opens with brand narrative ("Cited By AI® is a UK-based consultancy that...") will be skipped by retrieval systems in favour of a paragraph that opens with a direct declarative answer ("AI Search Engine Optimisation is the practice of..."). Both paragraphs might be on the same domain with the same brand recognition score. The grader treats them identically. The retrieval system doesn't.
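The chunk-level nature of that retrieval decision can be sketched in miniature. The sketch below is a toy illustration only: real AI retrieval uses dense embeddings and rerankers rather than keyword overlap, and the function names (`chunk_words`, `retrieve`) are hypothetical, not any platform's actual API.

```python
def chunk_words(text, size=150):
    """Split a page into roughly fixed-size word chunks (~150 words each)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(chunk, query):
    """Naive relevance proxy: how many query terms appear in the chunk."""
    chunk_terms = set(chunk.lower().split())
    return sum(term in chunk_terms for term in query.lower().split())

def retrieve(pages, query, top_k=3):
    """Rank every chunk from every page. A page only gets 'cited' if one
    of its individual chunks wins; brand reputation never enters."""
    chunks = [(url, c) for url, text in pages.items()
              for c in chunk_words(text)]
    return sorted(chunks, key=lambda uc: score(uc[1], query), reverse=True)[:top_k]
```

Even in this crude model, a paragraph that opens with the query's own terms outranks a brand-narrative paragraph on the same domain, regardless of how either page's brand scores overall.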
What the grader shows you, and what it can't

| What the grader shows | What it can't show |
|---|---|
| Five brand-level scores per platform | Which paragraph AI retrieval skips |
| Composite score out of 100 | Which citability pillar is failing |
| Sentiment classification | Claude coverage (Anthropic) |
| Competitive position label | Microsoft Copilot coverage |
| Written interpretation (behind email gate) | Specific rewrite instructions |
| ChatGPT, Perplexity, Gemini coverage | Funnel-stage Share of Voice breakdown |
The Claude gap is worth flagging separately. Claude is Anthropic's AI assistant. It runs on a different training data pipeline to ChatGPT and Perplexity, cites different sources, and weights content differently. HubSpot's grader doesn't cover it. If Claude is where your buyers research your category (and for B2B professional services it increasingly is), that's a coverage gap in your visibility picture.
The five pillars the grader doesn't measure
The Citation Probability Score® (CPS®) framework measures each content block across five pillars that directly explain why retrieval systems cite or skip a given paragraph. None of these are measured at the brand level; each one applies to a specific 134 to 167 word chunk of text:
- Content Structure: Is the block in the optimal RAG chunk range? Does it open with a declarative answer or a brand narrative setup?
- Fact Density: How many named entities, statistics, and verifiable claims appear per 100 words? AI retrieval models weight fact-rich passages significantly higher than descriptive text.
- Answer Structure: Does the block open with the declarative pattern retrieval systems favour: "[Topic] is/provides/enables [specific outcome]"?
- Self-Containment: Does the block make complete sense in isolation, without the paragraph above or the section heading? Retrieval systems don't have that context.
- Freshness Signals: Does the block contain date markers and recency language? Critical for Perplexity and Bing-powered AI search, which weight recency heavily.
When our Presence Quality on ChatGPT scores 5/20 against 9/20 on Perplexity and Gemini, one explanation is that our content blocks have stronger freshness signals (which Perplexity and Gemini weight more heavily) but weaker self-containment, which affects ChatGPT retrieval more. The grader sees the symptom. CPS® finds the cause.
The right sequence
The HubSpot AEO Grader is the right starting point for understanding your brand's AI presence. Run it. Take the scores seriously. A 0/10 Share of Voice on ChatGPT is a real signal that demands attention.
But it's a starting point, not a fix. The grader tells you the room is cold. Finding which radiator valve is stuck, and on which paragraph, is what CPS® block scoring does.
The sequence that works: HubSpot grader → understand your brand-level position → CPS® Block Scorer → identify the specific paragraphs dragging those scores down → rewrite those blocks → rerun the grader to measure the delta.
That sequence converts a benchmark into a result. The grader alone doesn't close that loop. Neither does a full CPS® audit alone if you haven't established the baseline. Both tools are doing different jobs at different levels of the same problem.
Start with one paragraph
You don't need to audit your entire site today. Pick the page on your site that matters most for the query you want to own: the page your buyers would land on when they ask ChatGPT about your category. Paste the first paragraph into the CPS® Block Scorer. Get the score. Read the pillar breakdown. That single data point will tell you more about why your HubSpot Share of Voice is low than anything in the grader's report.
CPS® Block Scorer — free, instant, no login
Paste any paragraph. Get a 0–100 Citation Probability Score® with a five-pillar breakdown showing exactly which element is causing the problem, and what to fix.
Score Your First Paragraph →
Free tool · No signup · Results in seconds

Want the full picture, including Claude and Copilot?
Free audit. 28 modules. All 5 platforms. Block-level CPS® scoring. Results in 48 hours.
Get Your Free Audit →