Tool Review & Gap Analysis

You've Run the HubSpot AEO Grader. Here's What Your Score Doesn't Tell You.

Published: 7 April 2026 | Author: Cited By AI® | Reading time: 7 min
Version 1.0 | Last verified: 7 April 2026 | Source: citedbyai.info AI Visibility Intelligence

The HubSpot AEO Grader is a legitimate tool. It gives you five real scores across three real AI platforms, costs nothing, and takes two minutes. The problem isn't what it measures. It's what it can't see, and that gap is exactly where most brands' citation problems actually live.

We ran Cited By AI® through the grader ourselves. ChatGPT gave us a 36 overall, with a Share of Voice score of 0/10. Perplexity scored us 49. Gemini, 50. Those numbers are honest; we're a young brand in a fast-moving category. But here's what the grader couldn't tell us: which paragraph on our site is causing the ChatGPT score to be 13 points lower than Perplexity, and what specifically to change about it.

That's not a criticism of HubSpot. It's a description of what the tool is built to do. Understanding that distinction tells you what to do next.

What the HubSpot AEO Grader actually measures

The grader evaluates your brand across five scored dimensions, cross-validated across ChatGPT (OpenAI), Perplexity, and Gemini. It doesn't query Claude or Microsoft Copilot. The five dimensions are:

  • Brand Recognition (out of 20)
  • Market Score (out of 10)
  • Presence Quality (out of 20)
  • Brand Sentiment (out of 40)
  • Share of Voice (out of 10)

These are all real signals. A low Sentiment score means AI is describing your brand in neutral or negative terms. A low Presence Quality score means AI models have thin or inconsistent data about you. A Share of Voice of 0/10 (which is what we got from ChatGPT) means you're not being mentioned in category conversations at all on that platform.

All of that is worth knowing. The question is what you do with it.

Our actual scores, and what the grader said

Running Cited By AI® through the grader on 7 April 2026 produced this:

Dimension           ChatGPT   Perplexity   Gemini
Overall Score       36/100    49/100       50/100
Brand Recognition   2/20      3/20         3/20
Market Score        6/10      6/10         6/10
Presence Quality    5/20      9/20         9/20
Brand Sentiment     23/40     29/40        30/40
Share of Voice      0/10      2/10         2/10

The grader's verdict across all three platforms: "You're on the right track, but there's room to further optimise for AEO." That's accurate. It's also where the diagnosis stops.

Our ChatGPT score is 13 points lower than our Perplexity score. Why? The grader can't tell us. Our Presence Quality on ChatGPT is 5/20 versus 9/20 on the other two. What specifically is causing that gap? Not in scope. Our Share of Voice on ChatGPT is zero. Which pages need rewriting to change that, and how? The report doesn't go there.

The specific gap: brand level vs block level

The HubSpot AEO Grader operates at the brand level. It asks AI platforms how they characterise your brand overall. That's a legitimate question with a real answer. But AI citation decisions aren't made at the brand level. They're made at the block level.

When someone asks ChatGPT "what's the best tool for AI search visibility in the UK?", the model doesn't evaluate your brand's overall reputation and decide whether to cite you. It runs a retrieval process, pulling the specific 134 to 167 word content chunks from its retrieval index that best answer the query, then cites the sources those chunks came from. That retrieval decision happens at the paragraph level, not the brand level.

A brand-level score tells you AI isn't citing you much. A block-level score tells you which paragraph AI skipped last Tuesday and exactly why.

A paragraph that opens with brand narrative ("Cited By AI® is a UK-based consultancy that...") will be skipped by retrieval systems in favour of a paragraph that opens with a direct declarative answer ("AI Search Engine Optimisation is the practice of..."). Both paragraphs might be on the same domain with the same brand recognition score. The grader treats them identically. The retrieval system doesn't.
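To make the retrieval distinction concrete, here's a toy sketch of our own: a naive bag-of-words retriever that scores the two kinds of opening paragraph against a category query. Real AI retrieval uses dense embeddings over content chunks, not word overlap, and the example paragraphs and query are hypothetical, but the ranking effect is the same in spirit.

```python
# Illustrative only: a toy lexical retriever showing why an answer-first
# paragraph can outrank a brand-first one for the same query.
from collections import Counter
import math

def bag_of_words(text):
    # Tokenise crudely: lowercase, strip common punctuation.
    return Counter(w.strip(".,?!()").lower() for w in text.split())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = bag_of_words("what is AI search engine optimisation")

brand_first = ("Cited By AI is a UK-based consultancy that helps brands "
               "grow their presence across modern search channels.")
answer_first = ("AI Search Engine Optimisation is the practice of structuring "
                "content so AI search systems can retrieve and cite it.")

for name, block in [("brand-first", brand_first), ("answer-first", answer_first)]:
    print(name, round(cosine(query, bag_of_words(block)), 3))
```

The answer-first block scores higher because its opening words overlap the query, while the brand-first block spends its opening on narrative the query never mentions. That is the paragraph-level decision the brand-level grader can't see.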

What the grader can't show you

HubSpot AEO Grader gives you
  • Five brand-level scores per platform
  • Composite score out of 100
  • Sentiment classification
  • Competitive position label
  • Written interpretation (behind email gate)
  • ChatGPT, Perplexity, Gemini coverage
What it doesn't show you
  • Which paragraph AI retrieval skips
  • Which citability pillar is failing
  • Claude coverage (Anthropic)
  • Microsoft Copilot coverage
  • Specific rewrite instructions
  • Funnel-stage Share of Voice breakdown

The Claude gap is worth flagging separately. Claude is Anthropic's AI assistant. It runs on a different training data pipeline from ChatGPT and Perplexity, cites different sources, and weights content differently. HubSpot's grader doesn't cover it. If Claude is where your buyers research your category (and for B2B professional services it increasingly is), that's a coverage gap in your visibility picture.

The five pillars the grader doesn't measure

The Citation Probability Score® (CPS®) framework measures each content block across five pillars that directly explain why retrieval systems cite or skip a given paragraph. None of these are measured at the brand level; each one applies to a specific 134 to 167 word chunk of text.

When our Presence Quality on ChatGPT scores 5/20 against 9/20 on Perplexity and Gemini, one explanation is that our content blocks have stronger freshness signals (which Perplexity and Gemini weight more heavily) but weaker self-containment, which affects ChatGPT retrieval more. The grader sees the symptom. CPS® finds the cause.

The right sequence

The HubSpot AEO Grader is the right starting point for understanding your brand's AI presence. Run it. Take the scores seriously. A 0/10 Share of Voice on ChatGPT is a real signal that demands attention.

But it's a starting point, not a fix. The grader tells you the room is cold. Finding which radiator valve is stuck, and on which paragraph, is what CPS® block scoring does.

The sequence that works: run the HubSpot grader → understand your brand-level position → run the CPS® Block Scorer → identify the specific paragraphs causing the low scores → rewrite those blocks → rerun the grader to measure the delta.

That sequence converts a benchmark into a result. The grader alone doesn't close that loop. Neither does a full CPS® audit alone if you haven't established the baseline. Both tools are doing different jobs at different levels of the same problem.

Start with one paragraph

You don't need to audit your entire site today. Pick the page on your site that matters most for the query you want to own: the page your buyers would land on when they ask ChatGPT about your category. Paste the first paragraph into the CPS® Block Scorer. Get the score. Read the pillar breakdown. That single data point will tell you more about why your HubSpot Share of Voice is low than anything in the grader's report.


CPS® Block Scorer — free, instant, no login

Paste any paragraph. Get a 0–100 Citation Probability Score® with a five-pillar breakdown showing exactly which element is causing the problem, and what to fix.

Score Your First Paragraph → Free tool · No signup · Results in seconds

Want the full picture — including Claude and Copilot?

Free audit. 28 modules. All 5 platforms. Block-level CPS® scoring. Results in 48 hours.

Get Your Free Audit →