
Measuring Generative Engine Optimization (GEO) in Practice

Benjamin Flores · 10 min read

1. Background

Large language models (LLMs) have enabled a new class of search and discovery systems that directly generate answers instead of returning only a list of links. These generative engines fetch information from the web or proprietary data sources, then synthesize natural-language responses using an LLM.

Aggarwal et al. introduced Generative Engine Optimization (GEO) as a framework for improving how often and how prominently web content appears in responses from such generative engines. Their work defines GEO as a creator-centric, black-box optimization framework for web content visibility, and introduces GEO-bench, a 10,000-query benchmark for evaluating GEO strategies on real generative engines. They show that optimized content can improve visibility by up to ~40% on these systems.

This page describes how Relens operationalizes GEO in live production environments:

  • how we define visibility metrics for generative engines,
  • how we measure them across multiple AI systems, and
  • how these metrics relate to the academic GEO framework.

Our goal is to provide a transparent, citable description of GEO as it is measured in practice.


2. What Counts as a “Generative Engine”?

Following Aggarwal et al., we use generative engine to mean any system that:

  1. Fetches information from external or internal sources (web pages, APIs, knowledge bases),
  2. Synthesizes an answer using an LLM or related generative model, and
  3. Returns a natural-language answer (optionally with citations or links).

In 2025, commonly tracked generative engines include:

Category | Description | Example systems (non-exhaustive)
AI-augmented search | Search + generative “answer box” in the SERP | Google AI Overviews, Bing Copilot, Perplexity
Standalone chat assistants | General-purpose chat interfaces powered by LLMs | ChatGPT, Gemini, Claude
Embedded/product copilots | In-app assistants that answer domain-specific queries | Workspace copilots, IDE copilots, support bots

For GEO, we are primarily interested in engines that (a) draw on public or semi-public web content and (b) display sources or citations to that content in their answers.


3. GEO Measurement Framework

Relens tracks how often and how prominently a given domain or URL appears inside generative engine answers. We focus on answer-level visibility rather than traditional SERP rankings.

3.1 Core visibility metrics

Let:

  • Q be a set of user queries,
  • E a set of generative engines,
  • D a domain (e.g. example.com) or specific URL,
  • A_{q,e} the answer returned by engine e for query q.

We define the following operational metrics.

Metric name | Symbol | Informal definition | Typical unit / range
AI Answer Presence | AAP(D) | Share of (query, engine) pairs where the answer cites or links to domain D. | 0-1 (0-100%)
AI Citation Count | ACC(D) | Total count of distinct answers that cite domain D at least once. | Non-negative integer
AI Share of Voice | SOV(D) | Share of all citations in a topic or query set that belong to domain D, vs. competitors. | 0-1 (0-100%)
AI Attribution Rate | AR(D) | Among answers whose content is closely aligned with D, fraction that explicitly cite D. | 0-1 (0-100%)
Engine Coverage | EC(D) | Proportion of engines in E where D appears in at least one answer for the query set. | 0-1 (0-100%)

More formally:

  • AI Answer Presence
\text{AAP}(D) = \frac{1}{|Q|\,|E|} \sum_{q \in Q} \sum_{e \in E} \mathbf{1}\{ D \text{ is cited in } A_{q,e} \}
  • AI Citation Count
\text{ACC}(D) = \sum_{q \in Q} \sum_{e \in E} \mathbf{1}\{ D \text{ is cited in } A_{q,e} \}
  • AI Share of Voice (for a topic or query cluster Q_T)
\text{SOV}(D, Q_T) = \frac{\text{ACC}(D, Q_T)}{\sum_{D'} \text{ACC}(D', Q_T)}

where the denominator sums over all tracked domains D' in the same topic.

  • AI Attribution Rate (content usage vs. citation)

Let U_{q,e}(D) = 1 if answer A_{q,e} is judged to substantially reflect content from domain D (via lexical overlap, embeddings, or manual labeling), regardless of citation; and let C_{q,e}(D) = 1 if A_{q,e} explicitly cites or links to D.

\text{AR}(D) = \frac{\sum_{q,e} C_{q,e}(D)}{\sum_{q,e} U_{q,e}(D)}

Attribution Rate captures a key GEO concern: how often a domain is credited when its content is used.
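
The following sketch shows how these four metrics follow directly from the definitions above. It assumes each (query, engine) observation is stored as a record carrying the parsed citation domains and the domains whose content the answer is judged to reflect; the record shape and function names are illustrative assumptions, not a documented Relens API.

```python
from dataclasses import dataclass, field

@dataclass
class AnswerRecord:
    """One (query, engine) observation; field names are illustrative."""
    query: str
    engine: str
    cited_domains: set = field(default_factory=set)      # domains with C_{q,e}(D) = 1
    reflected_domains: set = field(default_factory=set)  # domains with U_{q,e}(D) = 1

def aap(records, domain):
    """AI Answer Presence: share of (query, engine) pairs whose answer cites the domain."""
    if not records:
        return 0.0
    return sum(domain in r.cited_domains for r in records) / len(records)

def acc(records, domain):
    """AI Citation Count: number of answers citing the domain at least once."""
    return sum(domain in r.cited_domains for r in records)

def sov(records, domain, tracked_domains):
    """AI Share of Voice within a query cluster, relative to all tracked domains."""
    total = sum(acc(records, d) for d in tracked_domains)
    return acc(records, domain) / total if total else 0.0

def ar(records, domain):
    """AI Attribution Rate: explicit citations over answers judged to use the content."""
    used = sum(domain in r.reflected_domains for r in records)
    return acc(records, domain) / used if used else 0.0
```

Here `records` is assumed to contain one entry per (query, engine) pair in Q × E, so the denominator of AAP matches |Q||E| in the formula above.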

3.2 Relation to academic GEO visibility metrics

Aggarwal et al. define visibility using metrics such as:

  • Position-Adjusted Word Count (PAWC): weighs source tokens by where they appear in the generated answer.
  • Subjective Impression metrics: model or human ratings of which sources “seem most visible” to a user.

Our operational metrics are designed to:

  • be computable at scale on live, proprietary engines,
  • align with the same intuition of “visibility” (being seen and credited), and
  • support longitudinal tracking for brands and websites.

While PAWC and subjective impression metrics are used on GEO-bench to evaluate optimization methods, AAP, ACC, SOV, and AR are suitable for ongoing monitoring in production environments.


4. Measurement Protocol

To compute these metrics consistently, Relens uses a standardized measurement setup.

4.1 Query set construction

For each customer or study, we define a query set Q that may include:

  • Brand and product queries (e.g. “relens ai seo agent”),
  • Category and comparison queries (e.g. “best ai seo tools”, “geo vs seo”),
  • Informational queries (e.g. “what is generative engine optimization”).

Queries can be clustered into topics T (e.g. “GEO”, “technical SEO”, “AI visibility”) to compute topic-level share-of-voice.
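
As a minimal sketch of how a query set and its topic clusters might be represented, the mapping below uses the example topics and queries from this section; the data structure itself is an assumption rather than a production schema.

```python
# Illustrative topic-to-query mapping used to compute topic-level share of voice.
QUERY_TOPICS = {
    "GEO": [
        "what is generative engine optimization",
        "geo vs seo",
    ],
    "AI visibility": [
        "best ai seo tools",
        "relens ai seo agent",
    ],
}

def queries_for_topic(topic):
    """Return the query cluster Q_T for a given topic (empty list if untracked)."""
    return QUERY_TOPICS.get(topic, [])
```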

4.2 Engine sampling

For each query q ∈ Q and engine e ∈ E:

  1. Send the query using standard, user-visible interfaces and default settings (where possible).
  2. Record the full answer text, any source list / citations, and answer metadata (timestamps, answer type, etc.).
  3. Parse links and citations to extract domains and URLs.
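
A sketch of the raw record captured per (query, engine) request, before any parsing, might look like the following; the field names are assumptions chosen to mirror the steps above, not a documented schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RawAnswer:
    """Unparsed capture of one generative-engine response (illustrative schema)."""
    query: str
    engine: str
    answer_text: str                                       # full natural-language answer
    source_urls: list = field(default_factory=list)        # cited links, if shown
    answer_type: str = "chat"                              # e.g. "chat", "answer_box"
    fetched_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # capture timestamp
    )
```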

4.3 Domain and content matching

We then:

  • Normalize URLs to domains (e.g. https://www.example.com/blog/post → example.com); a sketch of this step follows the list.
  • Optionally track visibility at URL-level granularity for specific pages.
  • Use text similarity and embeddings to detect whether an answer’s wording strongly matches content from a domain, even if no explicit link is shown (for Attribution Rate).
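
A minimal sketch of the URL-to-domain normalization step, using only the Python standard library; the exact normalization rules used in production may differ (for instance, this simplified version does not handle multi-part public suffixes such as .co.uk).

```python
from urllib.parse import urlparse

def normalize_domain(url):
    """Reduce a cited URL to a bare domain, e.g. https://www.example.com/blog/post -> example.com."""
    host = urlparse(url).hostname or ""
    host = host.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    return host

assert normalize_domain("https://www.example.com/blog/post") == "example.com"
```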

4.4 Aggregation and reporting

Metrics are aggregated:

  • Per domain (e.g. example.com, relens.ai),
  • Per engine (e.g. ChatGPT vs Perplexity vs AI Overviews),
  • Per topic or query cluster, and
  • Over time (daily, weekly, monthly).

This allows us to answer questions like:

  • “On how many of our priority queries do we appear in AI answers at all?”
  • “Which competitor has the highest AI share of voice in ‘GEO’ topics?”
  • “Are generative engines using our content without attribution?”
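
As a sketch of the aggregation step, the function below groups AI Answer Presence by engine and ISO week for a single domain. It assumes parsed records that carry both a timestamp (`fetched_at`) and the cited-domain set (`cited_domains`), as in the illustrative record shapes above; the grouping keys are examples.

```python
from collections import defaultdict

def weekly_presence_by_engine(records, domain):
    """AAP for one domain, grouped by (engine, ISO week), e.g. ("Perplexity", "2025-W14")."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r.engine, r.fetched_at.strftime("%G-W%V"))  # engine + ISO year-week
        totals[key] += 1
        hits[key] += domain in r.cited_domains
    return {key: hits[key] / totals[key] for key in totals}
```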

5. Example GEO Experiment (Illustrative)

The following table illustrates what a simple before/after GEO experiment might look like for a single topic cluster (“generative engine optimization tools”). Numbers are synthetic examples to show how the metrics are interpreted.

Domain | Phase | AI Answer Presence | AI Citation Count | AI Share of Voice
site-a.com | Before | 0.22 | 44 | 18%
site-a.com | After | 0.36 | 72 | 27%
site-b.com | Before | 0.30 | 60 | 25%
site-b.com | After | 0.32 | 64 | 24%
relens.ai | Before | 0.10 | 20 | 8%
relens.ai | After | 0.20 | 40 | 15%

In a real deployment:

  • “Before” and “After” would correspond to specific content changes (e.g. restructuring pages, clarifying definitions, adding comparison tables, improving technical SEO for AI crawlers).
  • We would compute relative improvements (e.g. +14 percentage points in AAP, +9 percentage points in SOV for site-a.com) and compare them across domains and engines.

This kind of experiment directly instantiates the GEO idea of optimizing content for increased visibility in generative engine responses, using metrics that can be monitored continuously.
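
As a concrete illustration of the delta computation mentioned above, the snippet below reproduces the percentage-point improvements for site-a.com using the synthetic numbers from the table; the values are the illustrative ones, not measured data.

```python
# Synthetic before/after values for site-a.com from the illustrative table.
before = {"aap": 0.22, "sov": 0.18}
after = {"aap": 0.36, "sov": 0.27}

# Percentage-point deltas: +14 pp AAP, +9 pp SOV.
delta_pp = {k: round((after[k] - before[k]) * 100) for k in before}
print(delta_pp)  # {'aap': 14, 'sov': 9}
```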


6. Practical GEO Metrics vs Traditional SEO

Traditional SEO focuses on:

  • rankings on SERPs,
  • organic click-through rate,
  • backlinks, and
  • traffic volume.

GEO, in contrast, asks:

  • “Do AI systems show my content when users ask questions in natural language?”
  • “Do they credit my site as a source?”
  • “How does my visibility compare to competitors inside AI-generated answers?”

Relens’ GEO measurement framework is therefore complementary to traditional SEO: the same pages can be optimized for both human readers and AI answer boxes, but success is measured differently.


7. How to Cite This Page

If you reference this measurement framework, please cite it as:

Relens (2025). Measuring Generative Engine Optimization (GEO) in Practice. Technical note. Available at: https://relens.ai/.

And, for the foundational academic definition of GEO and GEO-bench, please cite:

Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. arXiv:2311.09735.