Measuring Generative Engine Optimization (GEO) in Practice

1. Background
Large language models (LLMs) have enabled a new class of search and discovery systems that directly generate answers instead of returning only a list of links. These generative engines fetch information from the web or proprietary data sources, then synthesize natural-language responses using an LLM.
Aggarwal et al. introduced Generative Engine Optimization (GEO) as a framework for improving how often and how prominently web content appears in responses from such generative engines. Their work defines GEO as a creator-centric, black-box optimization framework for web content visibility, and introduces GEO-bench, a 10,000-query benchmark for evaluating GEO strategies on real generative engines. They show that optimized content can improve visibility by up to ~40% on these systems.
This page describes how Relens operationalizes GEO in live production environments:
- how we define visibility metrics for generative engines,
- how we measure them across multiple AI systems, and
- how these metrics relate to the academic GEO framework.
Our goal is to provide a transparent, citable description of GEO as it is measured in practice.
2. What Counts as a “Generative Engine”?
Following Aggarwal et al., we use generative engine to mean any system that:
- Fetches information from external or internal sources (web pages, APIs, knowledge bases),
- Synthesizes an answer using an LLM or related generative model, and
- Returns a natural-language answer (optionally with citations or links).
In 2025, commonly tracked generative engines include:
| Category | Description | Example systems (non-exhaustive) |
|---|---|---|
| AI-augmented search | Search + generative “answer box” in the SERP | Google AI Overviews, Bing Copilot, Perplexity |
| Standalone chat assistants | General-purpose chat interfaces powered by LLMs | ChatGPT, Gemini, Claude |
| Embedded/product copilots | In-app assistants that answer domain-specific queries | Workspace copilots, IDE copilots, support bots |
For GEO, we are primarily interested in engines that (a) draw on public or semi-public web content and (b) display sources or citations to that content in their answers.
3. GEO Measurement Framework
Relens tracks how often and how prominently a given domain or URL appears inside generative engine answers. We focus on answer-level visibility rather than traditional SERP rankings.
3.1 Core visibility metrics
Let:
- Q be a set of user queries,
- E be a set of generative engines,
- D be a domain (e.g. example.com) or a specific URL, and
- A_{q,e} be the answer returned by engine e for query q.
We define the following operational metrics.
| Metric name | Symbol | Informal definition | Typical unit / range |
|---|---|---|---|
| AI Answer Presence | AAP(D) | Share of (query, engine) pairs where any answer cites or links to domain D. | 0-1 (0-100%) |
| AI Citation Count | ACC(D) | Total count of distinct answers that cite domain D at least once. | Non-negative integer |
| AI Share of Voice | SOV(D) | Share of all citations in a topic or query set that belong to domain D, vs. competitors. | 0-1 (0-100%) |
| AI Attribution Rate | AR(D) | Among answers whose content is closely aligned with D, fraction that explicitly cite D. | 0-1 (0-100%) |
| Engine Coverage | EC(D) | Proportion of engines in E where D appears in at least one answer for the query set. | 0-1 (0-100%) |
More formally, let C_{q,e}(D) = 1 if answer A_{q,e} explicitly cites or links to domain D, and 0 otherwise.
- AI Answer Presence:
  AAP(D) = (1 / (|Q| · |E|)) · Σ_{q ∈ Q} Σ_{e ∈ E} C_{q,e}(D)
- AI Citation Count:
  ACC(D) = Σ_{q ∈ Q} Σ_{e ∈ E} C_{q,e}(D)
- AI Share of Voice (for a topic or query cluster Q_T):
  SOV(D) = ACC_T(D) / Σ_{D'} ACC_T(D'),
  where ACC_T restricts the citation count to queries in Q_T and the denominator sums over all tracked domains D' in the same topic.
- AI Attribution Rate (content usage vs. citation):
  Let U_{q,e}(D) = 1 if answer A_{q,e} is judged to substantially reflect content from domain D (via lexical overlap, embeddings, or manual labeling), regardless of citation. Then
  AR(D) = Σ_{q,e} C_{q,e}(D) · U_{q,e}(D) / Σ_{q,e} U_{q,e}(D).
Attribution Rate captures a key GEO concern: how often a domain is credited when its content is actually used.
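To make these definitions concrete, the following is a minimal Python sketch of the four metrics. It assumes one stored answer per (query, engine) pair, with citation and content-usage sets already extracted; the class and function names are illustrative, not Relens' internal API.

```python
from dataclasses import dataclass, field

@dataclass
class AnswerRecord:
    """One generated answer A_{q,e}; assumes one record per (query, engine) pair."""
    query: str
    engine: str
    cited_domains: set[str]                                # domains explicitly cited or linked (C_{q,e})
    used_domains: set[str] = field(default_factory=set)    # domains whose content is judged to be used (U_{q,e})

def aap(records: list[AnswerRecord], domain: str) -> float:
    """AI Answer Presence: share of (query, engine) pairs whose answer cites the domain."""
    return sum(domain in r.cited_domains for r in records) / len(records) if records else 0.0

def acc(records: list[AnswerRecord], domain: str) -> int:
    """AI Citation Count: number of answers citing the domain at least once."""
    return sum(domain in r.cited_domains for r in records)

def sov(records: list[AnswerRecord], domain: str, tracked: list[str]) -> float:
    """AI Share of Voice: this domain's citations over citations of all tracked domains."""
    total = sum(acc(records, d) for d in tracked)
    return acc(records, domain) / total if total else 0.0

def attribution_rate(records: list[AnswerRecord], domain: str) -> float:
    """AI Attribution Rate: among answers judged to use the domain's content, the fraction that cite it."""
    used = [r for r in records if domain in r.used_domains]
    return sum(domain in r.cited_domains for r in used) / len(used) if used else 0.0
```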
3.2 Relation to academic GEO visibility metrics
Aggarwal et al. define visibility using metrics such as:
- Position-Adjusted Word Count (PAWC): weights the word count attributed to a source by where that content appears in the generated answer.
- Subjective Impression metrics: model or human ratings of which sources “seem most visible” to a user.
Our operational metrics are designed to:
- be computable at scale on live, proprietary engines,
- align with the same intuition of “visibility” (being seen and credited), and
- support longitudinal tracking for brands and websites.
While PAWC and subjective impression metrics are used on GEO-bench to evaluate optimization methods, AAP, ACC, SOV, and AR are suitable for ongoing monitoring in production environments.
4. Measurement Protocol
To compute these metrics consistently, Relens uses a standardized measurement setup.
4.1 Query set construction
For each customer or study, we define a query set Q that may include:
- Brand and product queries (e.g. “relens ai seo agent”),
- Category and comparison queries (e.g. “best ai seo tools”, “geo vs seo”),
- Informational queries (e.g. “what is generative engine optimization”).
Queries can be clustered into topics T (e.g. “GEO”, “technical SEO”, “AI visibility”) to compute topic-level share-of-voice.
4.2 Engine sampling
For each query q ∈ Q and engine e ∈ E:
- Send the query using standard, user-visible interfaces and default settings (where possible).
- Record the full answer text, any source list / citations, and answer metadata (timestamps, answer type, etc.).
- Parse links and citations to extract domains and URLs.
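For illustration, a single sampled (query, engine) run could be stored as a record like the one below; the field names are hypothetical choices for this sketch, not a fixed schema.

```python
# Illustrative capture record for one (query, engine) run; field names are hypothetical.
answer_record = {
    "query": "what is generative engine optimization",
    "engine": "perplexity",
    "timestamp": "2025-01-15T09:30:00Z",
    "answer_text": "...",                        # full natural-language answer as shown to the user
    "citations": [                               # parsed source list / inline citations
        {"url": "https://www.example.com/blog/geo-guide", "position": 1},
        {"url": "https://relens.ai/", "position": 2},
    ],
    "metadata": {"answer_type": "ai_overview"},  # engine-specific metadata
}
```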
4.3 Domain and content matching
We then:
- Normalize URLs to domains (e.g. https://www.example.com/blog/post → example.com); a minimal normalization sketch follows this list.
- Optionally track visibility at URL-level granularity for specific pages.
- Use text similarity and embeddings to detect whether an answer’s wording strongly matches content from a domain, even if no explicit link is shown (for Attribution Rate).
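As referenced above, a naive normalization step might look like the following. It only strips a leading "www."; a production system would instead use a public-suffix list (e.g. the tldextract library) to handle subdomains and country-code TLDs correctly.

```python
from urllib.parse import urlparse

def normalize_domain(url: str) -> str:
    """Reduce a cited URL to a domain string (naive sketch: lowercases and strips a leading 'www.')."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host

# Example: normalize_domain("https://www.example.com/blog/post") -> "example.com"
```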
4.4 Aggregation and reporting
Metrics are aggregated:
- Per domain (e.g. example.com, relens.ai),
- Per engine (e.g. ChatGPT vs. Perplexity vs. AI Overviews),
- Per topic or query cluster, and
- Over time (daily, weekly, monthly).
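As a minimal sketch of this aggregation, assuming citation events have already been flattened into a long-format table (the column names and pandas usage here are illustrative, not a prescribed pipeline):

```python
import pandas as pd

# Assumed long format: one row per (query, engine, domain) citation event, with a timestamp.
citations = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-09"]),
        "engine": ["perplexity", "chatgpt", "perplexity"],
        "domain": ["example.com", "relens.ai", "example.com"],
        "topic": ["GEO", "GEO", "technical SEO"],
    }
)

# Weekly AI Citation Count per domain and engine.
weekly_acc = (
    citations
    .groupby(["domain", "engine", pd.Grouper(key="date", freq="W")])
    .size()
    .rename("ai_citation_count")
    .reset_index()
)
print(weekly_acc)
```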
This aggregation allows us to answer questions such as:
- “On how many of our priority queries do we appear in AI answers at all?”
- “Which competitor has the highest AI share of voice in ‘GEO’ topics?”
- “Are generative engines using our content without attribution?”
5. Example GEO Experiment (Illustrative)
The following table illustrates what a simple before/after GEO experiment might look like for a single topic cluster (“generative engine optimization tools”). Numbers are synthetic examples to show how the metrics are interpreted.
| Domain | Phase | AI Answer Presence | AI Citation Count | AI Share of Voice |
|---|---|---|---|---|
| site-a.com | Before | 0.22 | 44 | 18% |
| site-a.com | After | 0.36 | 72 | 27% |
| site-b.com | Before | 0.30 | 60 | 25% |
| site-b.com | After | 0.32 | 64 | 24% |
| relens.ai | Before | 0.10 | 20 | 8% |
| relens.ai | After | 0.20 | 40 | 15% |
In a real deployment:
- “Before” and “After” would correspond to specific content changes (e.g. restructuring pages, clarifying definitions, adding comparison tables, improving technical SEO for AI crawlers).
- We would compute relative improvements (e.g. +14 percentage points in AAP, +9 percentage points in SOV for site-a.com) and compare them across domains and engines.
This kind of experiment directly instantiates the GEO idea of optimizing content for increased visibility in generative engine responses, using metrics that can be monitored continuously.
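For concreteness, the percentage-point deltas quoted above can be computed directly from the synthetic table (a toy calculation over the example values, not production code):

```python
# Before/after values copied from the synthetic table above (AAP and SOV as fractions).
experiment = {
    "site-a.com": {"before": {"aap": 0.22, "sov": 0.18}, "after": {"aap": 0.36, "sov": 0.27}},
    "site-b.com": {"before": {"aap": 0.30, "sov": 0.25}, "after": {"aap": 0.32, "sov": 0.24}},
    "relens.ai":  {"before": {"aap": 0.10, "sov": 0.08}, "after": {"aap": 0.20, "sov": 0.15}},
}

for domain, phases in experiment.items():
    delta_aap = (phases["after"]["aap"] - phases["before"]["aap"]) * 100  # percentage points
    delta_sov = (phases["after"]["sov"] - phases["before"]["sov"]) * 100  # percentage points
    print(f"{domain}: AAP {delta_aap:+.0f} pp, SOV {delta_sov:+.0f} pp")
# e.g. site-a.com: AAP +14 pp, SOV +9 pp
```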
6. Practical GEO Metrics vs Traditional SEO
Traditional SEO focuses on:
- rankings on SERPs,
- organic click-through rate,
- backlinks, and
- traffic volume.
GEO, in contrast, asks:
- “Do AI systems show my content when users ask questions in natural language?”
- “Do they credit my site as a source?”
- “How does my visibility compare to competitors inside AI-generated answers?”
Relens’ GEO measurement framework is therefore complementary to traditional SEO: the same pages can be optimized for both human readers and AI answer boxes, but success is measured differently.
7. How to Cite This Page
If you reference this measurement framework, please cite it as:
Relens (2025). Measuring Generative Engine Optimization (GEO) in Practice. Technical note. Available at: https://relens.ai/.
And, for the foundational academic definition of GEO and GEO-bench, please cite:
Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. arXiv:2311.09735.