Research note 2.5

The Science — Papers & Research

The foundational GEO paper (KDD 2024), large-scale citation studies, LLM bias research, search volume decline data, and myth-busting with empirical evidence.

+41%: Quotation addition (GEO paper)
-9%: Keyword stuffing (harmful)
680M: Citations analyzed (Profound)
25%: Search decline by 2026 (Gartner)

1. The Foundational Paper: GEO — Generative Engine Optimization

Authors: Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande

Institutions: Princeton, Georgia Tech, Allen Institute for AI, IIT Delhi

Venue: KDD 2024 (30th ACM SIGKDD Conference)

arXiv: 2311.09735

GEO-Bench: The Benchmark

| Parameter | Detail |
| --- | --- |
| Total queries | 10,000 (8K train / 1K validation / 1K test) |
| Query distribution | 80% informational, 10% transactional, 10% navigational |
| Domains | 25 categories |
| Sources per query | Top 5 Google search results |
| Data sources | 9 datasets (MS MARCO, ORCAS-1, Natural Questions, AllSouls, LIMA, etc.) |

Nine Optimization Strategies Tested

| Strategy | Visibility Change | Notes |
| --- | --- | --- |
| Quotation Addition | +41% | Most effective overall |
| Statistics Addition | +32% | Quantitative > qualitative |
| Fluency Optimization | +28% | Readability improvements |
| Cite Sources | +27% | +115% for rank-5 sites |
| Technical Terms | +18% | Domain-specific terminology |
| Easy-to-Understand | +14% | Simplified language |
| Authoritative | +12% | Persuasive tone |
| Unique Words | +6% | Minimal impact |
| Keyword Stuffing | -9% | Harmful: traditional SEO tactic backfires |

The single most important result: Traditional SEO keyword stuffing decreases visibility by 9% in generative engines. GEO and SEO are not the same discipline.

The Democratization Effect

Lower-ranked websites benefit dramatically more from GEO than top-ranked ones:

| Google Rank | Strategy | Visibility Change |
| --- | --- | --- |
| Rank 5 | Cite Sources | +115.1% |
| Rank 5 | Statistics Addition | +97.9% |
| Rank 1 | Cite Sources | -30.3% |
| Rank 1 | Statistics Addition | -20.6% |

GEO “levels the playing field” for smaller content creators. A page ranked #5 in Google can more than double its generative engine visibility by adding citations.
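
The "more than double" claim follows directly from the percentages. Here is a minimal sketch of the arithmetic, assuming the reported figures are relative changes against an unoptimized baseline. The `multiplier` helper and the dictionary layout are illustrative and do not come from the paper.

```python
# Relative visibility changes by Google rank and strategy, in percent
# (values copied from the table in this section).
changes = {
    ("Rank 5", "Cite Sources"): 115.1,
    ("Rank 5", "Statistics Addition"): 97.9,
    ("Rank 1", "Cite Sources"): -30.3,
    ("Rank 1", "Statistics Addition"): -20.6,
}


def multiplier(percent_change: float) -> float:
    """Convert a relative change in percent into a multiplicative factor.

    Assumption: the percentage is relative to the unoptimized baseline.
    """
    return 1.0 + percent_change / 100.0


multipliers = {key: round(multiplier(pct), 3) for key, pct in changes.items()}

for (rank, strategy), factor in multipliers.items():
    print(f"{rank}, {strategy}: visibility x{factor}")
```

Under that assumption, +115.1% corresponds to roughly 2.15 times the baseline visibility, while -30.3% leaves a rank-1 page with about 0.70 times its baseline.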

Domain-Specific Effectiveness

| Domain | Most Effective Strategy |
| --- | --- |
| Debate / History | Authoritative, Quotation Addition |
| Science | Authoritative, Fluency Optimization |
| Business / Health | Fluency Optimization |
| Law & Government | Cite Sources, Statistics Addition |
| Facts / Statements | Cite Sources |
| People & Society | Quotation Addition |
| Opinion | Statistics Addition |

Best Combination

Fluency Optimization + Statistics Addition showed maximum combined performance: 5.5% improvement over best single-method. Cite Sources averaged 31.4% improvement when combined with other methods. Not all combinations are additive.

2. Large-Scale Citation Studies

Profound: 680 Million Citations

| Platform | #1 Source | Share | #2 Source |
| --- | --- | --- | --- |
| ChatGPT | Wikipedia | 7.8% | Reddit (1.8%) |
| Perplexity | Reddit | 6.6% | YouTube (2.0%) |
| Google AI Overviews | Reddit | 2.2% | YouTube (1.9%) |

Cross-platform overlap is remarkably low: only 11% of domains are cited by both ChatGPT and Perplexity. Commercial domains (.com) account for 80.41% of all citations; nonprofit (.org) accounts for 11.29%.
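
A domain-overlap figure like this can be measured on your own citation logs. The sketch below is one way to do it. It assumes overlap means the share of all observed domains that both platforms cite (intersection over union). The source does not state how Profound computed its 11%, so treat this definition, the helper names, and the sample URLs as hypothetical.

```python
from urllib.parse import urlparse


def cited_domains(urls):
    """Reduce a list of cited URLs to a set of lowercase hostnames."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname
        if host:
            # Treat www.example.com and example.com as the same domain.
            domains.add(host.lower().removeprefix("www."))
    return domains


def overlap_share(domains_a, domains_b):
    """Share of all observed domains cited by both platforms.

    Assumption: overlap is intersection over union; other definitions exist.
    """
    union = domains_a | domains_b
    if not union:
        return 0.0
    return len(domains_a & domains_b) / len(union)


# Hypothetical citation logs for two platforms.
chatgpt_urls = [
    "https://en.wikipedia.org/wiki/Search_engine",
    "https://www.reddit.com/r/seo/",
    "https://example.com/guide",
]
perplexity_urls = [
    "https://www.reddit.com/r/marketing/",
    "https://www.youtube.com/watch?v=abc",
    "https://example.org/report",
]

share = overlap_share(cited_domains(chatgpt_urls), cited_domains(perplexity_urls))
print(f"Domains cited by both platforms: {share:.0%}")
```

Normalizing to hostnames before comparing matters: comparing raw URLs would understate overlap, because two platforms rarely cite the exact same page.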

Yext: 17.2 Million Citations

Yext analyzed citations across ChatGPT, Perplexity, Google AI Overviews, and Claude. Key finding: “There is no single AI optimization strategy” that works across all models. Each platform has fundamentally different citation patterns, so Yext recommends optimizing per platform rather than using a universal approach.

Ahrefs: 17 Million Citations

AI-cited content is 25.7% fresher than traditional search results: Ahrefs found that it tends to be more recently published and more frequently updated. The top 30 domains capture 67% of all citations.

Brandlight: Traditional vs. AI Search

90% of ChatGPT citations come from pages outside Google’s top 20. The Google SERP ↔ AI citation overlap dropped from 70% to below 20%. This confirms that optimizing for Google is no longer sufficient for AI visibility.

3. Search Volume Decline Data

| Source | Prediction / Finding |
| --- | --- |
| Gartner (Feb 2024) | Traditional search volume drops 25% by 2026 |
| Gartner (extended) | Organic traffic drops 50% by 2028 |
| Bain & Company | 60% of searches now end without a click (zero-click) |
| Ahrefs | AI Overviews reduce CTR for top pages by 58% |
| Apple | Safari had first-ever search decline (May 2025) |
| Google searches/user | Dropped 20% YoY in U.S. (2025) |
| AI chatbot traffic | 80.92% YoY growth |
| Consumer AI usage | 58% use AI for product recommendations (up from 25% in 2023) |

Counterpoint: Some analysts contest Gartner’s projection. Datos analysis found “almost no indication that traditional search is on a path to a 25% decline” through mid-2025 traffic data. LLM usage remains under 5% of global search queries.

4. The a16z Perspective

Visibility means showing up directly in the answer itself, rather than ranking high.

Zach Cohen & Seema Amble, a16z (May 2025)

Key data points from a16z’s “How Generative Engine Optimization Rewrites the Rules of Search”:

| Metric | Value |
| --- | --- |
| SEO market foundation | $80 billion+ |
| AI search query length | 23 words avg (vs. 4 in traditional) |
| AI session depth | 6 minutes average |
| ChatGPT → Vercel signups | 10% of new signups |
| New success metric | "Reference rates": how often cited in AI answers |
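
A reference rate can be computed from a log of sampled AI answers. The sketch below assumes one plausible definition: the fraction of answers on a platform that cite a given domain at least once. a16z does not specify a formula, so the definition, the `reference_rates` helper, and the sample log are all hypothetical.

```python
from collections import defaultdict


def reference_rates(answers, brand_domain):
    """Per platform: fraction of answers citing brand_domain at least once.

    Assumption: each answer is a dict with a "platform" name and a set of
    "cited_domains". This log format is illustrative, not a real API.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for answer in answers:
        platform = answer["platform"]
        totals[platform] += 1
        if brand_domain in answer["cited_domains"]:
            hits[platform] += 1
    return {platform: hits[platform] / totals[platform] for platform in totals}


# Hypothetical sample of logged answers.
answers = [
    {"platform": "chatgpt", "cited_domains": {"example.com", "wikipedia.org"}},
    {"platform": "chatgpt", "cited_domains": {"reddit.com"}},
    {"platform": "perplexity", "cited_domains": {"example.com"}},
    {"platform": "perplexity", "cited_domains": {"example.com", "youtube.com"}},
]

rates = reference_rates(answers, "example.com")
for platform, rate in rates.items():
    print(f"{platform}: cited in {rate:.0%} of sampled answers")
```

Keeping the rate per platform, rather than pooling all answers, preserves differences between models that a single combined number would hide.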

How you're encoded into the AI layer is the new competitive advantage.

a16z

5. LLM Bias and Stochasticity Research

LLM Whisperer (Carnegie Mellon, 2024)

The study ran 449 prompts across 77 product categories, with 1,000 responses per prompt. It found that subtle synonym replacements can increase brand mention likelihood by up to 78%. Semantically equivalent prompts produced absolute mention differences of 7.4% to 18.6%. Maximum variance: 100% (InstantPot went from 0% to 100% between equivalent prompts).
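
The absolute mention difference between two equivalent prompts can be estimated from sampled responses. This is a minimal sketch, not the paper's method: it assumes a mention is a case-insensitive substring match, and the helper names and sample responses are hypothetical.

```python
def mention_rate(responses, brand):
    """Fraction of responses that mention the brand.

    Assumption: a case-insensitive substring match counts as a mention.
    """
    if not responses:
        return 0.0
    needle = brand.lower()
    return sum(needle in response.lower() for response in responses) / len(responses)


def absolute_difference(responses_a, responses_b, brand):
    """Absolute difference in mention rate, in percentage points."""
    return abs(mention_rate(responses_a, brand) - mention_rate(responses_b, brand)) * 100


# Hypothetical responses sampled for two semantically equivalent prompts.
responses_a = [
    "I recommend the InstantPot.",
    "Try a slow cooker.",
    "InstantPot is popular.",
    "Consider a rice cooker.",
]
responses_b = [
    "A slow cooker works well.",
    "Try a rice cooker.",
    "InstantPot is an option.",
    "Consider a dutch oven.",
]

diff = absolute_difference(responses_a, responses_b, "InstantPot")
print(f"Absolute mention difference: {diff:.1f} percentage points")
```

With only four responses per prompt the estimate is very noisy. The study's 1,000 responses per prompt is what makes differences of a few points measurable.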

Position Bias in LLM Recommendations

First-mentioned brands receive “direct-answer language” while later positions get “other options include” framing. Only 3–4 brands are cited per ChatGPT response vs. 13 for Perplexity, creating winner-take-all dynamics. There is a less than 1-in-100 chance of any platform producing the same recommendation list twice.

6. Myth-Busting

| Myth | Reality | Evidence |
| --- | --- | --- |
| llms.txt files help AI find you | Zero evidence of any effect | No peer-reviewed study; no LLM provider has confirmed they use it |
| Schema markup increases citations | Improves accuracy, NOT frequency | Search Atlas: 748K queries show no frequency effect |
| Backlinks drive AI visibility | Weak/neutral correlation | Digital Bloom, Seer Interactive |
| Keyword optimization works for AI | Keyword stuffing is HARMFUL (-9%) | GEO paper (KDD 2024) |
| Fresh content helps | TRUE: the claim with the strongest evidence | 65% of AI hits on <1yr content; 3x boost for 14-day freshness |

7. Follow-Up Papers

| Paper | Key Finding |
| --- | --- |
| CORE (Jin et al., 2026) | 91.4% promotion success rate @Top-5 by targeting synthesis stage (not retrieval) |
| E-GEO (2025) | GEO signals diverge substantially from SEO signals in e-commerce contexts |
| Diagnosing Citation Failures | Asks WHY a document fails to be cited, rather than generic rewriting |
| Beyond Keywords (Content-Centric Agents) | End-to-end GSEO framework for post-keyword era |
| Kumar & Lakkaraju (2024) | Strategic text sequences can manipulate LLM recommendations (adversarial angle) |

8. Implications for Bitsy

Build on proven strategies: Quotation addition (+41%), statistics (+32%), source citations (+27%), and fluency optimization (+28%) are the evidence-based GEO strategies. Content should be chunked into 40–60 word self-contained blocks. Freshness is the strongest signal (<14 days = 3x boost).
Avoid snake oil: Do not build features around keyword stuffing (proven harmful), llms.txt (zero evidence), or schema-for-frequency (disproven at scale). Do not promise universal strategies — effectiveness varies by domain and model. Include honest uncertainty ranges.
Key architectural insight: The 11% cross-platform domain overlap means per-model monitoring and optimization is essential. A single “GEO score” would be misleading. Track parametric visibility (what a model learned during training) separately from RAG visibility (what it retrieves at query time): parametric is durable (18–36 months), RAG is volatile.
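
The recommendation to chunk content into 40–60 word self-contained blocks can be prototyped with a simple greedy packer. This is a sketch under stated assumptions: sentences are split on terminal punctuation with a regular expression, whole sentences are never broken, and the 40–60 range is treated as a target to flag against rather than a guarantee. The `chunk_text` function is hypothetical and not part of any existing tool.

```python
import re


def chunk_text(text, min_words=40, max_words=60):
    """Greedily pack whole sentences into blocks of at most max_words words.

    Returns (block, in_range) pairs. in_range is False when a block falls
    outside [min_words, max_words], for example a short trailing block or a
    single sentence longer than max_words.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    blocks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current block if this sentence would push it past the cap.
        if current and count + n > max_words:
            blocks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        blocks.append(" ".join(current))
    return [(block, min_words <= len(block.split()) <= max_words) for block in blocks]


# Hypothetical input: 13 sentences of 10 words each.
sample = " ".join(["one two three four five six seven eight nine ten."] * 13)

for block, in_range in chunk_text(sample):
    status = "ok" if in_range else "review"
    print(f"{len(block.split())} words ({status})")
```

Flagging out-of-range blocks instead of forcing them into range leaves the final call to an editor, which fits the note's advice to avoid promising universal strategies.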