Research note 2.5

The Science — Papers & Research

The foundational GEO paper (KDD 2024), large-scale citation studies, LLM bias research, search volume decline data, and myth-busting with empirical evidence.

Key figure	Description
+41%	Quotation addition (GEO paper)
-9%	Keyword stuffing (harmful)
680M	Citations analyzed (Profound)
25%	Search decline by 2026 (Gartner)

1. The Foundational Paper: GEO — Generative Engine Optimization

Authors: Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande

Institutions: Princeton, Georgia Tech, Allen Institute for AI, IIT Delhi

Venue: KDD 2024 (30th ACM SIGKDD Conference)

arXiv: 2311.09735

GEO-Bench: The Benchmark

Parameter	Detail
Total queries	10,000 (8K train / 1K validation / 1K test)
Query distribution	80% informational, 10% transactional, 10% navigational
Domains	25 categories
Sources per query	Top 5 Google search results
Data sources	9 datasets (MS MARCO, ORCAS-1, Natural Questions, AllSouls, LIMA, etc.)

Nine Optimization Strategies Tested

Strategy	Visibility Change	Notes
Quotation Addition	+41%	Most effective overall
Statistics Addition	+32%	Quantitative > qualitative
Fluency Optimization	+28%	Readability improvements
Cite Sources	+27%	+115% for rank-5 sites
Technical Terms	+18%	Domain-specific terminology
Easy-to-Understand	+14%	Simplified language
Authoritative	+12%	Persuasive tone
Unique Words	+6%	Minimal impact
Keyword Stuffing	-9%	Harmful — traditional SEO tactic backfires

The single most important result: Traditional SEO keyword stuffing decreases visibility by 9% in generative engines. GEO and SEO are not the same discipline.

The Democratization Effect

Lower-ranked websites benefit far more from GEO than top-ranked ones; in the figures below, rank-1 pages actually lose visibility:

Google Rank	Strategy	Visibility Change
Rank 5	Cite Sources	+115.1%
Rank 5	Statistics Addition	+97.9%
Rank 1	Cite Sources	-30.3%
Rank 1	Statistics Addition	-20.6%

GEO “levels the playing field” for smaller content creators. A page ranked #5 in Google can more than double its generative engine visibility by adding citations.
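
To make the percentages above concrete, the following minimal sketch converts a relative visibility change into a multiplier. The function name and the baseline score of 10.0 are illustrative assumptions, not values from the paper; only the +115.1% and -30.3% figures come from the table.

```python
def apply_relative_change(baseline: float, change_pct: float) -> float:
    """Return the value after applying a relative change given in percent."""
    return baseline * (1 + change_pct / 100)


# Hypothetical baseline visibility score (arbitrary units, assumed for illustration).
baseline = 10.0

rank5 = apply_relative_change(baseline, 115.1)   # Rank 5, Cite Sources
rank1 = apply_relative_change(baseline, -30.3)   # Rank 1, Cite Sources

print(f"rank 5: {rank5:.2f} ({rank5 / baseline:.2f}x baseline)")
print(f"rank 1: {rank1:.2f} ({rank1 / baseline:.2f}x baseline)")
```

A change of +115.1% corresponds to a multiplier above 2, which is why the text describes the rank-5 page as more than doubling its visibility.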

Domain-Specific Effectiveness

Domain	Most Effective Strategy
Debate / History	Authoritative, Quotation Addition
Science	Authoritative, Fluency Optimization
Business / Health	Fluency Optimization
Law & Government	Cite Sources, Statistics Addition
Facts / Statements	Cite Sources
People & Society	Quotation Addition
Opinion	Statistics Addition

Best Combination

Fluency Optimization combined with Statistics Addition performed best, improving on the best single method by 5.5%. Cite Sources averaged a 31.4% improvement when combined with other methods. Combination gains are not always additive.

2. Large-Scale Citation Studies

Profound: 680 Million Citations

Platform	#1 Source	Share	#2 Source
ChatGPT	Wikipedia	7.8%	Reddit (1.8%)
Perplexity	Reddit	6.6%	YouTube (2.0%)
Google AI Overviews	Reddit	2.2%	YouTube (1.9%)

Cross-platform overlap is remarkably low: only 11% of domains are cited by both ChatGPT and Perplexity. Commercial domains (.com) account for 80.41% of all citations; nonprofit (.org) accounts for 11.29%.
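
One way such a cross-platform overlap could be measured is sketched below. The source does not state how Profound defines the denominator, so this sketch assumes the share of all cited domains that appear on both platforms (intersection over union). The function names and sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def cited_domains(urls):
    """Collect the set of hostnames from a list of cited URLs."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname  # lowercased by urlparse; None if missing
        if host:
            domains.add(host.removeprefix("www."))
    return domains


def domain_overlap(urls_a, urls_b):
    """Share of all cited domains that both platforms cite (assumed definition)."""
    a, b = cited_domains(urls_a), cited_domains(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical citation logs for two platforms.
chatgpt_urls = [
    "https://en.wikipedia.org/wiki/Search_engine_optimization",
    "https://www.reddit.com/r/seo",
    "https://example.com/guide",
]
perplexity_urls = [
    "https://www.reddit.com/r/marketing",
    "https://www.youtube.com/watch?v=abc",
    "https://example.org/report",
]

print(f"domain overlap: {domain_overlap(chatgpt_urls, perplexity_urls):.0%}")
```

Other definitions (for example, overlap relative to one platform's domains only) would give different numbers, so the definition should be fixed before comparing against the 11% figure.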

Yext: 17.2 Million Citations

Yext analyzed citations across ChatGPT, Perplexity, Google AI Overviews, and Claude. Its key finding: “There is no single AI optimization strategy” that works across all models. Each platform has fundamentally different citation patterns, so Yext recommends optimizing per platform rather than using a universal approach.

Ahrefs: 17 Million Citations

Ahrefs found that AI-cited content is 25.7% fresher than traditional search results: it tends to be more recently published and more frequently updated. The top 30 domains capture 67% of all citations.

Brandlight: Traditional vs. AI Search

90% of ChatGPT citations come from pages outside Google’s top 20. The Google SERP ↔ AI citation overlap dropped from 70% to below 20%. These figures suggest that optimizing for Google alone is no longer sufficient for AI visibility.

3. Search Volume Decline Data

Source	Prediction / Finding
Gartner (Feb 2024)	Traditional search volume drops 25% by 2026
Gartner (extended)	Organic traffic drops 50% by 2028
Bain & Company	60% of searches now end without a click (zero-click)
Ahrefs	AI Overviews reduce CTR for top pages by 58%
Apple	Safari had first-ever search decline (May 2025)
Google searches/user	Dropped 20% YoY in U.S. (2025)
AI chatbot traffic	80.92% YoY growth
Consumer AI usage	58% use AI for product recommendations (up from 25% in 2023)

Counterpoint: Some analysts contest Gartner’s projection. A Datos analysis of traffic data through mid-2025 found “almost no indication that traditional search is on a path to a 25% decline.” LLM usage remains under 5% of global search queries.

4. The a16z Perspective

“Visibility means showing up directly in the answer itself, rather than ranking high.”
Zach Cohen & Seema Amble, a16z (May 2025)

Key data points from a16z’s “How Generative Engine Optimization Rewrites the Rules of Search”:

Metric	Value
SEO market foundation	$80 billion+
AI search query length	23 words avg (vs. 4 in traditional)
AI session depth	6 minutes average
ChatGPT → Vercel signups	10% of new signups
New success metric	"Reference rates" — how often cited in AI answers

“How you're encoded into the AI layer is the new competitive advantage.”
a16z

5. LLM Bias and Stochasticity Research

LLM Whisperer (Carnegie Mellon, 2024)

The study used 449 prompts across 77 product categories, with 1,000 responses per prompt. It found that subtle synonym replacements can increase brand mention likelihood by up to 78%, and that semantically equivalent prompts produced absolute mention differences of 7.4 to 18.6 percentage points. In the most extreme case, mentions of InstantPot swung from 0% to 100% between equivalent prompts.
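
Because responses vary this much between runs, a brand mention rate is better reported with an uncertainty range than as a single number. The sketch below is one way to do that, assuming responses have already been collected as plain strings; it uses a case-insensitive substring match and a Wilson score interval, neither of which is claimed to be the method used in the study. The function names and sample responses are hypothetical.

```python
import math


def mention_rate(responses, brand):
    """Fraction of responses that mention the brand (case-insensitive substring)."""
    if not responses:
        raise ValueError("need at least one response")
    hits = sum(brand.lower() in r.lower() for r in responses)
    return hits / len(responses)


def wilson_interval(rate, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1 + z**2 / n
    centre = (rate + z**2 / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sample: 10 responses to one prompt.
responses = ["I'd go with the InstantPot."] * 7 + ["Consider a slow cooker."] * 3

rate = mention_rate(responses, "instantpot")
low, high = wilson_interval(rate, len(responses))
print(f"mention rate {rate:.0%}, interval {low:.0%} to {high:.0%}")
```

With only a handful of samples the interval is wide, which is one reason to sample many responses per prompt before comparing two phrasings.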

Position Bias in LLM Recommendations

First-mentioned brands receive “direct-answer language” while later positions get “other options include” framing. Only 3–4 brands are cited per ChatGPT response vs. 13 for Perplexity, creating winner-take-all dynamics. There is a less than 1-in-100 chance that any platform produces the same recommendation list twice.
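
The repeatability of recommendation lists can be estimated from repeated runs of the same prompt. The following is a minimal sketch, assuming each run has already been reduced to an ordered list of brand names; the function name and sample data are hypothetical, and "same list" is taken here to mean the same brands in the same order.

```python
from itertools import combinations


def identical_pair_rate(runs):
    """Share of run pairs whose recommendation lists match exactly."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    same = sum(tuple(a) == tuple(b) for a, b in pairs)
    return same / len(pairs)


# Hypothetical brand lists from four runs of one prompt.
runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandA", "BrandC"],  # same brands, different order
    ["BrandA", "BrandC", "BrandD"],
]

print(f"identical list rate: {identical_pair_rate(runs):.1%}")
```

Treating order as significant matters here because of the position bias described above: the same brands in a different order are not an equivalent outcome.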

6. Myth-Busting

Myth	Reality	Evidence
llms.txt files help AI find you	Zero evidence of any effect	No peer-reviewed study; no LLM provider has confirmed they use it
Schema markup increases citations	Improves accuracy, NOT frequency	Search Atlas: 748K queries show no frequency effect
Backlinks drive AI visibility	Weak/neutral correlation	Digital Bloom, Seer Interactive
Keyword optimization works for AI	Keyword stuffing is HARMFUL (-9%)	GEO paper (KDD 2024)
Fresh content helps	TRUE: the one claim here backed by strong evidence	65% of AI hits go to content under 1 year old; 3x boost for content under 14 days old

7. Follow-Up Papers

Paper	Key Finding
CORE (Jin et al., 2026)	91.4% promotion success rate @Top-5 by targeting synthesis stage (not retrieval)
E-GEO (2025)	GEO signals diverge substantially from SEO signals in e-commerce contexts
Diagnosing Citation Failures	Asks WHY a document fails to be cited, rather than generic rewriting
Beyond Keywords (Content-Centric Agents)	End-to-end GSEO framework for post-keyword era
Kumar & Lakkaraju (2024)	Strategic text sequences can manipulate LLM recommendations (adversarial angle)

8. Implications for Bitsy

Build on proven strategies: Quotation addition (+41%), statistics addition (+32%), fluency optimization (+28%), and source citations (+27%) are the evidence-based GEO strategies. Content should be chunked into self-contained blocks of 40–60 words. Freshness is the strongest signal: content less than 14 days old sees a 3x boost.
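
The 40–60 word chunking guideline could be approximated as follows. This is a rough sketch, not a Bitsy implementation: it packs whole sentences greedily, splits sentences with a simple regular expression, and cannot check whether a block is actually self-contained in meaning. The function name and parameters are hypothetical.

```python
import re


def chunk_by_words(text, min_words=40, max_words=60):
    """Greedily pack whole sentences into chunks of roughly min_words..max_words.

    Best effort only: a chunk can exceed max_words if a sentence is very long,
    and the final chunk can be shorter than min_words.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the chunk when adding this sentence would overshoot the maximum
        # and the chunk already meets the minimum.
        if current and count + n > max_words and count >= min_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical input: twelve ten-word sentences.
sample = " ".join(["one two three four five six seven eight nine ten."] * 12)
for chunk in chunk_by_words(sample):
    print(len(chunk.split()), "words")
```

Splitting on sentence boundaries rather than at a fixed word count keeps each block readable on its own, at the cost of some variation in block length.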

Avoid snake oil: Do not build features around keyword stuffing (proven harmful), llms.txt (zero evidence), or schema-for-frequency (disproven at scale). Do not promise universal strategies — effectiveness varies by domain and model. Include honest uncertainty ranges.

Key architectural insight: The 11% cross-platform domain overlap means per-model monitoring and optimization is essential. A single “GEO score” would be misleading. Track parametric visibility (what the model learned in training) separately from RAG (retrieval-augmented) visibility: parametric is durable (18–36 months), while RAG is volatile.
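
A data model that follows this insight might look like the sketch below: citation rates are keyed by platform and by mode, and no cross-platform aggregate is computed. All names are hypothetical, and the sketch assumes each observation can be labeled as parametric or RAG, for example by whether retrieval was enabled when the answer was generated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    platform: str  # e.g. "chatgpt", "perplexity"
    mode: str      # "parametric" (no retrieval) or "rag" (retrieval enabled)
    cited: bool    # whether the tracked brand or domain appeared in the answer


def reference_rates(observations):
    """Return {(platform, mode): citation rate} with no combined score."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for obs in observations:
        key = (obs.platform, obs.mode)
        totals[key] += 1
        hits[key] += obs.cited  # True counts as 1, False as 0
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical observations.
observations = [
    Observation("chatgpt", "parametric", True),
    Observation("chatgpt", "parametric", False),
    Observation("chatgpt", "rag", True),
    Observation("perplexity", "rag", False),
]

rates = reference_rates(observations)
for (platform, mode), rate in sorted(rates.items()):
    print(f"{platform:<12}{mode:<12}{rate:.0%}")
```

Keeping the key as a (platform, mode) pair means a dashboard built on this structure cannot accidentally collapse the durable and volatile signals into one number.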