Blog 16 — LinkedIn: Building Real-Time Search
Qubits of DPK
March 21, 2026
Core Case Studies
Core Concepts: Inverted index, Ranking algorithms, Typeahead/autocomplete at scale, Elasticsearch internals
Why It's Critical for SDE-2: Nearly every product depends on search (Swiggy restaurant search, Amazon product search, LinkedIn people search)
Status: Draft notes ready
Quick Revision
- Problem: Search must feel instant even with huge corpora and constant updates.
- Core pattern: Inverted index, prefix search, and async indexing pipelines.
- Interview one-liner: Search is two systems: fast retrieval and smart ranking.
Architecture Overview
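A minimal in-memory sketch of the two-system split from the Quick Revision. The write path stores a document and queues it for asynchronous indexing. The read path retrieves candidates from an inverted index and then ranks them. The `SearchService` class, the array standing in for a Kafka topic, and the matched-term-count score are all hypothetical simplifications, not LinkedIn's architecture.

```javascript
// Hypothetical in-memory sketch of the two-path search architecture.
// Write path: store the doc, enqueue an event; an async indexer drains the queue.
// Read path: retrieve candidates from the inverted index, then rank them.

function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

class SearchService {
  constructor() {
    this.docs = new Map();  // docId -> text (stand-in for the primary DB)
    this.index = new Map(); // term -> Set of docIds (inverted index)
    this.queue = [];        // stand-in for a Kafka topic
  }

  // Write path: the DB write happens now; indexing is deferred.
  write(docId, text) {
    this.docs.set(docId, text);
    this.queue.push(docId);
  }

  // Async indexer: would run as a separate consumer process.
  // Only handles new docs; it does not remove stale terms on rewrite.
  drainIndexQueue() {
    while (this.queue.length > 0) {
      const docId = this.queue.shift();
      for (const term of tokenize(this.docs.get(docId))) {
        if (!this.index.has(term)) this.index.set(term, new Set());
        this.index.get(term).add(docId);
      }
    }
  }

  // Stage 1 (retrieval): every doc containing at least one query term.
  retrieve(query) {
    const candidates = new Set();
    for (const term of tokenize(query)) {
      for (const docId of this.index.get(term) ?? []) candidates.add(docId);
    }
    return [...candidates];
  }

  // Stage 2 (ranking): here simply the number of distinct query terms matched.
  search(query, limit = 10) {
    const queryTerms = new Set(tokenize(query));
    return this.retrieve(query)
      .map((docId) => {
        const docTerms = new Set(tokenize(this.docs.get(docId)));
        let score = 0;
        for (const t of queryTerms) if (docTerms.has(t)) score++;
        return { docId, score };
      })
      .sort((a, b) => b.score - a.score || a.docId.localeCompare(b.docId))
      .slice(0, limit);
  }
}

const svc = new SearchService();
svc.write("p1", "Senior Software Engineer, Kubernetes");
svc.write("p2", "Software Engineer, Java");
console.log(svc.search("kubernetes engineer")); // -> [] (not indexed yet)
svc.drainIndexQueue();
console.log(svc.search("kubernetes engineer"));
// -> [ { docId: 'p1', score: 2 }, { docId: 'p2', score: 1 } ]
```

The gap between `write` and `drainIndexQueue` is the point of the demo. Until the indexer runs, a freshly written profile is not searchable. This is the eventual-consistency trade-off the design accepts.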
Core Concepts
Inverted Index — The Heart of Search
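A minimal sketch of building an inverted index. Each term maps to a posting list of `{ docId, tf }` entries sorted by document ID, where `tf` is the term's count in that document. The tokenizer is an assumption for illustration: it lowercases, splits on non-word characters, and drops a tiny hard-coded stop-word list. Lucene's real analyzers and on-disk posting format are far more elaborate.

```javascript
// Minimal inverted index sketch: term -> posting list of { docId, tf }.
// Assumptions (illustrative, not Lucene's format): lowercase + split on
// non-word characters, a tiny hard-coded stop-word list, numeric doc IDs.

const STOP_WORDS = new Set(["a", "an", "and", "at", "the", "of"]);

function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

function buildInvertedIndex(docs) {
  const index = new Map(); // term -> Map(docId -> term frequency)
  for (const [docId, text] of docs) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      postings.set(docId, (postings.get(docId) ?? 0) + 1);
    }
  }
  // Freeze each posting list as an array sorted by docId; sorted lists are
  // what allow queries to intersect them with a linear merge.
  const result = new Map();
  for (const [term, postings] of index) {
    result.set(
      term,
      [...postings].map(([docId, tf]) => ({ docId, tf })).sort((a, b) => a.docId - b.docId)
    );
  }
  return result;
}

const docs = new Map([
  [1, "Software engineer at LinkedIn"],
  [2, "Senior software engineer, search and ranking"],
  [3, "Search relevance engineer: search quality"],
]);

const index = buildInvertedIndex(docs);
console.log(index.get("search"));                       // -> [ { docId: 2, tf: 1 }, { docId: 3, tf: 2 } ]
console.log(index.get("engineer").map((p) => p.docId)); // -> [ 1, 2, 3 ]
```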
Typeahead / Autocomplete at Scale
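A sketch of trie-based typeahead in which every node caches the top-K completions beneath it. Answering a prefix then costs one walk down the prefix, with no subtree traversal. The `Typeahead` class, K = 3, and scoring purely by global query count are illustrative assumptions. The incremental top-K update is only correct because counts never decrease.

```javascript
// Trie-based typeahead sketch: each node caches the top-K completions below it,
// so a lookup costs O(prefix length).
// Assumptions (illustrative): score = global query count, counts only increase.

class TrieNode {
  constructor() {
    this.children = new Map(); // char -> TrieNode
    this.top = [];             // [{ phrase, count }] sorted by count desc
  }
}

class Typeahead {
  constructor(k = 3) {
    this.k = k;
    this.root = new TrieNode();
    this.counts = new Map(); // phrase -> total count
  }

  // Record n more searches for a phrase and refresh every prefix node's top-K.
  record(phrase, n = 1) {
    phrase = phrase.toLowerCase();
    const count = (this.counts.get(phrase) ?? 0) + n;
    this.counts.set(phrase, count);
    let node = this.root;
    for (const ch of phrase) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
      this.updateTop(node, phrase, count);
    }
  }

  // Counts only increase, so merging the updated phrase into the old top-K suffices.
  updateTop(node, phrase, count) {
    const rest = node.top.filter((e) => e.phrase !== phrase);
    rest.push({ phrase, count });
    rest.sort((a, b) => b.count - a.count || a.phrase.localeCompare(b.phrase));
    node.top = rest.slice(0, this.k);
  }

  suggest(prefix) {
    let node = this.root;
    for (const ch of prefix.toLowerCase()) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    return node.top.map((e) => e.phrase);
  }
}

const ta = new Typeahead(3);
ta.record("software engineer", 50);
ta.record("software architect", 20);
ta.record("sales manager", 30);
ta.record("solutions architect", 10);
ta.record("software developer", 25);

console.log(ta.suggest("so")); // -> [ 'software engineer', 'software developer', 'software architect' ]
console.log(ta.suggest("s"));  // -> [ 'software engineer', 'sales manager', 'software developer' ]
```

The design trades memory, since K entries are stored per node, for constant work per keystroke beyond the prefix walk.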
Ranking Algorithm
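Ranking usually runs as a second stage over the retrieved candidates. This sketch blends a normalized text-relevance score with two non-text signals. The feature names (`degree`, `views`), the squashing functions, and the weights are hypothetical. They only show the shape of a linear scoring function for people search.

```javascript
// Second-stage ranking sketch: blend text relevance with non-text signals.
// Features and weights are hypothetical, not LinkedIn's actual model.

const WEIGHTS = { text: 0.6, network: 0.3, popularity: 0.1 };

// Map connection degree (1st, 2nd, 3rd+) to a [0, 1] proximity score.
function networkScore(degree) {
  if (degree === 1) return 1.0;
  if (degree === 2) return 0.5;
  return 0.1;
}

// Squash an unbounded count (e.g. profile views) into [0, 1).
function popularityScore(views) {
  return views / (views + 100);
}

function rank(candidates) {
  // Normalize text scores by the best candidate so all features share a [0, 1] scale.
  const maxText = Math.max(...candidates.map((c) => c.textScore), 1e-9);
  return candidates
    .map((c) => ({
      id: c.id,
      score:
        WEIGHTS.text * (c.textScore / maxText) +
        WEIGHTS.network * networkScore(c.degree) +
        WEIGHTS.popularity * popularityScore(c.views),
    }))
    .sort((a, b) => b.score - a.score);
}

const candidates = [
  { id: "alice", textScore: 8.0, degree: 3, views: 900 },
  { id: "bob", textScore: 6.0, degree: 1, views: 100 },
  { id: "carol", textScore: 7.5, degree: 2, views: 50 },
];

console.log(rank(candidates).map((r) => r.id)); // -> [ 'bob', 'carol', 'alice' ]
```

Bob outranks Alice despite a lower text score because he is a first-degree connection. Such weights are fragile when picked by hand, which is one reason ranking changes are validated with A/B tests.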
Real-Time Index Updates
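A sketch of how an indexing consumer can apply change events incrementally. A forward map (`docId` to indexed terms) lets an update remove stale postings before adding new ones. A per-document version makes the consumer ignore replayed or out-of-order events. The event shape `{ type, docId, version, text }` and the `LiveIndex` class are hypothetical.

```javascript
// Incremental index maintenance sketch for a change-event consumer.
// Hypothetical event shape: { type: "upsert" | "delete", docId, version, text }.

function tokenize(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

class LiveIndex {
  constructor() {
    this.postings = new Map(); // term -> Set(docId)
    this.forward = new Map();  // docId -> Set(term) currently indexed
    this.versions = new Map(); // docId -> last applied version (kept after delete)
  }

  // Returns true if the event was applied, false if it was stale or a duplicate.
  apply(event) {
    const seen = this.versions.get(event.docId) ?? -1;
    if (event.version <= seen) return false;
    this.versions.set(event.docId, event.version);

    this.removeDoc(event.docId); // drop old postings first
    if (event.type === "upsert") this.addDoc(event.docId, event.text);
    return true;
  }

  addDoc(docId, text) {
    const terms = tokenize(text);
    for (const term of terms) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(docId);
    }
    this.forward.set(docId, terms);
  }

  removeDoc(docId) {
    for (const term of this.forward.get(docId) ?? []) {
      const set = this.postings.get(term);
      set.delete(docId);
      if (set.size === 0) this.postings.delete(term);
    }
    this.forward.delete(docId);
  }

  lookup(term) {
    return [...(this.postings.get(term.toLowerCase()) ?? [])];
  }
}

const idx = new LiveIndex();
idx.apply({ type: "upsert", docId: "u1", version: 1, text: "Java developer" });
idx.apply({ type: "upsert", docId: "u1", version: 2, text: "Rust developer" });
idx.apply({ type: "upsert", docId: "u1", version: 1, text: "Java developer" }); // late replay, ignored
console.log(idx.lookup("java"), idx.lookup("rust")); // -> [] [ 'u1' ]
idx.apply({ type: "delete", docId: "u1", version: 3 });
console.log(idx.lookup("developer")); // -> []
```

Keeping the version after a delete acts as a tombstone. A late upsert with an older version cannot bring the deleted document back.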
Scale at LinkedIn
5 Interview Questions This Blog Unlocks
Q1. Design Google Search autocomplete / typeahead
Answer: Use a trie for small datasets and Elasticsearch prefix queries at scale. Cache the top-N suggestions per prefix in Redis (TTL 1 hour), and precompute popular prefixes offline. Personalize by adding a weight for the user's own search history. Return the top 5-10 suggestions, ranked by global frequency plus personal relevance.
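A sketch of the read path from this answer: a per-prefix cache of global top-N suggestions with a one-hour TTL, followed by a personal re-rank. A plain `Map` stands in for Redis, and no Redis client API is used or implied. `computeGlobalTopN`, the `GLOBAL` sample data, and `PERSONAL_WEIGHT = 100` are hypothetical.

```javascript
// Per-prefix suggestion cache with TTL (Map as a Redis stand-in) + personal re-rank.
// Hypothetical: computeGlobalTopN, GLOBAL data, PERSONAL_WEIGHT.

const TTL_MS = 60 * 60 * 1000; // 1 hour, as in the answer
const PERSONAL_WEIGHT = 100;   // hypothetical: one personal search ~ 100 global ones

class PrefixCache {
  constructor(loader, now = () => Date.now()) {
    this.loader = loader;     // prefix -> [{ phrase, freq }]
    this.now = now;           // injectable clock, handy for testing expiry
    this.entries = new Map(); // prefix -> { value, expiresAt }
  }

  get(prefix) {
    const hit = this.entries.get(prefix);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = this.loader(prefix); // miss or expired: recompute and store
    this.entries.set(prefix, { value, expiresAt: this.now() + TTL_MS });
    return value;
  }
}

function suggest(cache, prefix, userHistory, n = 5) {
  return cache
    .get(prefix)
    .map(({ phrase, freq }) => ({
      phrase,
      score: freq + PERSONAL_WEIGHT * (userHistory.get(phrase) ?? 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((s) => s.phrase);
}

// Hypothetical global data standing in for the offline top-N job.
const GLOBAL = [
  { phrase: "data scientist", freq: 900 },
  { phrase: "data engineer", freq: 700 },
  { phrase: "data analyst", freq: 650 },
];
let loads = 0;
const computeGlobalTopN = (prefix) => {
  loads++;
  return GLOBAL.filter((s) => s.phrase.startsWith(prefix));
};

const cache = new PrefixCache(computeGlobalTopN);
const history = new Map([["data analyst", 3]]); // this user searched it 3 times

console.log(suggest(cache, "data", history));   // -> [ 'data analyst', 'data scientist', 'data engineer' ]
console.log(suggest(cache, "data", new Map())); // -> [ 'data scientist', 'data engineer', 'data analyst' ]
console.log(loads);                             // -> 1 (second call served from the cache)
```

The personal boost only reorders what the cache returned. A user's rare past query that is not in the global top-N never appears, so surfacing it would need a separate per-user lookup merged into the results.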
Q2. What is an inverted index and how does search use it?
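Answer: An inverted index maps each term to the list of documents containing it (its posting list). To answer a query, look up each term's posting list, intersect the lists, and rank the results. Each term lookup is a near-constant-time dictionary lookup instead of an O(N) scan over every document, so query cost grows with the length of the matching posting lists rather than with corpus size. Lucene is built on inverted indexes, and both Elasticsearch and Solr are built on Lucene. Building the index is expensive, but lookups are extremely fast.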
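A sketch of evaluating an AND query over posting lists stored as ascending doc-ID arrays. It starts from the shortest list so the intermediate result stays small, then intersects pairwise with a linear two-pointer merge. The three-term index is made-up sample data.

```javascript
// AND-query evaluation over sorted posting lists (doc IDs ascending).
// The sample index below is hypothetical.

// Linear two-pointer merge of two sorted arrays.
function intersectTwo(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

function andQuery(index, terms) {
  const lists = terms.map((t) => index.get(t) ?? []);
  if (lists.length === 0) return [];
  lists.sort((x, y) => x.length - y.length); // shortest posting list first
  let result = [...lists[0]];
  for (let k = 1; k < lists.length && result.length > 0; k++) {
    result = intersectTwo(result, lists[k]);
  }
  return result;
}

const index = new Map([
  ["engineer", [1, 2, 3, 5, 8, 13, 21]],
  ["kubernetes", [3, 8, 21, 34]],
  ["bangalore", [2, 3, 21, 55]],
]);

console.log(andQuery(index, ["engineer", "kubernetes", "bangalore"])); // -> [ 3, 21 ]
console.log(andQuery(index, ["engineer", "cobol"]));                   // -> []
```

Each pairwise merge is linear in the two list lengths. The cost therefore depends on how many documents contain the query terms, not on the total number of documents.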
Q3. How would you design search for an e-commerce app like Amazon?
Answer: Inverted index for text (product name, description, brand). Faceted search for filters (price range, category, rating). Ranking: text relevance + purchase history + sponsored boost + rating. Real-time index updates via Kafka → Elasticsearch. Typeahead via prefix index. A/B test ranking algorithms continuously.
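A sketch of faceted search. It filters the text-matched products by the user's selections, then counts facet values over what remains so the UI can show labels like "electronics (2)". The product data, facet names, and price buckets are hypothetical. A whole-word name match stands in for the inverted-index lookup.

```javascript
// Faceted search sketch: filter text matches, then count facet values.
// Product data, facet names, and price buckets are hypothetical.

const products = [
  { id: 1, name: "wireless mouse", category: "electronics", price: 799, rating: 4.4 },
  { id: 2, name: "wireless keyboard", category: "electronics", price: 1999, rating: 4.1 },
  { id: 3, name: "wireless earbuds", category: "audio", price: 2499, rating: 3.8 },
  { id: 4, name: "wired mouse", category: "electronics", price: 299, rating: 4.0 },
];

function priceBucket(price) {
  if (price < 1000) return "under 1000";
  if (price < 2000) return "1000-1999";
  return "2000+";
}

function facetedSearch(items, term, filters) {
  const matched = items
    .filter((p) => p.name.split(" ").includes(term)) // stand-in for the text index
    .filter((p) => !filters.category || p.category === filters.category)
    .filter((p) => !filters.minRating || p.rating >= filters.minRating);

  // Count facet values over the filtered result set.
  const facets = { category: {}, price: {} };
  for (const p of matched) {
    facets.category[p.category] = (facets.category[p.category] ?? 0) + 1;
    const bucket = priceBucket(p.price);
    facets.price[bucket] = (facets.price[bucket] ?? 0) + 1;
  }
  return { ids: matched.map((p) => p.id), facets };
}

const result = facetedSearch(products, "wireless", {});
console.log(result.ids);             // -> [ 1, 2, 3 ]
console.log(result.facets.category); // -> { electronics: 2, audio: 1 }
console.log(facetedSearch(products, "wireless", { minRating: 4.0 }).ids); // -> [ 1, 2 ]
```

In Elasticsearch these counts would normally come from aggregations rather than application code. This sketch counts facets after all filters are applied. Many UIs instead compute each facet's counts without that facet's own filter, so users can still see the alternative values.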
Q4. What is TF-IDF and why is it used for ranking?
Answer: TF (Term Frequency) = how often a term appears in a document. IDF (Inverse Document Frequency) = how rare the term is across all documents, commonly computed as log(N / df), where N is the total number of documents and df is the number of documents containing the term. TF-IDF = TF * IDF. "Software" appears almost everywhere → low IDF → low weight. "Kubernetes" is rare → high IDF → high weight. Documents that match rare query terms therefore rank higher.
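A worked TF-IDF sketch using the textbook definitions: tf is a term's share of a document's terms, and idf = ln(N / df). Smoothing is deliberately omitted, so a term that appears in every document gets an IDF of exactly 0. The tokenizer and the four sample documents are assumptions for illustration.

```javascript
// Textbook TF-IDF (no smoothing):
//   tf(t, d) = occurrences of t in d / number of terms in d
//   idf(t)   = ln(N / df(t)), df(t) = number of documents containing t
//   score    = sum over query terms of tf * idf

function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function buildTfIdf(docs) {
  const tokenized = docs.map(tokenize);
  const N = docs.length;

  // Document frequency: count each term at most once per document.
  const df = new Map();
  for (const terms of tokenized) {
    for (const t of new Set(terms)) df.set(t, (df.get(t) ?? 0) + 1);
  }

  // Terms never seen in the corpus contribute nothing.
  const idf = (term) => (df.has(term) ? Math.log(N / df.get(term)) : 0);

  const score = (docIndex, query) => {
    const terms = tokenized[docIndex];
    let total = 0;
    for (const q of tokenize(query)) {
      const tf = terms.filter((t) => t === q).length / terms.length;
      total += tf * idf(q);
    }
    return total;
  };

  return { idf, score };
}

const docs = [
  "software engineer",
  "software engineer kubernetes",
  "software developer",
  "software manager",
];
const { idf, score } = buildTfIdf(docs);

console.log(idf("software"));                            // -> 0 (appears in all 4 docs)
console.log(idf("kubernetes").toFixed(3));               // -> 1.386 (ln 4)
console.log(score(1, "software kubernetes").toFixed(3)); // -> 0.462
console.log(score(0, "software kubernetes"));            // -> 0
```

Lucene-based engines such as Elasticsearch now default to BM25. BM25 refines this idea by saturating term frequency and normalizing for document length.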
Q5. How do you keep search results fresh when data changes constantly?
Answer: Use event-driven index updates: DB write → Kafka event → an indexing consumer updates Elasticsearch. Accept eventual consistency for search (a few seconds of lag is usually fine). For critical updates such as deleted content, use a higher-priority queue. For bulk changes, run offline batch reindexing during low-traffic hours.
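A sketch of the higher-priority queue idea. Deletions go to a high-priority lane that each poll cycle drains before routine upserts. Two arrays stand in for two Kafka topics. The `PriorityIndexer` class, the batch size of 2, and the event shape are hypothetical.

```javascript
// Two-lane indexing consumer sketch: deletes are always drained before upserts.
// Arrays stand in for two Kafka topics; class, batch size, event shape are hypothetical.

class PriorityIndexer {
  constructor(applyToIndex, batchSize = 2) {
    this.applyToIndex = applyToIndex; // callback that writes to the search index
    this.batchSize = batchSize;
    this.high = [];   // deletes / takedowns
    this.normal = []; // routine upserts
  }

  publish(event) {
    (event.type === "delete" ? this.high : this.normal).push(event);
  }

  // One poll cycle: fill the batch from the high lane first, then top up from normal.
  poll() {
    const batch = [];
    while (batch.length < this.batchSize && this.high.length > 0) batch.push(this.high.shift());
    while (batch.length < this.batchSize && this.normal.length > 0) batch.push(this.normal.shift());
    for (const event of batch) this.applyToIndex(event);
    return batch.length;
  }
}

const applied = [];
const indexer = new PriorityIndexer((e) => applied.push(`${e.type}:${e.docId}`));
indexer.publish({ type: "upsert", docId: "a" });
indexer.publish({ type: "upsert", docId: "b" });
indexer.publish({ type: "delete", docId: "c" });
while (indexer.poll() > 0) {}
console.log(applied); // -> [ 'delete:c', 'upsert:a', 'upsert:b' ]
```

The two lanes are reordered relative to each other, so a delete can overtake an older upsert for the same document. A real pipeline would pair this with per-document versioning so the late upsert cannot resurrect deleted content.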