Blog 8 — LinkedIn: Scaling to 5M QPS Using Bloom Filters
C
Qubits of DPK
March 21, 2026
Core Case Studies
Core Concept: Bloom filters, Probabilistic data structures, BitSet caching
Why SDE-2 Critical: DSA in production — shows you think beyond LeetCode
Status: Draft notes ready
Quick Revision
- Problem: Membership checks at millions of QPS would crush the database.
- Core pattern: Bloom filters reject most negatives in memory.
- Interview one-liner: A tiny false-positive rate is often a great trade for massive database savings.
️ The Problem LinkedIn Solved
javascript
QUBITS OF DPK
Core Concepts
What is a Bloom Filter?
- A space-efficient probabilistic data structure
- Answers the question: "Have I seen this element before?"
- Two possible answers:
- Never has false negatives (never says "not in set" when it IS)
How It Works
javascript
QUBITS OF DPK
Space Efficiency
javascript
QUBITS OF DPK
LinkedIn's BitSet Approach
javascript
QUBITS OF DPK
Impact
5 Interview Questions This Blog Unlocks
Q1. How would you check if a username already exists across 100M users in O(1)?
Answer: Bloom filter. Hash the username through K hash functions, check if all corresponding bits are set. If any bit is 0 → username definitely available (no DB call needed). If all bits are 1 → possibly taken, verify with DB. 95%+ of "new username" checks never touch DB.
Q2. What is a false positive in a Bloom filter and why does it matter?
Answer: False positive = filter says "element might be in set" when it's actually NOT. In LinkedIn's case, a false positive means checking DB for a connection that doesn't exist. Acceptable — just one extra DB call. What's never acceptable: false negative (saying "not connected" when they ARE).
Q3. Can you delete an element from a Bloom filter?
Answer: No — standard Bloom filters don't support deletion. Bit could be shared by multiple elements; clearing it removes other elements too. Solution: Counting Bloom Filter — each bit becomes a counter. Decrement on delete. More memory, but deletion-safe.
Q4. Where else are Bloom filters used in production systems?
Answer: Chrome safe browsing (check if URL is malicious without sending all URLs to server), Cassandra (check if key exists before disk read), CDN (check if content is cached), spam filters, Bitcoin nodes (check transaction exists), HBase (avoid disk reads for missing keys).
Q5. How do you choose the size of a Bloom filter?
Answer: Two parameters: N (expected elements) and desired false positive rate P. Formula: m = -Nln(P) / (ln(2))^2 bits. Optimal hash functions K = m/N ln(2). At 1% false positive rate for 1B elements → ~1.2 GB, 7 hash functions.