Blog 8 — LinkedIn: Scaling to 5M QPS Using Bloom Filters

C

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: Bloom filters, Probabilistic data structures, BitSet caching
Why SDE-2 Critical: DSA in production — shows you think beyond LeetCode
Status: Draft notes ready

Quick Revision

  • Problem: Membership checks at millions of QPS would crush the database.
  • Core pattern: Bloom filters reject most negatives in memory.
  • Interview one-liner: A tiny false-positive rate is often a great trade for massive database savings.

️ The Problem LinkedIn Solved

javascript
QUBITS OF DPK
1LinkedIn's "People You May Know" feature:
2  - Must NOT suggest someone user already connected with
3  - 900M users, each with hundreds of connections
4  - 5 million QPS checking "is X already connected to Y?"
5
6Naive approach:
7  SELECT * FROM connections WHERE user_id = X AND connected_to = Y
8  → 5M DB queries/sec → DB dies 💥
9
10Bloom Filter approach:
11  Check in-memory filter first
12If filter says NO → definitely not connected (skip DB)
13If filter says YES → might be connected (verify with DB)
1495%+ queries never touch DB

Core Concepts

What is a Bloom Filter?

  • A space-efficient probabilistic data structure
  • Answers the question: "Have I seen this element before?"
  • Two possible answers:
  • Never has false negatives (never says "not in set" when it IS)

How It Works

javascript
QUBITS OF DPK
1Bloom filter = bit array of N bits + K hash functions
2
3Add "alice-bob" connection:
4  hash1("alice-bob") = 3set bit[3] = 1
5  hash2("alice-bob") = 7set bit[7] = 1
6  hash3("alice-bob") = 12set bit[12] = 1
7
8Check "alice-charlie":
9  hash1("alice-charlie") = 3  → bit[3] = 110  hash2("alice-charlie") = 5  → bit[5] = 011Bit 5 is 0DEFINITELY NOT connected ✅
12
13Check "alice-bob" again:
14  hash1 = 3, hash2 = 7, hash3 = 1215All bits setPOSSIBLY connected (verify with DB)

Space Efficiency

javascript
QUBITS OF DPK
1Storing 1 billion connections:
2  HashSet approach:  ~32 GB RAM
3  Bloom Filter:      ~1.2 GB RAM  (96% less!)
4
5Trade-off: ~1% false positive rate
61% of queries make unnecessary DB calls
7Worth it for 96% memory savings

LinkedIn's BitSet Approach

javascript
QUBITS OF DPK
1For each user, maintain a BitSet of connection IDs:
2  User Alice (id=100):
3    BitSet: [0,0,1,0,1,1,0...]
4              ^ bit 3 set = connected to user 3
5              ^ bit 5 set = connected to user 5
6
7Check if Alice connected to Bob (id=3):
8  BitSet[3] == 1YES (O(1) lookup)
9
10For 900M users with avg 500 connections:
11  500 bits per user = ~56 bytes per user
12  vs storing actual IDs = ~2000 bytes per user

Impact

5 Interview Questions This Blog Unlocks

Q1. How would you check if a username already exists across 100M users in O(1)?

Answer: Bloom filter. Hash the username through K hash functions, check if all corresponding bits are set. If any bit is 0 → username definitely available (no DB call needed). If all bits are 1 → possibly taken, verify with DB. 95%+ of "new username" checks never touch DB.

Q2. What is a false positive in a Bloom filter and why does it matter?

Answer: False positive = filter says "element might be in set" when it's actually NOT. In LinkedIn's case, a false positive means checking DB for a connection that doesn't exist. Acceptable — just one extra DB call. What's never acceptable: false negative (saying "not connected" when they ARE).

Q3. Can you delete an element from a Bloom filter?

Answer: No — standard Bloom filters don't support deletion. Bit could be shared by multiple elements; clearing it removes other elements too. Solution: Counting Bloom Filter — each bit becomes a counter. Decrement on delete. More memory, but deletion-safe.

Q4. Where else are Bloom filters used in production systems?

Answer: Chrome safe browsing (check if URL is malicious without sending all URLs to server), Cassandra (check if key exists before disk read), CDN (check if content is cached), spam filters, Bitcoin nodes (check transaction exists), HBase (avoid disk reads for missing keys).

Q5. How do you choose the size of a Bloom filter?

Answer: Two parameters: N (expected elements) and desired false positive rate P. Formula: m = -Nln(P) / (ln(2))^2 bits. Optimal hash functions K = m/N ln(2). At 1% false positive rate for 1B elements → ~1.2 GB, 7 hash functions.

Key Engineering Lessons