Blog 8 — LinkedIn: Scaling to 5M QPS Using Bloom Filters

Qubits of DPK

March 21, 2026

Core Case Studies

Source: https://engineering.linkedin.com

Core Concept: Bloom filters, Probabilistic data structures, BitSet caching

Why SDE-2 Critical: DSA in production — shows you think beyond LeetCode

Status: Draft notes ready

Quick Revision

Problem: Membership checks at millions of QPS would crush the database.
Core pattern: Bloom filters reject most negatives in memory.
Interview one-liner: A tiny false-positive rate is often a great trade for massive database savings.

️ The Problem LinkedIn Solved

javascript

QUBITS OF DPK

1LinkedIn's "People You May Know" feature:
2  - Must NOT suggest someone user already connected with
3  - 900M users, each with hundreds of connections
4  - 5 million QPS checking "is X already connected to Y?"
5
6Naive approach:
7  SELECT * FROM connections WHERE user_id = X AND connected_to = Y
8  → 5M DB queries/sec → DB dies 💥
9
10Bloom Filter approach:
11  Check in-memory filter first
12  → If filter says NO → definitely not connected (skip DB)
13  → If filter says YES → might be connected (verify with DB)
14  → 95%+ queries never touch DB ✅

Core Concepts

What is a Bloom Filter?

A space-efficient probabilistic data structure
Answers the question: "Have I seen this element before?"
Two possible answers:
Never has false negatives (never says "not in set" when it IS)

How It Works

javascript

QUBITS OF DPK

1Bloom filter = bit array of N bits + K hash functions
2
3Add "alice-bob" connection:
4  hash1("alice-bob") = 3  → set bit[3] = 1
5  hash2("alice-bob") = 7  → set bit[7] = 1
6  hash3("alice-bob") = 12 → set bit[12] = 1
7
8Check "alice-charlie":
9  hash1("alice-charlie") = 3  → bit[3] = 1 ✓
10  hash2("alice-charlie") = 5  → bit[5] = 0 ✗
11  → Bit 5 is 0 → DEFINITELY NOT connected ✅
12
13Check "alice-bob" again:
14  hash1 = 3 ✓, hash2 = 7 ✓, hash3 = 12 ✓
15  → All bits set → POSSIBLY connected (verify with DB)

Space Efficiency

javascript

QUBITS OF DPK

1Storing 1 billion connections:
2  HashSet approach:  ~32 GB RAM
3  Bloom Filter:      ~1.2 GB RAM  (96% less!)
4
5Trade-off: ~1% false positive rate
6  → 1% of queries make unnecessary DB calls
7  → Worth it for 96% memory savings

LinkedIn's BitSet Approach

javascript

QUBITS OF DPK

1For each user, maintain a BitSet of connection IDs:
2  User Alice (id=100):
3    BitSet: [0,0,1,0,1,1,0...]
4              ^ bit 3 set = connected to user 3
5              ^ bit 5 set = connected to user 5
6
7Check if Alice connected to Bob (id=3):
8  BitSet[3] == 1 → YES (O(1) lookup)
9
10For 900M users with avg 500 connections:
11  500 bits per user = ~56 bytes per user
12  vs storing actual IDs = ~2000 bytes per user

Impact

5 Interview Questions This Blog Unlocks

Q1. How would you check if a username already exists across 100M users in O(1)?

Answer: Bloom filter. Hash the username through K hash functions, check if all corresponding bits are set. If any bit is 0 → username definitely available (no DB call needed). If all bits are 1 → possibly taken, verify with DB. 95%+ of "new username" checks never touch DB.

Q2. What is a false positive in a Bloom filter and why does it matter?

Answer: False positive = filter says "element might be in set" when it's actually NOT. In LinkedIn's case, a false positive means checking DB for a connection that doesn't exist. Acceptable — just one extra DB call. What's never acceptable: false negative (saying "not connected" when they ARE).

Q3. Can you delete an element from a Bloom filter?

Answer: No — standard Bloom filters don't support deletion. Bit could be shared by multiple elements; clearing it removes other elements too. Solution: Counting Bloom Filter — each bit becomes a counter. Decrement on delete. More memory, but deletion-safe.

Q4. Where else are Bloom filters used in production systems?

Answer: Chrome safe browsing (check if URL is malicious without sending all URLs to server), Cassandra (check if key exists before disk read), CDN (check if content is cached), spam filters, Bitcoin nodes (check transaction exists), HBase (avoid disk reads for missing keys).

Q5. How do you choose the size of a Bloom filter?

Answer: Two parameters: N (expected elements) and desired false positive rate P. Formula: m = -Nln(P) / (ln(2))^2 bits. Optimal hash functions K = m/N ln(2). At 1% false positive rate for 1B elements → ~1.2 GB, 7 hash functions.

Quick Revision

️ The Problem LinkedIn Solved

Core Concepts

What is a Bloom Filter?

How It Works

Space Efficiency

LinkedIn's BitSet Approach

Impact

5 Interview Questions This Blog Unlocks

Q1. How would you check if a username already exists across 100M users in O(1)?

Q2. What is a false positive in a Bloom filter and why does it matter?

Q3. Can you delete an element from a Bloom filter?

Q4. Where else are Bloom filters used in production systems?

Q5. How do you choose the size of a Bloom filter?

Key Engineering Lessons