Blog 10 — Meta: Designing the Social Graph

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: Graph DB vs relational DB, TAO distributed cache, Eventual consistency in social networks
Why SDE-2 Critical: Social feed design is the most commonly asked system design question
Status: Draft notes ready

Quick Revision

  • Problem: Social graph reads are simple but happen at enormous scale.
  • Core pattern: MySQL shards as source of truth with TAO-style caching on top.
  • Interview one-liner: Optimize for actual query shape; most social graph reads are 1-hop, not deep traversal.

Architecture Overview

```
Social Graph:
  Nodes = Users, Pages, Groups, Events, Posts
  Edges = Friends, Likes, Follows, Members, Attended

  Alice ── friends ── Bob
  Alice ── likes ── Post #123
  Bob ── member ── Group #456

TAO (The Associations and Objects) sits on top:
  └── Distributed cache layer
  └── All graph reads go here first
  └── Backed by MySQL shards
```

Core Concepts

Why Not a Traditional Graph Database?

```
Neo4j and graph DBs are great for complex traversals:
  Find friends-of-friends-of-friends in 5 hops

Facebook's queries are simpler:
  "Get all friends of Alice"      → 1 hop only
  "Get all posts liked by Alice"  → 1 hop only

Conclusion: MySQL with smart indexing + TAO cache
  is FASTER and more scalable than a graph DB
  for 1-hop social graph queries at 3-billion-user scale
```
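The 1-hop argument can be made concrete: with a composite index on the edge table, "get all friends of Alice" is a single indexed range scan, not a graph traversal. A minimal sketch using SQLite; the `assoc` table and its columns are illustrative, not Meta's actual schema:

```python
# Sketch: a 1-hop friend lookup is just an indexed range scan.
# Table and column names are hypothetical, mirroring the object/association model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assoc (id1 INTEGER, type TEXT, id2 INTEGER)")
# The composite index makes every (source, association-type) read an index scan.
conn.execute("CREATE INDEX idx_assoc ON assoc (id1, type)")
conn.executemany(
    "INSERT INTO assoc VALUES (?, ?, ?)",
    [(1, "FRIEND", 2), (1, "FRIEND", 3), (2, "MEMBER", 456)],
)

# "Get all friends of Alice (id=1)" -- one lookup, cost proportional to result size
friends = [row[0] for row in conn.execute(
    "SELECT id2 FROM assoc WHERE id1 = ? AND type = ? ORDER BY id2", (1, "FRIEND"))]
```

At billions of edges the same shape holds on sharded MySQL: the query never fans out beyond one hop, so there is nothing for a graph database's traversal engine to win on.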

TAO — The Associations and Objects Cache

```
TAO models everything as:
  Objects: { id, type, data }
    e.g., User{id:1, name:"Alice"}, Post{id:123, text:"Hello"}

  Associations: { id1, assoc_type, id2, time, data }
    e.g., (Alice, FRIEND, Bob), (Alice, LIKED, Post#123)

TAO Cache:
  assoc_get(Alice, FRIEND) → [Bob, Charlie, Dave...]
  obj_get(Post, 123)       → {text: "Hello", author: Alice}

Backed by MySQL shards, cached in TAO:
  99%+ reads served from TAO cache
  MySQL only for cache misses and writes
```
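The read-through behavior described above can be sketched in a few lines. This is a toy model, not Meta's TAO API: a dict stands in for the sharded MySQL tier, and the method names simply mirror the text:

```python
# Toy read-through association cache in the spirit of TAO's assoc_get.
# `db` (a dict) stands in for sharded MySQL; names are illustrative.
class TaoLikeCache:
    def __init__(self, db):
        self.db = db        # source of truth
        self.cache = {}     # in-memory association lists

    def assoc_get(self, id1, assoc_type):
        key = (id1, assoc_type)
        if key not in self.cache:              # cache miss -> fall through to MySQL
            self.cache[key] = list(self.db.get(key, []))
        return self.cache[key]                 # cache hit -> never touches the DB

    def assoc_add(self, id1, assoc_type, id2):
        key = (id1, assoc_type)
        self.db.setdefault(key, []).append(id2)  # write source of truth first
        self.cache[key] = list(self.db[key])     # then refresh the cached entry

db = {("Alice", "FRIEND"): ["Bob", "Charlie"]}
tao = TaoLikeCache(db)
friends = tao.assoc_get("Alice", "FRIEND")   # first call misses, then stays cached
tao.assoc_add("Alice", "LIKED", "Post#123")
likes = tao.assoc_get("Alice", "LIKED")
```

The key property is that steady-state reads never touch the backing store, which is how a cache layer like this absorbs the "99%+ of reads" the notes mention.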

Social Graph Sharding

```
Challenge: Can't shard by userId evenly
  Celebrity users have 100M edges
  Regular users have ~500 edges
  → Hotspot problem

Meta's approach:
  Shard by association type + source user
  Celebrity edges get their own dedicated shards
  TAO handles routing transparently
```
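The routing rule above can be sketched as a hash of (association type, source user) with an override table for hot users. The shard counts and the celebrity override map are hypothetical, chosen only to illustrate the idea:

```python
# Sketch: shard routing by (assoc_type, source user), with celebrity
# edges pinned to dedicated shards. All constants are hypothetical.
import hashlib

NUM_SHARDS = 64
CELEBRITY_SHARDS = {"Celebrity#1": 100}  # hot users bypass the normal hash

def shard_for(assoc_type, source_id):
    if source_id in CELEBRITY_SHARDS:          # dedicated shard for 100M-edge users
        return CELEBRITY_SHARDS[source_id]
    key = f"{assoc_type}:{source_id}".encode()
    digest = hashlib.md5(key).hexdigest()      # stable hash -> deterministic routing
    return int(digest, 16) % NUM_SHARDS

regular = shard_for("FRIEND", "user_42")
celeb = shard_for("FOLLOWS", "Celebrity#1")
```

Callers never see this logic; as the notes say, the cache layer routes transparently, so the celebrity special case stays an operational detail rather than an API concern.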

Eventual Consistency in Social Graphs

```
Alice likes Bob's post:
  Write goes to primary MySQL shard
  TAO cache updated
  Replicas lag by 10-100ms

Charlie loads Bob's post from a replica:
  Might not see Alice's like yet → that's OK!

Rule: Eventual consistency is acceptable for:
  Likes, comments, shares
  Friend suggestions
  Feed ordering

NOT acceptable for:
  Financial transactions
  Security events (login, password change)
```
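One way this rule shows up in practice is in read routing: reads that tolerate replica lag go to replicas, while reads that must see the latest write go to the primary. A minimal sketch; the category sets are illustrative, not an exhaustive policy:

```python
# Sketch: route reads by how much staleness the use case tolerates.
# The event-type sets are illustrative examples from the notes, not a full policy.
REPLICA_OK = {"like", "comment", "share", "friend_suggestion", "feed_order"}
PRIMARY_ONLY = {"payment", "login", "password_change"}

def choose_datastore(event_type):
    if event_type in PRIMARY_ONLY:   # must reflect the latest write
        return "primary"
    if event_type in REPLICA_OK:     # 10-100ms of lag is acceptable
        return "replica"
    return "primary"                 # default to the safe choice

routes = {e: choose_datastore(e) for e in ["like", "payment", "login"]}
```

Defaulting unknown event types to the primary keeps the failure mode conservative: a new feature is at worst slower, never silently stale where staleness matters.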

Scale Achieved

5 Interview Questions This Blog Unlocks

Q1. Design Facebook friends / Instagram followers

Answer: Store relationships as edges in a table (user_id, friend_id, created_at). For follower counts, maintain a denormalized counter table. Cache hot social graphs in a TAO-like system (e.g., Redis). Shard by user_id; celebrities get separate shards. Use eventual consistency for social data.
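The edge table plus counter table in this answer can be sketched with SQLite (requires SQLite 3.24+ for upserts). Table and column names are illustrative:

```python
# Sketch of the Q1 answer: an edge table for follows plus a denormalized
# counter table, so follower counts never need COUNT(*) over a huge edge table.
# Schema names are hypothetical. Upsert syntax needs SQLite >= 3.24.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (user_id INTEGER, follower_id INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE follower_counts (user_id INTEGER PRIMARY KEY, cnt INTEGER)")

def follow(user_id, follower_id):
    conn.execute("INSERT INTO follows VALUES (?, ?, datetime('now'))",
                 (user_id, follower_id))
    # Keep the counter in step with the edge insert.
    conn.execute("INSERT INTO follower_counts VALUES (?, 1) "
                 "ON CONFLICT(user_id) DO UPDATE SET cnt = cnt + 1", (user_id,))

follow(1, 2)
follow(1, 3)
count = conn.execute(
    "SELECT cnt FROM follower_counts WHERE user_id = 1").fetchone()[0]
```

In a real deployment the counter update would be sharded and eventually consistent, exactly the trade-off the answer calls acceptable for social data.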

Q2. Why did Facebook use MySQL instead of a graph database?

Answer: Facebook's queries are mostly 1-hop ("get all friends of user X"). Graph databases excel at multi-hop traversals. For 1-hop at 3B user scale, MySQL + smart indexing + TAO cache is faster, more operationally mature, and easier to scale than Neo4j.

Q3. What is TAO and what problem does it solve?

Answer: TAO is Facebook's distributed cache for the social graph. Without it, every social graph query would hit MySQL directly — impossible at billions of QPS. TAO caches objects and associations in a read-through manner, serving 99%+ reads from memory with MySQL as backing store.

Q4. How would you design a "People You May Know" feature?

Answer: Friends-of-friends algorithm: for user A, get all friends, then for each friend get their friends, count overlap. Store result in precomputed table (updated async). Use Bloom filters to exclude existing connections. Cache results with 1-hour TTL. This is a batch job, not real-time.
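The friends-of-friends scoring in this answer can be sketched directly (the Bloom-filter exclusion is replaced here by a plain set, since the graph is tiny; in the batch job described above, a Bloom filter would stand in for `direct`):

```python
# Sketch of "People You May Know": score friends-of-friends by mutual-friend
# count, excluding the user and existing friends. A set replaces the Bloom
# filter from the answer; both serve the same exclusion role.
from collections import Counter

def people_you_may_know(user, friends_of):
    direct = set(friends_of.get(user, []))
    scores = Counter()
    for friend in direct:
        for fof in friends_of.get(friend, []):
            if fof != user and fof not in direct:   # skip self and existing friends
                scores[fof] += 1                    # +1 per mutual friend
    return [candidate for candidate, _ in scores.most_common()]

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C"],
    "E": ["C"],
}
suggestions = people_you_may_know("A", graph)  # D via both B and C, E via C only
```

Run as a periodic batch job, the output per user is just a ranked list written to a precomputed table, which is why the 1-hour cache TTL in the answer is cheap to honor.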

Q5. How does Facebook handle celebrities with 100M followers differently?

Answer: Regular users: edges stored in standard shards. Celebrities: edges get dedicated shards + aggressive edge caching. Fan-out on read (compute feed at read time) instead of fan-out on write (pre-compute for 100M followers on every post). Hybrid approach based on follower count threshold.
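The hybrid fan-out decision in this answer reduces to a threshold check at post time. The cutoff value below is hypothetical; real systems tune it empirically:

```python
# Sketch of the hybrid fan-out decision from Q5. The threshold is a
# hypothetical illustration, not a number from Meta.
FANOUT_THRESHOLD = 10_000

def delivery_strategy(follower_count):
    if follower_count < FANOUT_THRESHOLD:
        return "fanout_on_write"   # push the post into each follower's feed now
    return "fanout_on_read"        # followers pull celebrity posts at read time

strategies = [delivery_strategy(500), delivery_strategy(100_000_000)]
```

The write-time path does O(followers) work once per post, which is fine at 500 followers and ruinous at 100M; flipping to read-time delivery above the threshold caps write amplification.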

Key Engineering Lessons