Blog 20 — Instagram: Scaling to 1 Billion Users
Qubits of DPK
March 21, 2026
Core Case Studies
Core Concepts: PostgreSQL sharding strategy, photo storage with S3 + CDN, denormalization for feed performance, Cassandra for activity feeds
Why It Matters for SDE-2: Among the most commonly asked system design questions in Indian product-company interviews
Status: Draft notes ready
Quick Revision
- Problem: Serve uploads, feeds, and notifications for a media-heavy social app.
- Core pattern: S3 plus CDN for media, sharded PostgreSQL for metadata, Cassandra for activity streams.
- Interview one-liner: Instagram scales by matching storage technology to each access pattern instead of forcing one database everywhere.
Architecture Overview
Core Concepts
Photo Storage Architecture
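The media path can be sketched as one S3 object per resolution, with every version served through the CDN. This is a minimal sketch under stated assumptions: the CDN hostname cdn.example.com, the key layout photos/<photoId>/<variant>.jpg, and the three widths are all hypothetical, not Instagram's actual values.

```javascript
// Illustrative resolutions (hypothetical; the real set of sizes is not specified here).
const VARIANTS = { thumb: 150, medium: 640, full: 1080 };
const CDN_HOST = "https://cdn.example.com"; // hypothetical CDN domain fronting the S3 bucket

// Hypothetical key layout: one S3 object per resolution of each photo.
function objectKey(photoId, variant) {
  if (!(variant in VARIANTS)) throw new Error(`unknown variant: ${variant}`);
  return `photos/${photoId}/${variant}.jpg`;
}

// Clients fetch through the CDN, which pulls from S3 only on a cache miss.
function cdnUrl(photoId, variant) {
  return `${CDN_HOST}/${objectKey(photoId, variant)}`;
}

// Pick the smallest stored version that is at least as wide as the display.
function pickVariant(displayWidthPx) {
  const bySize = Object.entries(VARIANTS).sort((a, b) => a[1] - b[1]);
  for (const [name, width] of bySize) {
    if (width >= displayWidthPx) return name;
  }
  return bySize[bySize.length - 1][0]; // wider than every version: serve the largest
}
```

Keeping each resolution as a separate object lets a feed screen fetch a small version while only the detail view loads the full one.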
PostgreSQL Sharding Strategy
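Sharding by user_id is often done in two steps: hash the user into one of many logical shards, then map the logical shards onto a smaller set of physical PostgreSQL servers, so a shard can move to new hardware by editing a mapping rather than rehashing every user. A minimal sketch, assuming a hypothetical count of 2048 logical shards, modulo hashing, four hypothetical host names, and one schema per logical shard.

```javascript
// Hypothetical sizing: many small logical shards spread over few physical servers.
const LOGICAL_SHARDS = 2048;

// Hypothetical physical hosts; contiguous ranges of logical shards live on each one.
// Moving a logical shard means changing this mapping, not rehashing users.
const PHYSICAL_HOSTS = ["pg-0.internal", "pg-1.internal", "pg-2.internal", "pg-3.internal"];

// All of a user's rows (profile, photos, follows) land on one logical shard,
// so "photos by this user" is a single-shard query.
function logicalShardFor(userId) {
  return Number(BigInt(userId) % BigInt(LOGICAL_SHARDS));
}

function physicalHostFor(logicalShard) {
  const shardsPerHost = LOGICAL_SHARDS / PHYSICAL_HOSTS.length; // 512 per host
  return PHYSICAL_HOSTS[Math.floor(logicalShard / shardsPerHost)];
}

// Each logical shard as its own PostgreSQL schema on the host (hypothetical naming).
function routeUser(userId) {
  const shard = logicalShardFor(userId);
  return { shard, host: physicalHostFor(shard), schema: `shard_${shard}` };
}
```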
Denormalization for Feed Performance
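Denormalization here means keeping like_count on the photo row, so a feed item is rendered from one row read instead of a COUNT(*) over the likes table. A minimal in-memory sketch, assuming hypothetical photos and likes structures that stand in for the PostgreSQL tables; the comments note the SQL each step would roughly correspond to.

```javascript
// In-memory stand-ins for two tables (hypothetical shapes).
const photos = new Map(); // photo_id -> { id, owner_id, caption, like_count }
const likes = new Set(); // "photo_id:user_id" rows: the source of truth for who liked what

function createPhoto(id, ownerId, caption) {
  photos.set(id, { id, owner_id: ownerId, caption, like_count: 0 });
}

// In SQL this would be one transaction, e.g.:
//   INSERT INTO likes (photo_id, user_id) VALUES ($1, $2) ON CONFLICT DO NOTHING;
//   UPDATE photos SET like_count = like_count + 1 WHERE id = $1;  -- only if a row was inserted
function like(photoId, userId) {
  const key = `${photoId}:${userId}`;
  if (likes.has(key)) return false; // repeat like: do not double count
  likes.add(key);
  photos.get(photoId).like_count += 1;
  return true;
}

function unlike(photoId, userId) {
  if (!likes.delete(`${photoId}:${userId}`)) return false; // nothing to undo
  photos.get(photoId).like_count -= 1;
  return true;
}

// Feed rendering reads the stored counter: O(1), no scan of the likes table.
function feedItem(photoId) {
  const { id, caption, like_count } = photos.get(photoId);
  return { id, caption, like_count };
}
```

The cost moves to write time: every like writes two rows, in exchange for cheap reads on the feed path, which is typically far busier than the like path.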
Feed Architecture
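A pull (fan-out-on-read) feed gathers recent photos from the accounts a user follows at request time, merges them newest first, and caches the result; the 10-minute TTL matches the answer to Q1. A minimal sketch, assuming an in-memory Map with expiry as a stand-in for Redis, hypothetical getFollowing and getRecentPhotos lookups with toy data, and time-ordered (Snowflake-style) IDs so that sorting by ID approximates sorting by time.

```javascript
const FEED_TTL_MS = 10 * 60 * 1000; // 10 minutes
const FEED_SIZE = 3; // tiny page size for the demo

// Stand-in for Redis: key -> { value, expiresAt } (Redis itself would use SET key value PX ttl).
const cache = new Map();

function cacheGet(key, now) {
  const entry = cache.get(key);
  if (!entry || entry.expiresAt <= now) {
    cache.delete(key);
    return undefined;
  }
  return entry.value;
}

function cacheSet(key, value, ttlMs, now) {
  cache.set(key, { value, expiresAt: now + ttlMs });
}

// Hypothetical data access with toy data; in the design above these would query sharded PostgreSQL.
const following = new Map([["alice", ["bob", "carol"]]]);
const photosByUser = new Map([
  ["bob", [{ id: 105n, owner: "bob" }, { id: 101n, owner: "bob" }]],
  ["carol", [{ id: 107n, owner: "carol" }, { id: 103n, owner: "carol" }]],
]);
const getFollowing = (userId) => following.get(userId) ?? [];
const getRecentPhotos = (userId) => photosByUser.get(userId) ?? [];

let dbReads = 0; // counts feed rebuilds, to show the cache absorbing repeat requests

function buildFeed(userId) {
  dbReads += 1;
  const candidates = getFollowing(userId).flatMap(getRecentPhotos);
  // Higher time-ordered ID = newer photo, so sort by ID descending.
  candidates.sort((a, b) => (a.id > b.id ? -1 : a.id < b.id ? 1 : 0));
  return candidates.slice(0, FEED_SIZE);
}

function getFeed(userId, now = Date.now()) {
  const key = `feed:${userId}`;
  const cached = cacheGet(key, now);
  if (cached) return cached;
  const feed = buildFeed(userId);
  cacheSet(key, feed, FEED_TTL_MS, now);
  return feed;
}
```

Pull keeps writes cheap (posting touches only the author's rows) at the price of work on read, which the cache amortizes across repeat opens within the TTL.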
Cassandra for Activity Feed (Notifications)
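Notifications suit Cassandra because the dominant query is "latest N for this user": partition by the recipient, cluster by time descending, and a read becomes one slice of one partition. A minimal sketch that simulates that layout in memory, assuming a hypothetical notifications table whose CQL appears in the comments; the NotificationStore class is an illustration, not the Cassandra driver API.

```javascript
// Hypothetical CQL table this simulates:
//   CREATE TABLE notifications (
//     user_id    bigint,
//     created_at timeuuid,
//     actor_id   bigint,
//     type       text,
//     photo_id   bigint,
//     PRIMARY KEY ((user_id), created_at)
//   ) WITH CLUSTERING ORDER BY (created_at DESC);
// Read path: SELECT * FROM notifications WHERE user_id = ? LIMIT 20;

class NotificationStore {
  constructor() {
    this.partitions = new Map(); // user_id -> rows kept in created_at DESC order
  }

  // Append-style write into the recipient's partition: no reads or joins required.
  insert(userId, row) {
    if (!this.partitions.has(userId)) this.partitions.set(userId, []);
    const rows = this.partitions.get(userId);
    // Keep clustering order (newest first); a new notification usually stops at i = 0.
    let i = 0;
    while (i < rows.length && rows[i].createdAt > row.createdAt) i += 1;
    rows.splice(i, 0, row);
  }

  // Equivalent of the LIMIT query: one partition, already sorted.
  latest(userId, n) {
    return (this.partitions.get(userId) ?? []).slice(0, n);
  }
}
```

If old notifications should age out, CQL writes can carry USING TTL so Cassandra expires them without a cleanup job.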
Scale Achieved
5 Interview Questions This Blog Unlocks
Q1. Design Instagram
Answer: Photo storage: S3 + CDN (multi-resolution). Metadata: PostgreSQL sharded by user_id with Snowflake IDs. Feed: pull model with Redis cache (TTL 10 min). Notifications: Cassandra (time-series). Likes/comments: denormalized counters on photo row. Follows: adjacency list in sharded PostgreSQL.
Q2. How would you design a photo upload system that handles millions of uploads per day?
Answer: Client uploads to an API server → the server stores the original in S3 → publishes an event to Kafka → a background job resizes the image to multiple resolutions → stores every version in S3 → the CDN serves them, pulling from S3 on a cache miss. Async processing keeps resizing off the upload request path: the API returns the photo ID immediately, and the resized versions are ready within seconds.
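The upload flow can be sketched with an in-memory array standing in for the Kafka topic and a Map standing in for the S3 bucket. All names here (uploadPhoto, resizeWorker, the key layout, the widths) are hypothetical, consuming with shift() simplifies Kafka's offset-based consumption, and the "resize" only records target widths instead of processing pixels.

```javascript
const s3 = new Map(); // stand-in for the S3 bucket: key -> object
const photoUploadedTopic = []; // stand-in for a Kafka topic
const SIZES = [150, 640, 1080]; // illustrative target widths

let nextId = 1; // in the real design this would be a Snowflake-style ID

// API handler: store the original, publish an event, return immediately.
function uploadPhoto(userId, bytes) {
  const photoId = nextId++;
  s3.set(`photos/${photoId}/original`, bytes);
  photoUploadedTopic.push({ photoId, userId });
  return { photoId, status: "processing" };
}

// Background consumer: drains events and writes one object per target width.
function resizeWorker() {
  let processed = 0;
  while (photoUploadedTopic.length > 0) {
    const { photoId } = photoUploadedTopic.shift();
    const original = s3.get(`photos/${photoId}/original`);
    for (const width of SIZES) {
      // Placeholder for real image resizing.
      s3.set(`photos/${photoId}/${width}w`, { width, sourceBytes: original.length });
    }
    processed += 1;
  }
  return processed;
}
```

The API's latency covers only the original write and the publish; resizing runs on workers that scale independently, and a failed resize can be retried from the retained event without asking the user to upload again.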
Q3. What is a Snowflake ID and why is it better than auto-increment for distributed systems?
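Answer: A 64-bit ID that packs a timestamp, a shard ID, and a per-shard sequence number. It is globally unique without central coordination (no single ID generator needed). It is roughly time-ordered, so sorting by ID approximates sorting by creation time. The shard ID is encoded in the ID itself, so a request can be routed without a metadata lookup. Auto-increment, by contrast, is unique only within one database; making it global requires a single central sequence, which becomes a bottleneck and a single point of failure.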
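A Snowflake-style ID can be built with BigInt bit shifts. A minimal sketch, assuming the split Instagram's engineering blog described for its IDs (41 bits of milliseconds since a custom epoch, 13 bits of logical shard ID, 10 bits of per-shard sequence); the epoch value and the function names makeId and parseId are hypothetical.

```javascript
const EPOCH_MS = 1_300_000_000_000n; // hypothetical custom epoch (2011), in ms
const TIME_BITS = 41n;
const SHARD_BITS = 13n;
const SEQ_BITS = 10n;
const MAX_SHARD = (1n << SHARD_BITS) - 1n; // 8191
const MAX_SEQ = (1n << SEQ_BITS) - 1n; // 1023

// Layout, high to low bits: 41 bits time | 13 bits shard | 10 bits sequence = 64 bits.
function makeId(timestampMs, shardId, sequence) {
  const t = BigInt(timestampMs) - EPOCH_MS;
  const shard = BigInt(shardId);
  if (t < 0n || t >= 1n << TIME_BITS) throw new RangeError("timestamp outside the 41-bit window");
  if (shard < 0n || shard > MAX_SHARD) throw new RangeError("shard ID needs more than 13 bits");
  const seq = BigInt(sequence) & MAX_SEQ; // per-shard counter, wraps modulo 1024
  return (t << (SHARD_BITS + SEQ_BITS)) | (shard << SEQ_BITS) | seq;
}

// Unpacking needs no lookup service: the shard is read straight from the bits.
function parseId(id) {
  return {
    timestampMs: Number((id >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS),
    shardId: Number((id >> SEQ_BITS) & MAX_SHARD),
    sequence: Number(id & MAX_SEQ),
  };
}
```

Because the timestamp occupies the high bits, any ID from a later millisecond compares greater than every ID from an earlier one; within one millisecond and shard, the sequence breaks ties. If the shard field is the owner's logical shard, parseId(photoId).shardId routes a photo lookup directly.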
Q4. How does Instagram count likes without a SELECT COUNT(*) query?
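Answer: Denormalization. Store like_count directly as a column on the photo row. Increment it on like and decrement it on unlike with an atomic UPDATE (SET like_count = like_count + 1), so the database applies the change rather than the application writing back a value it read earlier. The feed query reads the count straight from the photo row in O(1). Trade-off: the counter can drift from the true number of likes (for example, if the like insert and the counter update are not in the same transaction, or if the new value is computed in application code), and a viral photo's row becomes a write hotspot. Slight drift is acceptable for likes; it is not acceptable for inventory.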
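How a counter drifts is easiest to see by contrasting an application-level read-modify-write with an increment the database applies itself. A minimal simulation, assuming a plain object in place of the photo row and a 1 ms timer in place of a network round trip; the function names are hypothetical.

```javascript
// The await stands in for the gap between reading the row and writing it back.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Racy: SELECT like_count, compute in the app, then UPDATE ... SET like_count = <computed value>.
async function readModifyWriteLike(row) {
  const current = row.like_count; // read
  await sleep(1); // another request can read the same value meanwhile
  row.like_count = current + 1; // write back a possibly stale value
}

// Atomic: UPDATE photos SET like_count = like_count + 1; the increment happens in one step.
async function atomicLike(row) {
  await sleep(1);
  row.like_count += 1; // read and write with no await in between
}

async function demo() {
  const racy = { like_count: 0 };
  await Promise.all([readModifyWriteLike(racy), readModifyWriteLike(racy)]);
  const atomic = { like_count: 0 };
  await Promise.all([atomicLike(atomic), atomicLike(atomic)]);
  return { racy: racy.like_count, atomic: atomic.like_count };
}
```

The racy version loses one of the two likes because both requests read 0 before either writes.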
Q5. How does Instagram decide between PostgreSQL, Cassandra, and Redis for different data types?
Answer: PostgreSQL: relational data needing JOINs and transactions (users, photos, follows, comments). Cassandra: time-series data at extreme scale with simple access patterns (notifications, activity feeds). Redis: ephemeral data needing sub-millisecond access (sessions, feed cache, counters, rate limits). Match each store to the data's access pattern.