Blog 5 — Discord: How We Store Billions of Messages

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: MongoDB → Cassandra → ScyllaDB migration, Time-series data storage
Why SDE-2 Critical: Storage decisions at scale — why you pick which DB for which workload
Status: Draft notes ready

Quick Revision

  • Problem: Store massive chat history with fast append and recent-message reads.
  • Core pattern: Time-bucketed wide-column storage with Cassandra/ScyllaDB.
  • Interview one-liner: Pick a database around the access pattern, not the brand name.

The Journey: 3 Database Migrations

```
2015: MongoDB
  └── Hits scaling limits → too slow for message history

2017: Apache Cassandra
  └── Works but operational complexity grows

2023: ScyllaDB (Cassandra-compatible, written in C++)
  └── Current solution: 4x better performance
```

Core Concepts

Why MongoDB Failed

```
MongoDB stores messages as documents:
  {
    channel_id: "123",
    messages: [ msg1, msg2, msg3... ]
  }

Problems:
  - Hot channels get huge, ever-growing documents
  - Random access to old messages = full document scan
  - No native time-series support
  - Uneven data distribution across shards
```

Why Cassandra Was Chosen

  • Designed for time-series data (perfect for chat messages)
  • Wide column model: partition = channel, rows = messages ordered by time
  • Linear horizontal scaling — add nodes, capacity grows linearly
  • Tunable consistency — Discord chose eventual consistency (fine for chat)

Cassandra Data Model for Messages

```
Partition key:  (channel_id, bucket)
Clustering key: message_id (snowflake, timestamp-ordered)

Example:
  Partition: (channel_123, 2024-01)
    Row: msg_id=1710000001 → "Hey!"
    Row: msg_id=1710000002 → "How are you?"
    Row: msg_id=1710000003 → "Great, thanks!"

Fetch last 50 messages:
  SELECT * FROM messages
  WHERE channel_id = 123 AND bucket = '2024-01'
  ORDER BY message_id DESC LIMIT 50;

Single-partition read → extremely fast
```

The Bucket Problem

  • A channel with 10 years of messages in one partition = too large
  • Discord splits by time bucket (e.g., monthly)
  • Old bucket = cold storage, recent bucket = hot
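Because the clustering key is a snowflake, the bucket can be derived from the message ID itself: a snowflake stores its creation time (milliseconds since Discord's 2015 epoch) in the top bits. A minimal sketch, assuming the monthly buckets used in the example above (the production bucket width is an implementation choice):

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1_420_070_400_000  # 2015-01-01T00:00:00Z, the snowflake epoch

def snowflake_timestamp(message_id: int) -> datetime:
    """Extract the creation time embedded in a snowflake ID (bits 22 and up)."""
    ms = (message_id >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

def time_bucket(message_id: int) -> str:
    """Monthly bucket string like '2024-01', matching the partition-key example."""
    ts = snowflake_timestamp(message_id)
    return f"{ts.year:04d}-{ts.month:02d}"
```

Since the bucket is derivable from the ID, a reader can walk backwards bucket by bucket until it has collected 50 messages, with no separate index needed.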

Why ScyllaDB Over Cassandra

```
Cassandra: Written in Java → JVM GC pauses → latency spikes
ScyllaDB:  Written in C++  → no GC → predictable low latency

Result at Discord:
  - 4x better performance
  - Same Cassandra query language (CQL)
  - No application code changes needed
  - Fewer nodes needed → cost reduction
```

Scale Achieved

5 Interview Questions This Blog Unlocks

Q1. Design a chat system like WhatsApp / Slack

Answer: Use Cassandra/ScyllaDB with partition key = (channel_id, time_bucket), clustering key = message_id. Recent messages read from hot partition. Historical messages from cold buckets. WebSockets for real-time delivery. Kafka for fan-out to group members.
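The real-time delivery half of this answer boils down to a publish/subscribe fan-out per channel. A minimal in-memory sketch (stand-in for the Kafka + WebSocket path described above; all names are illustrative):

```python
from collections import defaultdict

class ChannelFanout:
    """Toy fan-out: deliver each published message to every live channel subscriber."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel_id -> list of delivery callbacks

    def subscribe(self, channel_id, callback):
        """Register a delivery callback (stand-in for an open WebSocket)."""
        self.subscribers[channel_id].append(callback)

    def publish(self, channel_id, message):
        """Persisting to storage is omitted; here we only fan out to subscribers."""
        for deliver in self.subscribers[channel_id]:
            deliver(message)
```

In production the subscriber list lives behind a broker (e.g. Kafka topics per shard), so senders never need to know which gateway node holds each member's connection.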

Q2. Why is Cassandra good for time-series data?

Answer: Cassandra's wide column model naturally maps to time-series. Partition = entity (channel/user/device), rows = time-ordered events. Append-only writes are extremely fast. Range queries by time are efficient. No JOINs needed.

Q3. What is the difference between relational and wide-column databases?

Answer: Relational: fixed schema, rows have same columns, optimized for JOINs and transactions. Wide-column: flexible schema, rows in same partition can have different columns, optimized for massive write throughput and range queries by partition + clustering key.

Q4. How would you handle message deletion in a Cassandra-based chat system?

Answer: Cassandra uses tombstones for deletion (soft delete marker). Hard deletes are expensive. Better approach: mark message as deleted in application layer, filter on read. Compact tombstones periodically. Never rely on frequent hard deletes in Cassandra.
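The "mark deleted, filter on read" pattern can be sketched as follows. The `deleted` field name is hypothetical; the point is that the delete is an ordinary upsert, not a Cassandra `DELETE`, so no tombstone is written:

```python
def soft_delete(message: dict) -> dict:
    """Mark a message deleted via an upsert instead of a tombstone-creating DELETE."""
    return {**message, "deleted": True}

def visible(messages: list) -> list:
    """Read-path filter: hide soft-deleted rows before returning them to clients."""
    return [m for m in messages if not m.get("deleted")]
```

The storage cost is one extra boolean per deleted row, traded against avoiding tombstone accumulation on the hot read path.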

Q5. Why did Discord migrate from Cassandra to ScyllaDB with zero downtime?

Answer: ScyllaDB is wire-compatible with Cassandra (same CQL protocol). Migration strategy: run both in parallel → double-write to both → gradually shift reads to ScyllaDB → verify data consistency → decommission Cassandra. Application code unchanged.
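The parallel-run phase of that strategy can be sketched as a thin storage wrapper. `InMemoryStore` stands in for a Cassandra or ScyllaDB client; the class and method names are illustrative, not Discord's actual code:

```python
class InMemoryStore:
    """Stand-in for a Cassandra/ScyllaDB client."""

    def __init__(self):
        self.rows = []

    def write(self, msg: dict) -> None:
        self.rows.append(msg)

    def read(self, channel_id) -> list:
        return [m for m in self.rows if m["channel_id"] == channel_id]


class DualWriteStore:
    """Double-write to old and new clusters; flip reads once the new side is verified."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.read_from_new = False  # flipped gradually during the migration

    def write(self, msg: dict) -> None:
        self.old.write(msg)  # old cluster stays authoritative until cutover
        self.new.write(msg)  # new cluster receives the identical write

    def read(self, channel_id) -> list:
        return (self.new if self.read_from_new else self.old).read(channel_id)
```

Because both sides receive every write, reads can be shifted incrementally and rolled back instantly if consistency checks fail, which is what makes the zero-downtime cutover possible.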

Key Engineering Lessons