Blog 18 — Amazon: Dynamo — Highly Available Key-Value Store

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: Consistent hashing, Vector clocks, Sloppy quorum + hinted handoff, Eventual consistency, Merkle trees
Why SDE-2 Critical: The most foundational paper in distributed systems — Cassandra, DynamoDB, Riak all built on these ideas
Status: Draft notes ready

Quick Revision

  • Problem: Keep a key-value system available even through failures and partitions.
  • Core pattern: Consistent hashing, quorum reads/writes, and anti-entropy repair.
  • Interview one-liner: Dynamo trades perfect freshness for availability that the business can rely on.

The Core Problem Amazon Solved

```text
2004: Amazon's shopping cart must ALWAYS be available
  Even if:
    Network partitions occur
    Servers go down
    Data centers have issues

  A customer must ALWAYS be able to add items to cart
  Cart = key-value store (userId → list of items)

CAP Theorem:
  During a partition, you can't have both Consistency and Availability
  Amazon chose: Availability + Partition tolerance (AP system)
  Sacrificed: Strong consistency (accepted eventual consistency)
```

Core Concepts

Consistent Hashing

```text
Problem with naive sharding (modulo):
  3 nodes, hash % 3
  Add 4th node → ALL keys remap → massive redistribution

Consistent hashing solution:
  Hash ring: 0 to 2^32 (imagine a clock)
  Nodes placed at random positions on the ring
  Key → hash → walk clockwise → first node = owner

  Add new node:
    Only keys between the new node and its predecessor remap
    ~1/N of keys move (N = number of nodes)
    Minimal disruption

Virtual nodes: each physical node = multiple positions on the ring
  Ensures even data distribution even with heterogeneous nodes
```
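The ring above can be sketched in a few lines of Python. This is an illustrative model, not Dynamo's implementation: the MD5 hash, the 100 virtual nodes per physical node, and the node names are arbitrary choices for the sketch.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32

def ring_hash(s: str) -> int:
    # MD5 chosen for cheap, well-spread hashing (not for security)
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # each physical node gets `vnodes` positions on the ring
        points = sorted((ring_hash(f"{n}#{i}"), n)
                        for n in nodes for i in range(vnodes))
        self._hashes = [p[0] for p in points]
        self._nodes = [p[1] for p in points]

    def owner(self, key: str) -> str:
        # walk clockwise: first ring position at or after the key's hash
        i = bisect_right(self._hashes, ring_hash(key)) % len(self._hashes)
        return self._nodes[i]
```

Adding a fourth node to a three-node ring moves roughly a quarter of the keys; with modulo sharding, nearly all of them would move.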

Replication (N, R, W)

```text
Dynamo parameters:
  N = number of replicas (typically 3)
  R = replicas that must answer a read
  W = replicas that must acknowledge a write

Common config: N=3, R=2, W=2
  Write: send to 3 nodes, wait for 2 confirmations → success
  Read:  read from 2 nodes, return latest version

  R + W > N → read and write sets overlap in at least one replica
            → some read replica holds the latest acknowledged write
  Tunable: lower R or W = higher availability, weaker consistency
```
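The overlap argument is easy to see in code. Below is a toy single-process model (hypothetical structure, not a real client): the write lands on only W of the N replicas, the read polls a different set of R replicas, and R + W > N still guarantees one replica in common.

```python
# Toy model of N/R/W quorums, for intuition only.
N, R, W = 3, 2, 2
replicas = [dict() for _ in range(N)]   # each: key -> (version, value)

def write(key, value, version):
    # Coordinator sends to all N but only waits for W acks.
    # Worst case: only the first W replicas actually store the write.
    for rep in replicas[:W]:
        rep[key] = (version, value)

def read(key):
    # Read from R replicas; pick the last R to model the worst-case
    # overlap with the write set. R + W > N guarantees one in common.
    responses = [rep[key] for rep in replicas[-R:] if key in rep]
    version, value = max(responses)     # highest version wins
    return value

write("cart:u1", ["item1"], version=1)
write("cart:u1", ["item1", "item2"], version=2)
print(read("cart:u1"))   # ['item1', 'item2'] -- the overlap replica has v2
```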

Sloppy Quorum + Hinted Handoff

```text
Normal quorum: write to the 3 designated nodes

Node B is down:
  Sloppy quorum: write to Node D instead (next available on the ring)
  Include hint: "this data belongs to Node B"

Hinted handoff:
  Node D holds B's data temporarily, with the hint
  When Node B recovers → Node D forwards the hinted data
  Node B catches up automatically

Result: writes succeed even when designated nodes are down
  High availability at the cost of temporary inconsistency
```
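The B/D scenario above can be sketched as follows. This is a minimal single-process model (the `Node` class, `write`, and `handoff` names are invented for illustration; a real system tracks hints per key range and retries in the background).

```python
class Node:
    def __init__(self, name):
        self.name, self.up = name, True
        self.data = {}     # key -> value
        self.hints = []    # (intended_owner, key, value) held for a down node

def write(owner, fallback, key, value):
    # Sloppy quorum: if the designated owner is down, the next node on
    # the ring accepts the write, tagged with a hint naming the owner.
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.append((owner, key, value))

def handoff(node):
    # Hinted handoff: once the owner recovers, forward the hinted data.
    still_pending = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
        else:
            still_pending.append((owner, key, value))
    node.hints = still_pending

b, d = Node("B"), Node("D")
b.up = False
write(b, d, "cart:u1", ["item1"])   # B is down, so D holds a hinted copy
b.up = True
handoff(d)                          # D forwards the data; B catches up
```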

Vector Clocks (Conflict Detection)

```text
Problem: two clients update the same cart simultaneously on different replicas

Vector clock tracks causality:
  Initial: cart = {item1}                  clock = {}
  Alice adds item2: cart = {item1, item2}  clock = {Alice:1}
  Bob adds item3 (didn't see Alice's update):
    cart = {item1, item3}                  clock = {Bob:1}

  Conflict detected: neither clock dominates the other
  Resolution: show both versions to the user (merge UI)
    Amazon's cart shows "items from a previous session"
```
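The dominance check behind conflict detection is small enough to write out. A minimal sketch, with clocks as `{node: counter}` dicts (real Dynamo increments the counter of the coordinating node, not of the client, but the comparison logic is the same):

```python
def dominates(a: dict, b: dict) -> bool:
    # a has seen at least everything b has seen
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a descends from b"   # a is newer; b can be discarded
    if dominates(b, a):
        return "b descends from a"
    return "conflict"                # concurrent: surface both versions
```

For the cart example: `compare({"Alice": 1}, {"Bob": 1})` returns `"conflict"`, so both cart versions are kept and merged at read time.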

Merkle Trees (Anti-Entropy)

```text
Goal: keep replicas in sync without comparing every key
  Each node maintains a Merkle tree of its data
  Hash of leaf   = hash of a key-value pair
  Hash of parent = hash of its children

Sync check:
  Compare root hashes of two replicas
  If same → in sync → done
  If different → traverse the tree to find divergent subtrees → sync only those

  Instead of comparing all keys (expensive),
  only O(log N) comparisons per divergent key
```
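A toy Merkle diff, assuming both replicas hold the same sorted key set so the trees have identical shape (real anti-entropy also handles missing keys and hashes whole key ranges per leaf):

```python
import hashlib

def node_hash(*parts) -> str:
    m = hashlib.sha256()
    for p in parts:
        m.update(repr(p).encode())
    return m.hexdigest()

def build(items):
    """items: sorted list of (key, value) pairs.
    Returns (hash, left, right, leaf_key); leaf_key is None for inner nodes."""
    if len(items) == 1:
        return (node_hash(items[0]), None, None, items[0][0])
    mid = len(items) // 2
    left, right = build(items[:mid]), build(items[mid:])
    return (node_hash(left[0], right[0]), left, right, None)

def diff(a, b, out):
    """Collect keys whose values differ between the two trees."""
    if a[0] == b[0]:
        return                      # identical subtree: prune, skip all its keys
    if a[3] is not None:
        out.append(a[3])            # divergent leaf: this key needs syncing
        return
    diff(a[1], b[1], out)
    diff(a[2], b[2], out)
```

Two replicas that differ in one value disagree only along one root-to-leaf path, so the comparison touches O(log N) hashes instead of N keys.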

Impact of This Paper

5 Interview Questions This Blog Unlocks

Q1. How does DynamoDB/Cassandra work under the hood?

Answer: Consistent hashing assigns key ranges to nodes, and each key is stored on N replicas. Writes go to the N replicas and succeed after W acknowledgements; reads query R replicas and return the latest version. Consistency is eventual: replicas converge via hinted handoff and Merkle-tree anti-entropy, and vector clocks detect conflicting versions.

Q2. What is consistent hashing and why is it better than modulo hashing?

Answer: Consistent hashing places nodes on a ring. Keys map to the next clockwise node. Adding/removing a node only remaps ~1/N keys. Modulo hashing remaps ALL keys when N changes, causing massive data movement. At scale, consistent hashing enables live node additions with minimal disruption.

Q3. What is eventual consistency and when is it acceptable?

Answer: Data may be temporarily inconsistent across replicas but will converge to the same state eventually. Acceptable for: shopping carts, social feeds, user preferences, analytics. Not acceptable for: financial transactions, inventory counts, authentication. Dynamo chose eventual consistency to maximize availability for Amazon's shopping cart.

Q4. What is the difference between N, R, W in Dynamo-style systems?

Answer: N = replication factor (number of copies). R = replicas that must answer a read. W = replicas that must acknowledge a write. When R+W > N, the read and write sets are guaranteed to overlap, so at least one read replica holds the latest acknowledged write. Lower R or W → higher availability, weaker consistency. Tune per workload: e.g., R=1, W=N favors fast reads; W=1, R=N favors fast writes.

Q5. How do Cassandra nodes detect and fix data divergence between replicas?

Answer: Merkle trees. Each replica builds a Merkle tree of its data (hash tree). Compare root hashes with peer nodes. If different, recursively find divergent subtrees. Only sync the differing portion — O(log N) comparisons instead of comparing all keys. This is called anti-entropy repair.

Key Engineering Lessons