Blog 18 — Amazon: Dynamo — Highly Available Key-Value Store

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: Consistent hashing, Vector clocks, Sloppy quorum + hinted handoff, Eventual consistency, Merkle trees
Why SDE-2 Critical: The most foundational paper in distributed systems — Cassandra, DynamoDB, Riak all built on these ideas
Status: Draft notes ready

Quick Revision

  • Problem: Keep a key-value system available even through failures and partitions.
  • Core pattern: Consistent hashing, quorum reads/writes, and anti-entropy repair.
  • Interview one-liner: Dynamo trades perfect freshness for availability that the business can rely on.

The Core Problem Amazon Solved

```text
2004: Amazon's shopping cart must ALWAYS be available
  Even if:
    Network partitions occur
    Servers go down
    Data centers have issues

  A customer must ALWAYS be able to add items to cart
  Cart = key-value store (userId → list of items)

CAP Theorem:
  During a partition, you can't have both Consistency and Availability
  Amazon chose: Availability + Partition tolerance (AP system)
  Sacrificed: Strong consistency (accepted eventual consistency)
```

Core Concepts

Consistent Hashing

```text
Problem with naive sharding (modulo):
  3 nodes, hash % 3
  Add 4th node → ALL keys remap → massive redistribution

Consistent hashing solution:
  Hash ring: 0 to 2^32 (imagine a clock)
  Nodes placed at random positions on the ring
  Key → hash → walk clockwise → first node = owner

  Add new node:
    Only keys between the new node and its predecessor remap
    ~1/N of keys move (N = number of nodes)
    Minimal disruption

Virtual nodes: each physical node = multiple positions on the ring
  Ensures even data distribution even with heterogeneous nodes
```
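The ring above can be sketched in a few lines of Python. This is an illustrative model, not Dynamo's implementation: the MD5 hash, the 100 virtual nodes per physical node, and the node names are arbitrary choices for the sketch.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32

def ring_hash(s: str) -> int:
    # MD5 chosen for cheap, well-spread hashing (not for security)
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # each physical node gets `vnodes` positions on the ring
        points = sorted((ring_hash(f"{n}#{i}"), n)
                        for n in nodes for i in range(vnodes))
        self._hashes = [p[0] for p in points]
        self._nodes = [p[1] for p in points]

    def owner(self, key: str) -> str:
        # walk clockwise: first ring position at or after the key's hash
        i = bisect_right(self._hashes, ring_hash(key)) % len(self._hashes)
        return self._nodes[i]
```

Adding a fourth node to a three-node ring moves roughly a quarter of the keys; with modulo sharding, nearly all of them would move.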

Replication (N, R, W)

```text
Dynamo parameters:
  N = number of replicas (typically 3)
  R = replicas that must answer a read
  W = replicas that must acknowledge a write

Common config: N=3, R=2, W=2
  Write: send to 3 nodes, wait for 2 confirmations → success
  Read:  read from 2 nodes, return latest version

  R + W > N → read and write sets overlap in at least one replica
            → some read replica holds the latest acknowledged write
  Tunable: lower R or W = higher availability, weaker consistency
```
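The overlap argument is easy to see in code. Below is a toy single-process model (hypothetical structure, not a real client): the write lands on only W of the N replicas, the read polls a different set of R replicas, and R + W > N still guarantees one replica in common.

```python
# Toy model of N/R/W quorums, for intuition only.
N, R, W = 3, 2, 2
replicas = [dict() for _ in range(N)]   # each: key -> (version, value)

def write(key, value, version):
    # Coordinator sends to all N but only waits for W acks.
    # Worst case: only the first W replicas actually store the write.
    for rep in replicas[:W]:
        rep[key] = (version, value)

def read(key):
    # Read from R replicas; pick the last R to model the worst-case
    # overlap with the write set. R + W > N guarantees one in common.
    responses = [rep[key] for rep in replicas[-R:] if key in rep]
    version, value = max(responses)     # highest version wins
    return value

write("cart:u1", ["item1"], version=1)
write("cart:u1", ["item1", "item2"], version=2)
print(read("cart:u1"))   # ['item1', 'item2'] -- the overlap replica has v2
```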

Sloppy Quorum + Hinted Handoff

```text
Normal quorum: write to the 3 designated nodes

Node B is down:
  Sloppy quorum: write to Node D instead (next available on the ring)
  Include hint: "this data belongs to Node B"

Hinted handoff:
  Node D holds B's data temporarily, with the hint
  When Node B recovers → Node D forwards the hinted data
  Node B catches up automatically

Result: writes succeed even when designated nodes are down
  High availability at the cost of temporary inconsistency
```
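The B/D scenario above can be sketched as follows. This is a minimal single-process model (the `Node` class, `write`, and `handoff` names are invented for illustration; a real system tracks hints per key range and retries in the background).

```python
class Node:
    def __init__(self, name):
        self.name, self.up = name, True
        self.data = {}     # key -> value
        self.hints = []    # (intended_owner, key, value) held for a down node

def write(owner, fallback, key, value):
    # Sloppy quorum: if the designated owner is down, the next node on
    # the ring accepts the write, tagged with a hint naming the owner.
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.append((owner, key, value))

def handoff(node):
    # Hinted handoff: once the owner recovers, forward the hinted data.
    still_pending = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
        else:
            still_pending.append((owner, key, value))
    node.hints = still_pending

b, d = Node("B"), Node("D")
b.up = False
write(b, d, "cart:u1", ["item1"])   # B is down, so D holds a hinted copy
b.up = True
handoff(d)                          # D forwards the data; B catches up
```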

Vector Clocks (Conflict Detection)

```text
Problem: two clients update the same cart simultaneously on different replicas

Vector clock tracks causality:
  Initial: cart = {item1}                  clock = {}
  Alice adds item2: cart = {item1, item2}  clock = {Alice:1}
  Bob adds item3 (didn't see Alice's update):
    cart = {item1, item3}                  clock = {Bob:1}

  Conflict detected: neither clock dominates the other
  Resolution: show both versions to the user (merge UI)
    Amazon's cart shows "items from a previous session"
```
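The dominance check behind conflict detection is small enough to write out. A minimal sketch, with clocks as `{node: counter}` dicts (real Dynamo increments the counter of the coordinating node, not of the client, but the comparison logic is the same):

```python
def dominates(a: dict, b: dict) -> bool:
    # a has seen at least everything b has seen
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a descends from b"   # a is newer; b can be discarded
    if dominates(b, a):
        return "b descends from a"
    return "conflict"                # concurrent: surface both versions
```

For the cart example: `compare({"Alice": 1}, {"Bob": 1})` returns `"conflict"`, so both cart versions are kept and merged at read time.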

Merkle Trees (Anti-Entropy)

```text
Goal: keep replicas in sync without comparing every key
  Each node maintains a Merkle tree of its data
  Hash of leaf   = hash of a key-value pair
  Hash of parent = hash of its children

Sync check:
  Compare root hashes of two replicas
  If same → in sync → done
  If different → traverse the tree to find divergent subtrees → sync only those

  Instead of comparing all keys (expensive),
  only O(log N) comparisons per divergent key
```
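A toy Merkle diff, assuming both replicas hold the same sorted key set so the trees have identical shape (real anti-entropy also handles missing keys and hashes whole key ranges per leaf):

```python
import hashlib

def node_hash(*parts) -> str:
    m = hashlib.sha256()
    for p in parts:
        m.update(repr(p).encode())
    return m.hexdigest()

def build(items):
    """items: sorted list of (key, value) pairs.
    Returns (hash, left, right, leaf_key); leaf_key is None for inner nodes."""
    if len(items) == 1:
        return (node_hash(items[0]), None, None, items[0][0])
    mid = len(items) // 2
    left, right = build(items[:mid]), build(items[mid:])
    return (node_hash(left[0], right[0]), left, right, None)

def diff(a, b, out):
    """Collect keys whose values differ between the two trees."""
    if a[0] == b[0]:
        return                      # identical subtree: prune, skip all its keys
    if a[3] is not None:
        out.append(a[3])            # divergent leaf: this key needs syncing
        return
    diff(a[1], b[1], out)
    diff(a[2], b[2], out)
```

Two replicas that differ in one value disagree only along one root-to-leaf path, so the comparison touches O(log N) hashes instead of N keys.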

Impact of This Paper

5 Interview Questions This Blog Unlocks

Q1. How does DynamoDB/Cassandra work under the hood?

Answer: Consistent hashing assigns key ranges to nodes, and each key is stored on N replicas. Writes go to the N replicas and succeed after W acknowledgements; reads query R replicas and return the latest version. Consistency is eventual: replicas converge via hinted handoff and Merkle-tree anti-entropy, and vector clocks detect conflicting versions.

Q2. What is consistent hashing and why is it better than modulo hashing?

Answer: Consistent hashing places nodes on a ring. Keys map to the next clockwise node. Adding/removing a node only remaps ~1/N keys. Modulo hashing remaps ALL keys when N changes, causing massive data movement. At scale, consistent hashing enables live node additions with minimal disruption.

Q3. What is eventual consistency and when is it acceptable?

Answer: Data may be temporarily inconsistent across replicas but will converge to the same state eventually. Acceptable for: shopping carts, social feeds, user preferences, analytics. Not acceptable for: financial transactions, inventory counts, authentication. Dynamo chose eventual consistency to maximize availability for Amazon's shopping cart.

Q4. What is the difference between N, R, W in Dynamo-style systems?

Answer: N = replication factor (number of copies). R = replicas that must answer a read. W = replicas that must acknowledge a write. When R+W > N, the read and write sets are guaranteed to overlap, so at least one read replica holds the latest acknowledged write. Lower R or W → higher availability, weaker consistency. Tune per workload: e.g., R=1, W=N favors fast reads; W=1, R=N favors fast writes.

Q5. How do Cassandra nodes detect and fix data divergence between replicas?

Answer: Merkle trees. Each replica builds a Merkle tree of its data (hash tree). Compare root hashes with peer nodes. If different, recursively find divergent subtrees. Only sync the differing portion — O(log N) comparisons instead of comparing all keys. This is called anti-entropy repair.

Key Engineering Lessons