Blog 1 — OpenAI: Scaling PostgreSQL to 800M ChatGPT Users
Qubits of DPK
March 21, 2026
Core Case Studies
Published: Early 2026
Core Lesson: A single PostgreSQL primary can serve 800M users with the right engineering discipline — no sharding needed.
Quick Revision
- Problem: Scale a write-heavy relational system without jumping straight to sharding.
- Core pattern: Read replicas, PgBouncer, caching, and workload isolation.
- Interview one-liner: Exhaust app-layer and query-level optimizations before reaching for sharding.
Architecture Overview
[Diagram: app servers → Redis cache → PgBouncer → single write primary + ~50 read replicas; write-heavy workloads → Azure CosmosDB; hot standby for failover]
3 Pillars of Their Strategy
Pillar 1 — Offload Reads Aggressively
- All SELECT queries → replicas (NOT primary)
- Reads stay on the primary only when they are part of a write transaction
- Tradeoff accepted: Replica lag = slightly stale data. Worth it — reduces primary load by 10x
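A minimal sketch of what this read/write split looks like at the application layer, assuming node-postgres and a load balancer in front of the replica fleet (hosts and names are illustrative, not OpenAI's):

```javascript
// Illustrative read/write routing: two pools, picked by query intent.
const { Pool } = require("pg");

const primary = new Pool({ host: "pg-primary.internal" });   // all writes
const replicas = new Pool({ host: "pg-replicas.internal" }); // LB over ~50 replicas

// Plain reads go to replicas; reads inside a write path stay on the primary.
const readQuery = (sql, params) => replicas.query(sql, params);
const writeQuery = (sql, params) => primary.query(sql, params);
```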
Pillar 2 — Reduce Write Pressure
- Migrated write-heavy workloads to Azure CosmosDB (sharded)
- Fixed application bugs causing redundant writes
- Introduced lazy writes — buffer in memory, batch INSERT later
- Enforced rate limits during backfills
Lazy Write Example:
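The blog's original snippet isn't preserved, so here is a minimal sketch of the pattern, assuming node-postgres and an illustrative `events` table (names and thresholds are made up):

```javascript
// Lazy writes: buffer rows in memory, flush them as one batched INSERT.
const { Pool } = require("pg");
const pool = new Pool();

const buffer = [];
const MAX_BATCH = 500;          // flush when this many rows accumulate...
const FLUSH_INTERVAL_MS = 1000; // ...or at least once per second

function lazyWrite(row) {
  buffer.push(row);
  if (buffer.length >= MAX_BATCH) flush();
}

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length); // drain the buffer
  const values = [];
  const placeholders = batch.map((row, i) => {
    values.push(row.userId, row.event);
    return `($${2 * i + 1}, $${2 * i + 2})`;
  });
  // One round trip for the whole batch instead of one INSERT per row.
  await pool.query(
    `INSERT INTO events (user_id, event) VALUES ${placeholders.join(", ")}`,
    values
  );
}

setInterval(flush, FLUSH_INTERVAL_MS);
```

Tradeoff: rows still sitting in the buffer are lost if the process dies before a flush, so this only suits loss-tolerant, write-heavy data.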
Pillar 3 — High Availability for Primary
- Primary runs in HA mode with a hot standby
- Hot standby = continuously synced replica, ready to promote instantly
- Result: 99.999% availability (five-nines)
Connection Pooling — PgBouncer (The Hidden Hero)
The Problem:
[Diagram: 10,000 app pods each opening direct connections → PostgreSQL's ~5,000-connection ceiling exceeded → crash]
The Solution — PgBouncer as proxy:
[Diagram: app pods → PgBouncer proxy → pool of ~200 real server connections → PostgreSQL]
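A sketch of the pgbouncer.ini settings that implement this (values are illustrative, not OpenAI's actual config):

```ini
; Illustrative pgbouncer.ini, not OpenAI's real configuration.
[databases]
chatdb = host=10.0.0.5 port=5432 dbname=chatdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; server connection returned after each transaction
max_client_conn = 10000   ; how many app-side connections PgBouncer accepts
default_pool_size = 200   ; real PostgreSQL connections per database/user pair
```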
Result: Connection time dropped from 50ms → 5ms (10x improvement)
3 PgBouncer Modes (interviewers love asking this):
- Session pooling: a server connection is held for the client's entire session
- Transaction pooling: the connection is returned to the pool after each transaction
- Statement pooling: the connection is returned after every single statement (most restrictive)
OpenAI uses transaction pooling — most efficient for stateless API servers.
Query Optimization
- Long-running transactions block VACUUM (Postgres's garbage collection of dead rows) → enforced timeouts
- ORM-generated queries caused 12-table JOINs → replaced with simpler raw SQL
- Timeouts enforced:
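The original snippet isn't preserved; a minimal sketch of enforcing these timeouts from a node-postgres pool (values illustrative):

```javascript
// Illustrative timeout enforcement via node-postgres ("pg") pool options.
const { Pool } = require("pg");

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  // Server-side: abort any single statement running longer than 5s.
  statement_timeout: 5000,
  // Server-side: kill sessions idling inside an open transaction,
  // so a forgotten BEGIN can't hold back VACUUM.
  idle_in_transaction_session_timeout: 10000,
});
```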
Thundering Herd Problem + Solution
The Problem:
[Diagram: a hot cache key expires → thousands of simultaneous requests all miss → all of them hit the DB at once]
Solution 1 — Cache Locking (OpenAI's approach):
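A minimal sketch of cache locking with node-redis (the blog's own snippet isn't preserved; key names and TTLs are illustrative):

```javascript
// Cache locking: only the lock holder queries the DB; everyone else retries the cache.
const { createClient } = require("redis");
const redis = createClient();
redis.connect(); // node-redis v4 requires an explicit connect

async function getWithLock(key, fetchFromDb) {
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);

  // NX: only one concurrent request wins the lock; EX: lock self-expires.
  const gotLock = await redis.set(`lock:${key}`, "1", { NX: true, EX: 10 });
  if (gotLock === "OK") {
    const value = await fetchFromDb();            // the single DB hit
    await redis.set(key, JSON.stringify(value), { EX: 300 });
    await redis.del(`lock:${key}`);
    return value;
  }

  // Losers back off briefly, then re-check the (soon-to-be-filled) cache.
  await new Promise((resolve) => setTimeout(resolve, 50));
  return getWithLock(key, fetchFromDb);
}
```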
Solution 2 — Request Collapsing:
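Request collapsing needs no external library; a sketch in plain Node:

```javascript
// Request collapsing: concurrent callers for the same key share one in-flight query.
const inflight = new Map();

function collapsed(key, queryDb) {
  if (inflight.has(key)) return inflight.get(key); // piggyback on the live query
  const promise = queryDb(key).finally(() => inflight.delete(key));
  inflight.set(key, promise);
  return promise;
}

// 1,000 concurrent collapsed("user:42", loadUser) calls -> one loadUser execution.
```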
Solution 3 — Cache Expiry Jitter:
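Jitter is a one-liner at cache-write time; a sketch (base TTL and jitter window are illustrative):

```javascript
// Expiry jitter: randomize TTLs so hot keys don't all expire in the same instant.
const BASE_TTL_SECONDS = 300;
const MAX_JITTER_SECONDS = 60;

function jitteredTtl() {
  return BASE_TTL_SECONDS + Math.floor(Math.random() * MAX_JITTER_SECONDS);
}

// e.g. with node-redis: redis.set(key, payload, { EX: jitteredTtl() });
```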
Cascading Replication (Future-Proofing)
Problem with 50 direct replicas:
[Diagram: primary streams WAL to all 50 replicas directly → primary CPU and network saturate]
Cascading Replication Solution:
[Diagram: primary → 2-3 intermediate replicas → each intermediate fans WAL out to the remaining replicas]
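In stock PostgreSQL, a cascading setup is just a standby whose `primary_conninfo` points at another standby instead of the primary; a sketch (hostnames illustrative):

```ini
# postgresql.conf on a leaf replica (PostgreSQL 12+, standby.signal present).
# It streams WAL from an intermediate replica, not from the primary.
primary_conninfo = 'host=intermediate-replica-1 port=5432 user=replicator'
hot_standby = on
```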
Tradeoff: Slightly higher replica lag (10ms → 15-20ms). Acceptable for read-heavy, eventually consistent workloads.
Workload Isolation
Write-heavy workloads were carved off to a separate sharded store (Azure CosmosDB, per Pillar 2), so the PostgreSQL primary carries only the relational core that truly needs it.
Final Results
- 800M users served from a single PostgreSQL write primary
- Connection time: 50ms → 5ms via PgBouncer
- 80-90% of requests absorbed by the cache layer, never reaching the DB
- 99.999% (five-nines) availability via the hot standby
5 Interview Questions This Blog Unlocks
Q1. Design the database layer for a system like ChatGPT
Answer framework:
- Redis cache layer → 80-90% of requests never hit the DB
- PgBouncer connection pooling → 50ms → 5ms connection time
- Single primary for all writes + ~50 read replicas for reads
- Workload isolation → write-heavy workloads to CosmosDB
- Hot standby for HA → five-nines availability
"Optimize at the application layer before touching the database architecture. Caching + connection pooling + read replicas can take you to 800M users without sharding."
Q2. What is a Thundering Herd problem and how do you solve it?
Definition: When a cached item expires and thousands of simultaneous requests all miss the cache and hit the DB at once → DB collapses.
3 Solutions:
- Cache Locking — first request gets a lock, queries the DB, fills the cache; all others wait for the lock.
- Request Collapsing — 1,000 requests deduplicated into 1 DB query, result returned to all.
- Cache Expiry Jitter — stagger expiry times randomly to avoid simultaneous mass expiry.
Q3. Why would you choose NOT to shard a database?
Problems sharding creates:
- Cross-shard queries become complex (JOIN across shards)
- Distributed transactions need 2-phase commit
- Rebalancing when one shard gets hot
- Every schema change multiplies by N shards
- Hundreds of application endpoints need modification
OpenAI's checklist before sharding:
- Have we cached everything cacheable?
- Have we offloaded every possible read to replicas?
- Are all connections pooled through PgBouncer?
- Are queries optimized (no ORM-generated monster JOINs, timeouts enforced)?
- Can write-heavy workloads move to a separate store instead?
"Sharding is a last resort, not a first resort. Every app-layer optimization buys months of runway."
Q4. What is connection pooling and why does it matter at scale?
Problem: 10,000 app pods × 1 connection = 10,000 connections. PostgreSQL max ≈ 5,000. Crash.
Solution: PgBouncer sits between app and DB as a proxy. App pods connect to PgBouncer. PgBouncer maintains a controlled pool of ~200 real DB connections. DB stays healthy.
Result: 50ms → 5ms connection time (10x improvement)
Best mode for APIs: Transaction pooling — connection returned to pool after each transaction.
Q5. What is cascading replication and when would you use it?
Problem: At 50+ replicas, primary must stream WAL to each one — crushing primary CPU.
Solution: Intermediate replicas relay WAL downstream. Primary only talks to 2-3 intermediate nodes. They fan out to the rest.
Use when: 20+ replicas or multi-region expansion.
Tradeoff: Slightly higher replica lag (~5-10ms extra). Acceptable for eventually consistent read workloads.