Blog 3 — Netflix: How We Handle 80M Concurrent Streams


Qubits of DPK

March 21, 2026

Core Case Studies
Core Concepts: CDN architecture, chaos engineering, circuit breaker pattern
Why SDE-2 Critical: Video/streaming design is asked in 70% of product company interviews
Status: Draft notes ready

Quick Revision

  • Problem: Deliver video globally with low latency and no cascading failures.
  • Core pattern: CDN at the edge, adaptive bitrate streaming, circuit breakers.
  • Interview one-liner: Separate metadata APIs from video delivery and let the edge do the heavy lifting.

Architecture Overview

```
User clicks Play
  ↓
Netflix API (metadata: title, subtitles, recommendations)
  ↓
Open Connect CDN (actual video bytes)
  ├── ISP-embedded server (closest to user)
  ├── Regional PoP (fallback)
  └── AWS Origin (last resort)
```

Core Concepts

Open Connect — Netflix's Own CDN

  • Netflix built its OWN CDN called Open Connect
  • Physical servers installed inside ISP data centers worldwide
  • Video is pre-positioned (cached) on these servers before you even press play
  • Result: 80M concurrent streams with minimal AWS egress cost
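The tiered fallback (ISP edge → regional PoP → AWS origin) can be sketched as an ordered cache lookup. The tier names and in-memory dicts here are purely illustrative, not Open Connect's real control plane:

```python
# Hypothetical tiered CDN lookup: try the nearest tier first, fall back outward.
def serve_video(video_id, tiers):
    """tiers: list of (name, cache_dict) ordered nearest -> farthest.
    Returns (tier_name, video_bytes) from the first tier that has the content."""
    for name, cache in tiers:
        if video_id in cache:      # cache hit at this tier
            return name, cache[video_id]
    # The origin is expected to hold everything, so reaching here is an error.
    raise LookupError(f"{video_id} not found at any tier")

# Popular titles are pre-positioned at the edge; long-tail titles fall through.
tiers = [
    ("isp-edge",     {"stranger-things": b"edge-bytes"}),
    ("regional-pop", {}),
    ("aws-origin",   {"stranger-things": b"origin-bytes",
                      "long-tail-doc":   b"origin-bytes"}),
]
```

Note how pre-positioning popular content at the first tier means most requests never leave the ISP's network, which is where the egress savings come from.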

Adaptive Bitrate Streaming

```
Same video encoded at multiple qualities:
  ├── 4K    (25 Mbps)
  ├── 1080p (8 Mbps)
  ├── 720p  (4 Mbps)
  ├── 480p  (1.5 Mbps)
  └── 240p  (0.5 Mbps)

Player checks bandwidth every few seconds
  → switches quality automatically
  → far less buffering, even on slow connections
```
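The quality-switch decision can be sketched as picking the highest rung of the bitrate ladder that fits under the measured bandwidth. This is a minimal sketch with an assumed 20% safety headroom; Netflix's real algorithm also weighs buffer level, throughput history, and device capability:

```python
# Bitrate ladder ordered best -> worst: (quality label, required Mbps).
BITRATE_LADDER = [
    ("4K",    25.0),
    ("1080p",  8.0),
    ("720p",   4.0),
    ("480p",   1.5),
    ("240p",   0.5),
]

def pick_quality(measured_mbps, headroom=0.8):
    """Pick the highest quality whose bitrate fits within a safety margin
    of the measured bandwidth; fall back to the lowest rung otherwise."""
    budget = measured_mbps * headroom       # keep ~20% headroom to avoid stalls
    for label, mbps in BITRATE_LADDER:      # first fit wins (ladder is sorted)
        if mbps <= budget:
            return label
    return BITRATE_LADDER[-1][0]            # on very slow links, keep playing at 240p
```

The player would re-run this check every few seconds and request the next video chunk at the returned quality.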

Circuit Breaker Pattern

```
Normal:    API → Recommendation Service → returns results
Degraded:  Recommendation Service slow → circuit OPENS
Fallback:  API → returns cached/popular recommendations instead
Recovery:  After timeout → circuit HALF-OPENS → tests service → closes if healthy
```

  • Prevents one failing service from cascading into a system-wide outage
  • Netflix open-sourced its implementation as Hystrix (now in maintenance mode)
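The state machine above can be sketched in a few lines. This is a toy version for intuition only — the real Hystrix is a Java library with thread-pool isolation, rolling metrics windows, and extensive configuration:

```python
import time

class CircuitBreaker:
    """Minimal CLOSED -> OPEN -> HALF_OPEN state machine (illustrative only)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before a retry probe
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"    # allow one trial request through
            else:
                return fallback()           # fail fast; protect caller threads
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"         # trip the breaker
                self.opened_at = time.monotonic()
            return fallback()
        else:
            self.failures = 0
            self.state = "CLOSED"           # healthy again: close the circuit
            return result
```

The key property: once the circuit is open, callers get the cached fallback immediately instead of blocking on a dying dependency.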

Chaos Engineering — Chaos Monkey

  • Netflix intentionally kills random servers in production
  • Philosophy: "If it can fail, it will fail. Better to find out on our terms."
  • Forces every service to be resilient by design
  • Evolved into the full Simian Army (e.g., Chaos Gorilla takes out entire availability zones)
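The core loop is conceptually tiny: pick a random instance and kill it. This sketch uses a fake in-memory fleet; the real tool works against cloud APIs with schedules, opt-outs, and safety checks:

```python
import random

def chaos_round(fleet, rng=random):
    """Terminate one randomly chosen instance from the fleet (a set of ids).
    Returns the victim's id, or None if the fleet is empty."""
    if not fleet:
        return None
    victim = rng.choice(sorted(fleet))  # sorted so a seeded rng is deterministic
    fleet.remove(victim)                # simulate terminating the instance
    return victim
```

If any service can't survive one of these rounds, that's a resilience bug found on your own terms rather than during a real outage.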

Scale Achieved

5 Interview Questions This Blog Unlocks

Q1. Design Netflix / YouTube at scale

Answer: Separate metadata (API servers + DB) from video bytes (CDN). Use adaptive bitrate encoding. Pre-position popular content at edge. Use circuit breakers for resilience between microservices.

Q2. What is a circuit breaker and why is it important?

Answer: Prevents cascading failures. If Service B is slow, Service A's threads pile up waiting → Service A also dies. Circuit breaker detects failure threshold → opens circuit → returns fallback immediately → Service A stays healthy.

Q3. Why did Netflix build their own CDN instead of using Cloudflare/Akamai?

Answer: At Netflix's scale, third-party CDN costs are enormous. By embedding Open Connect servers inside ISPs, they move bits closer to users AND save hundreds of millions in egress fees. Custom CDN = cost control + performance.

Q4. What is adaptive bitrate streaming?

Answer: Video encoded at multiple quality levels. Player monitors available bandwidth every few seconds. Automatically switches to higher or lower quality chunk. User gets best possible quality without buffering.

Q5. What is Chaos Engineering and why does Netflix practice it?

Answer: Intentionally injecting failures in production to find weaknesses before they cause real outages. If every service is built assuming dependencies can fail, the system is resilient by default. "Practice failure so failure doesn't surprise you."

Key Engineering Lessons