Domain Understanding
Netflix is a content streaming platform with a catalogue of movies and series. Unlike Rotten Tomatoes which aggregates external reviews, Netflix owns its content, manages subscriptions, and delivers personalised experiences. The three critical schema challenges are subscription and billing management, the continue-watching state (resuming exactly where you left off), and personalised content recommendation data (watch history, ratings).
The read-to-write ratio is extreme. A user's homepage is assembled from dozens of queries against watch history, ratings, and catalogue data. Writes happen when a user watches something, rates something, or adds to their list. The schema must support both fast catalogue reads and fast watch state writes.
Entity Map
Full Schema with Comments
1CREATE TABLE subscription_plans (
2 id BIGINT AUTO_INCREMENT PRIMARY KEY,
3 name VARCHAR(100) NOT NULL UNIQUE,
4 -- 'Mobile', 'Basic', 'Standard', 'Premium'
5 price_monthly DECIMAL(8,2) NOT NULL,
6 -- DECIMAL: exact financial arithmetic
7 max_screens TINYINT NOT NULL,
8 -- How many simultaneous streams this plan allows
9 max_quality ENUM('SD','HD','FULL_HD','UHD_4K') NOT NULL,
10 is_active BOOLEAN NOT NULL DEFAULT TRUE
11 -- Plans can be retired without deleting historical subscriptions
12);
13
14CREATE TABLE users (
15 id BIGINT AUTO_INCREMENT PRIMARY KEY,
16 email VARCHAR(255) NOT NULL UNIQUE,
17 password_hash VARCHAR(255) NOT NULL,
18 -- Never store plaintext passwords — store bcrypt/argon2 hash
19 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
20 deleted_at DATETIME NULL
21);
22
23-- Subscriptions are an append-only billing history
24-- Never update a subscription row — create a new one on plan change
25CREATE TABLE subscriptions (
26 id BIGINT AUTO_INCREMENT PRIMARY KEY,
27 user_id BIGINT NOT NULL,
28 plan_id BIGINT NOT NULL,
29 status ENUM('ACTIVE','CANCELLED','EXPIRED','PAUSED') NOT NULL DEFAULT 'ACTIVE',
30 started_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
31 expires_at DATETIME NOT NULL,
32 -- When this billing cycle ends
33 cancelled_at DATETIME NULL,
34 -- NULL if not cancelled. NOT NULL if user cancelled.
35 FOREIGN KEY (user_id) REFERENCES users(id),
36 FOREIGN KEY (plan_id) REFERENCES subscription_plans(id)
37);
38
39-- A Netflix account has multiple viewer profiles (e.g. "Mum", "Kids", "Dad")
40-- Watch history, ratings, and lists are per-profile, not per-account
41CREATE TABLE profiles (
42 id BIGINT AUTO_INCREMENT PRIMARY KEY,
43 user_id BIGINT NOT NULL,
44 name VARCHAR(100) NOT NULL,
45 -- 'Deepak', 'Kids', 'Mum'
46 avatar_url VARCHAR(500),
47 is_kids BOOLEAN NOT NULL DEFAULT FALSE,
48 -- Kids profiles have content filtering applied
49 language CHAR(5) NOT NULL DEFAULT 'en',
50 -- ISO language code for UI and content preferences
51 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
52 FOREIGN KEY (user_id) REFERENCES users(id)
53);
54
55CREATE TABLE genres (
56 id BIGINT AUTO_INCREMENT PRIMARY KEY,
57 name VARCHAR(100) NOT NULL UNIQUE
58 -- 'Thriller', 'Drama', 'Documentary', 'Anime'
59);
60
61-- content covers both movies and series in one table
62-- The type column differentiates them
63CREATE TABLE content (
64 id BIGINT AUTO_INCREMENT PRIMARY KEY,
65 title VARCHAR(255) NOT NULL,
66 type ENUM('MOVIE','SERIES') NOT NULL,
67 -- If MOVIE: watch_history links here directly
68 -- If SERIES: watch_history links to episodes, not content
69 description TEXT,
70 release_year YEAR,
71 -- YEAR type: stores just the year (1901-2155)
72 maturity_rating ENUM('U','U/A 7+','U/A 13+','U/A 16+','A') NOT NULL,
73 -- CBFC ratings for India
74 duration_minutes INT NULL,
75 -- NULL for SERIES (duration is per episode). NOT NULL for MOVIE.
76 language CHAR(5) NOT NULL,
77 trailer_url VARCHAR(500),
78 thumbnail_url VARCHAR(500),
79 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
80 deleted_at DATETIME NULL
81 -- Soft delete: removing content from catalogue without losing watch history
82);
83
84CREATE TABLE content_genres (
85 content_id BIGINT NOT NULL,
86 genre_id BIGINT NOT NULL,
87 PRIMARY KEY (content_id, genre_id),
88 FOREIGN KEY (content_id) REFERENCES content(id),
89 FOREIGN KEY (genre_id) REFERENCES genres(id)
90);
91
92-- Only exists for SERIES type content
93-- Movies do not have episodes
94CREATE TABLE episodes (
95 id BIGINT AUTO_INCREMENT PRIMARY KEY,
96 content_id BIGINT NOT NULL,
97 -- FK to the parent series in content table
98 season_number TINYINT NOT NULL,
99 episode_number TINYINT NOT NULL,
100 title VARCHAR(255) NOT NULL,
101 description TEXT,
102 duration_minutes INT NOT NULL,
103 thumbnail_url VARCHAR(500),
104 UNIQUE (content_id, season_number, episode_number),
105 -- A series cannot have two S2E3 episodes
106 FOREIGN KEY (content_id) REFERENCES content(id)
107);
108
109CREATE TABLE persons (
110 id BIGINT AUTO_INCREMENT PRIMARY KEY,
111 full_name VARCHAR(255) NOT NULL,
112 photo_url VARCHAR(500)
113);
114
115-- M:M: content can have many cast/crew, persons can appear in many content items
116CREATE TABLE content_cast (
117 id BIGINT AUTO_INCREMENT PRIMARY KEY,
118 content_id BIGINT NOT NULL,
119 person_id BIGINT NOT NULL,
120 role_type ENUM('DIRECTOR','ACTOR','WRITER','PRODUCER') NOT NULL,
121 character_name VARCHAR(255) NULL,
122 -- Character name for actors, NULL for directors/writers
123 FOREIGN KEY (content_id) REFERENCES content(id),
124 FOREIGN KEY (person_id) REFERENCES persons(id)
125);
126
127-- The most write-heavy table on the platform
128-- Every play/pause/seek event is written here
129CREATE TABLE watch_history (
130 id BIGINT AUTO_INCREMENT PRIMARY KEY,
131 profile_id BIGINT NOT NULL,
132 content_id BIGINT NULL,
133 -- NOT NULL for MOVIE, NULL for SERIES
134 episode_id BIGINT NULL,
135 -- NOT NULL for SERIES episodes, NULL for MOVIE
136 -- Exactly one of content_id or episode_id should be set
137 watched_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
138 progress_seconds INT NOT NULL DEFAULT 0,
139 -- How many seconds into the content the user has watched
140 -- This is the "continue watching" resume point
141 total_seconds INT NOT NULL,
142 -- Total duration of the content at the time of watching
143 is_completed BOOLEAN NOT NULL DEFAULT FALSE,
144 -- TRUE when progress_seconds >= ~95% of total_seconds
145 FOREIGN KEY (profile_id) REFERENCES profiles(id),
146 FOREIGN KEY (content_id) REFERENCES content(id),
147 FOREIGN KEY (episode_id) REFERENCES episodes(id)
148);
149
150-- User's saved list ("My List" button on Netflix)
151CREATE TABLE my_list (
152 profile_id BIGINT NOT NULL,
153 content_id BIGINT NOT NULL,
154 added_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
155 PRIMARY KEY (profile_id, content_id),
156 FOREIGN KEY (profile_id) REFERENCES profiles(id),
157 FOREIGN KEY (content_id) REFERENCES content(id)
158);
159
160-- Thumbs up / thumbs down ratings
161CREATE TABLE ratings (
162 profile_id BIGINT NOT NULL,
163 content_id BIGINT NOT NULL,
164 rating ENUM('THUMBS_UP','THUMBS_DOWN','TWO_THUMBS_UP') NOT NULL,
165 -- Netflix added Two Thumbs Up in 2022 for exceptional content
166 rated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
167 PRIMARY KEY (profile_id, content_id),
168 -- One rating per profile per content item
169 FOREIGN KEY (profile_id) REFERENCES profiles(id),
170 FOREIGN KEY (content_id) REFERENCES content(id)
171);
172
173-- Indexes for hot query paths
174CREATE INDEX idx_watch_history_profile ON watch_history(profile_id);
175-- "Continue watching" and watch history page
176CREATE INDEX idx_watch_history_content ON watch_history(content_id);
177CREATE INDEX idx_episodes_content ON episodes(content_id);
178-- "All episodes of series X" — used on every series detail page
179CREATE INDEX idx_subscriptions_user ON subscriptions(user_id);
180CREATE INDEX idx_content_type ON content(type);
181-- Filter movies vs series
182CREATE INDEX idx_profiles_user ON profiles(user_id);
Key Design Decisions
The Continue Watching Problem
The "Continue Watching" row on Netflix's homepage requires answering: For this profile, give me the most recently watched in-progress content, in reverse chronological order.
This query runs for every user's homepage load. The index on watch_history(profile_id) is critical. Without it, this query full-scans a table that may have billions of rows.
Indexing Strategy
Common Traps to Avoid
- One table for users, no profiles — Watch history and preferences are per-viewer, not per-account. A family account has multiple viewers. Always model profiles separately.
- Storing watch progress as a boolean watched — You lose the resume point. Store progress_seconds to enable continue watching at the exact frame.
- Separate movies and series tables — They share 90% of their attributes. One content table with a type column is cleaner and avoids duplicating genre/cast relationships.
- Hard-deleting content — Users' watch history references content that may be removed from the catalogue. Soft delete preserves referential integrity and lets users see their history.
- No subscription history — Overwriting a subscription row on plan change destroys billing history needed for disputes, refunds, and tax compliance.
Interview Q&A
Q: How does your schema support the "Continue Watching" feature?
The watch_history table stores progress_seconds (exact resume point) and is_completed. The homepage queries watch_history for the profile, filters WHERE is_completed = FALSE, orders by watched_at DESC. The index on profile_id makes this fast even with billions of rows.
Q: Why are users and profiles modeled separately?
A Netflix account is a billing entity. A profile is a viewer identity. One account can have up to 5 profiles with independent watch history, ratings, language preferences, and kids mode. Merging them would mean sharing watch history across family members, destroying personalisation.
Q: Why is content a single table for movies and series?
They share title, description, genre, cast, maturity rating, and language. Separating them into two tables doubles the genre and cast relationship tables. The type ENUM differentiates them. Episodes only exist for SERIES and are modeled in a child table.
Q: How do you track whether a user is currently subscribed?
Query subscriptions WHERE user_id = :id AND status = 'ACTIVE' AND expires_at > NOW(). Subscriptions are never updated on cancellation — cancelled_at is set and the row expires naturally, giving a full billing history.
Q: How does your schema scale to Netflix's volume?
watch_history is the hot table. At Netflix scale it would be partitioned by profile_id hash and the active data would live in a columnar store (like Apache Iceberg on S3) for analytics, while recent watch events go into Cassandra for low-latency writes. The relational schema handles the core transactional model.
ANKI Cards
Q: Why does Netflix model profiles separately from user accounts?
A: Profiles are viewer identities within a shared account. Watch history, ratings, and preferences are per-viewer. Multiple family members share one account but must not share watch data. Profile separation is what enables personalised recommendations.
Q: What does progress_seconds in watch_history enable?
A: It stores the exact playback position in seconds, enabling "Continue Watching" to resume at the precise frame. A boolean watched column would only record completion, not the resume point.
Q: In a content table combining movies and series, how do you model series-specific data?
A: The type ENUM distinguishes movies from series. Series-specific data (episodes) lives in a child episodes table linked to content by content_id. Movies have no episodes. duration_minutes is set on movies and NULL on series (duration is per-episode instead).