Domain Understanding
An Email Campaign system allows marketing teams to create campaigns, build subscriber lists, send bulk emails to segments of their audience, and track engagement — opens, clicks, unsubscribes, and bounces. The critical challenge here is that the same email can be sent to thousands of subscribers, and every interaction with every sent email must be tracked individually. This creates an enormous events table that dominates the schema design.
The second challenge is subscriber segmentation — a subscriber can belong to multiple lists, a campaign can target multiple lists, and a subscriber can have different statuses across different lists (subscribed to the Newsletter list but unsubscribed from the Promotions list). These nuances fundamentally shape the junction tables.
The system is write-heavy for events (every open, every click writes a row) and read-heavy for reporting (marketers query open rates, click rates, and campaign performance constantly).
Entity Map
Full Schema with Comments
1CREATE TABLE organisations (
2 id BIGINT AUTO_INCREMENT PRIMARY KEY,
3 name VARCHAR(255) NOT NULL,
4 domain VARCHAR(255) UNIQUE,
5 -- The sending domain (e.g. zomato.com)
6 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
7);
8
9CREATE TABLE users (
10 id BIGINT AUTO_INCREMENT PRIMARY KEY,
11 organisation_id BIGINT NOT NULL,
12 email VARCHAR(255) NOT NULL UNIQUE,
13 name VARCHAR(255) NOT NULL,
14 role ENUM('ADMIN','EDITOR','VIEWER') NOT NULL DEFAULT 'EDITOR',
15 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
16 deleted_at DATETIME NULL,
17 FOREIGN KEY (organisation_id) REFERENCES organisations(id)
18);
19
20-- Subscribers are the people who receive emails
21-- email is the natural unique identifier here
22CREATE TABLE subscribers (
23 id BIGINT AUTO_INCREMENT PRIMARY KEY,
24 organisation_id BIGINT NOT NULL,
25 email VARCHAR(255) NOT NULL,
26 first_name VARCHAR(100),
27 last_name VARCHAR(100),
28 global_status ENUM('ACTIVE','UNSUBSCRIBED','BOUNCED','COMPLAINED') NOT NULL DEFAULT 'ACTIVE',
29 -- Global status = across ALL lists for this organisation
30 -- If BOUNCED or COMPLAINED, never send to this address again regardless of list status
31 custom_fields JSON NULL,
32 -- Flexible key-value attributes per subscriber: {"city": "Mumbai", "tier": "premium"}
33 -- JSON for variable attributes avoids an EAV (Entity-Attribute-Value) anti-pattern
34 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
35 updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
36 UNIQUE (organisation_id, email),
37 -- Same email can be a subscriber in different organisations, but unique within one
38 FOREIGN KEY (organisation_id) REFERENCES organisations(id)
39);
40
41-- A list is a named segment of subscribers (e.g. "Newsletter", "Promotions", "VIP")
42CREATE TABLE lists (
43 id BIGINT AUTO_INCREMENT PRIMARY KEY,
44 organisation_id BIGINT NOT NULL,
45 name VARCHAR(255) NOT NULL,
46 description TEXT,
47 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
48 FOREIGN KEY (organisation_id) REFERENCES organisations(id)
49);
50
51-- M:M junction between lists and subscribers
52-- Carries per-list subscription status — a subscriber can be active in one list
53-- but unsubscribed from another
54CREATE TABLE list_subscribers (
55 id BIGINT AUTO_INCREMENT PRIMARY KEY,
56 list_id BIGINT NOT NULL,
57 subscriber_id BIGINT NOT NULL,
58 status ENUM('SUBSCRIBED','UNSUBSCRIBED','PENDING') NOT NULL DEFAULT 'SUBSCRIBED',
59 -- PENDING: double opt-in confirmation not yet completed
60 subscribed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
61 unsubscribed_at DATETIME NULL,
62 -- NULL while SUBSCRIBED. Set when status changes to UNSUBSCRIBED.
63 UNIQUE (list_id, subscriber_id),
64 -- A subscriber can only be in a list once
65 FOREIGN KEY (list_id) REFERENCES lists(id),
66 FOREIGN KEY (subscriber_id) REFERENCES subscribers(id)
67);
68
69-- A campaign is a bulk email send to one or more lists
70CREATE TABLE campaigns (
71 id BIGINT AUTO_INCREMENT PRIMARY KEY,
72 organisation_id BIGINT NOT NULL,
73 name VARCHAR(255) NOT NULL,
74 subject VARCHAR(500) NOT NULL,
75 -- Email subject line
76 from_name VARCHAR(255) NOT NULL,
77 from_email VARCHAR(255) NOT NULL,
78 -- Sending identity (can differ from organisation domain for sub-brands)
79 html_body LONGTEXT NOT NULL,
80 -- Full HTML email body. LONGTEXT supports up to 4GB.
81 plain_body TEXT,
82 -- Plain-text fallback for email clients that don't support HTML
83 status ENUM('DRAFT','SCHEDULED','SENDING','SENT','PAUSED','CANCELLED') NOT NULL DEFAULT 'DRAFT',
84 scheduled_at DATETIME NULL,
85 -- NULL for immediately sent campaigns
86 sent_at DATETIME NULL,
87 -- Set when sending completes
88 created_by BIGINT NOT NULL,
89 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
90 updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
91 FOREIGN KEY (organisation_id) REFERENCES organisations(id),
92 FOREIGN KEY (created_by) REFERENCES users(id)
93);
94
95-- M:M: a campaign targets multiple lists; a list can be targeted by multiple campaigns
96CREATE TABLE campaign_lists (
97 campaign_id BIGINT NOT NULL,
98 list_id BIGINT NOT NULL,
99 PRIMARY KEY (campaign_id, list_id),
100 FOREIGN KEY (campaign_id) REFERENCES campaigns(id),
101 FOREIGN KEY (list_id) REFERENCES lists(id)
102);
103
104-- One row per email sent to one subscriber
105-- This is the core delivery record: did this specific person receive this campaign?
106CREATE TABLE sent_emails (
107 id BIGINT AUTO_INCREMENT PRIMARY KEY,
108 campaign_id BIGINT NOT NULL,
109 subscriber_id BIGINT NOT NULL,
110 status ENUM('QUEUED','SENT','DELIVERED','BOUNCED','FAILED') NOT NULL DEFAULT 'QUEUED',
111 message_id VARCHAR(255) UNIQUE,
112 -- Unique ID from the email delivery provider (SendGrid, SES, Mailgun)
113 -- Used to correlate incoming webhooks (delivery receipts, bounces) back to this row
114 sent_at DATETIME NULL,
115 delivered_at DATETIME NULL,
116 UNIQUE (campaign_id, subscriber_id),
117 -- A subscriber receives a campaign email exactly once
118 FOREIGN KEY (campaign_id) REFERENCES campaigns(id),
119 FOREIGN KEY (subscriber_id) REFERENCES subscribers(id)
120);
121
122-- The most write-heavy table: every open, click, bounce, unsubscribe = one row
123-- For a campaign sent to 1 million subscribers, this table could get
124-- 5-10 million rows from that single campaign if open rate is 50% and click rate is 10%
125CREATE TABLE email_events (
126 id BIGINT AUTO_INCREMENT PRIMARY KEY,
127 sent_email_id BIGINT NOT NULL,
128 event_type ENUM('OPEN','CLICK','BOUNCE','UNSUBSCRIBE','SPAM_COMPLAINT') NOT NULL,
129 occurred_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
130 ip_address VARCHAR(45) NULL,
131 -- IPv4 (15 chars) or IPv6 (39 chars) — VARCHAR(45) covers both
132 user_agent VARCHAR(500) NULL,
133 -- Browser/email client identifier for device analytics
134 click_url VARCHAR(2000) NULL,
135 -- NULL for non-CLICK events. The specific URL clicked for CLICK events.
136 FOREIGN KEY (sent_email_id) REFERENCES sent_emails(id)
137 -- No UNIQUE constraint here: the same subscriber can open an email multiple times
138 -- Every open is a separate event row
139);
140
141-- Indexes for the most critical query patterns
142CREATE INDEX idx_sent_emails_campaign ON sent_emails(campaign_id);
143-- "All sent emails for campaign X" — used to compute delivery stats
144CREATE INDEX idx_sent_emails_subscriber ON sent_emails(subscriber_id);
145-- "All campaigns sent to subscriber X" — used for subscriber activity view
146CREATE INDEX idx_email_events_sent_email ON email_events(sent_email_id);
147-- "All events for this specific sent email" — opens and clicks per delivery
148CREATE INDEX idx_email_events_type_time ON email_events(event_type, occurred_at);
149-- Used for time-series analytics: "how many opens per day for this campaign?"
150CREATE INDEX idx_subscribers_org_status ON subscribers(organisation_id, global_status);
151-- "All active subscribers for this organisation" — used before every campaign send
152CREATE INDEX idx_list_subscribers_list ON list_subscribers(list_id, status);
153-- "All subscribed members of list X" — used to build the recipient list
Key Design Decisions
Computing Campaign Analytics
Marketers need open rate, click rate, and bounce rate for every campaign. At a naive level, these require aggregating email_events on every dashboard load. For large campaigns, this is too slow. The production approach is a precomputed campaign_stats denormalized table, updated by the same background job that processes incoming webhook events.
Running this live on a 1M-row email_events table for every dashboard load is impractical. A precomputed stats table is the production solution, updated as each webhook event arrives.
Indexing Strategy
Common Traps to Avoid
- One status field on subscriber only — Subscribers need per-list status for selective unsubscribe, plus a global status for hard bounces/complaints that override all lists.
- Storing open count as a counter column — open_count INT on sent_emails creates race conditions under concurrent webhook events. Append-only events with COUNT queries (or precomputed stats) is safer and more flexible.
- No sent_emails table — Without it, you cannot tell which subscribers received a campaign, and you cannot attribute engagement events to specific deliveries.
- Forgetting message_id — Email providers identify deliveries by their own IDs in webhook payloads. Without message_id stored on sent_emails, you cannot correlate incoming webhooks to your database rows.
- TEXT instead of LONGTEXT for HTML body — Formatted email HTML regularly exceeds 65KB. Always use LONGTEXT.
Interview Q&A
Q: How do you model the fact that a subscriber can unsubscribe from one list but stay on another?
Two-level status. list_subscribers.status tracks per-list subscription state. subscribers.global_status tracks account-level hard blocks like bounces and spam complaints. Before sending any email, check global_status first, then list status. A BOUNCED or COMPLAINED global status overrides any list-level subscription.
Q: How do you compute open rate for a campaign?
Count DISTINCT sent_email_id in email_events WHERE event_type = 'OPEN' divided by total sent_emails for the campaign. Unique opens (per subscriber) not total opens (which would count multiple opens). In production, precompute this into a campaign_stats table updated by the webhook processing job.
Q: Why does email_events have no UNIQUE constraint?
Because a subscriber can genuinely open an email multiple times on different devices or days. Each open is a distinct engagement event worth recording for analytics and behavioural scoring. A UNIQUE constraint would silently drop real events.
Q: What is message_id and why is it on sent_emails?
Email delivery providers (SendGrid, SES) assign their own internal ID to every email they send. They include this ID in all subsequent webhook notifications ("this email was opened", "this email bounced"). message_id on sent_emails is the foreign key that maps incoming provider webhooks back to the correct row in your database.
Q: How does your schema scale for a campaign sent to 10 million subscribers?
sent_emails gets 10M rows. email_events gets 30-50M rows from that one campaign. Both tables need partitioning by campaign_id or sent_at date range. Analytics queries should run against a precomputed campaign_stats table, not scan email_events live. For very high scale, email_events moves to a columnar store like ClickHouse or Redshift.
ANKI Cards
Q: In an email campaign schema, why are there two status fields: global_status on subscribers and status on list_subscribers?
A: list_subscribers.status handles selective unsubscription ("remove me from Promotions but not Newsletter"). global_status handles hard blocks that override all lists — if an email address has hard-bounced or been marked as spam, you must never send to it again regardless of list membership.
Q: What is the role of sent_emails in an email campaign schema?
A: It is the delivery record — one row per (campaign, subscriber) pair, proving that a specific subscriber received a specific campaign. It anchors all engagement events in email_events and stores the message_id needed to correlate incoming provider webhooks.
Q: Why is email_events an append-only table with no UNIQUE constraint on open events?
A: A subscriber can open an email multiple times. Each open is a real, trackable engagement event. UNIQUE would silently drop re-opens. Append-only is correct for event streams and is also the most write-efficient table design.