SQL 11 : Schema Design I

D

Qubits of DPK

April 10, 2026

Core DBMS

Concept 1 — What is Schema Design?

Simple Definition

Schema design is the blueprint of how your data is stored in a database.
Before writing a single table, a professional engineer asks:
  • What data do I need to store?
  • How does this data relate to other data?
  • How will this data be queried?
  • How will this data grow over time?

The Problem It Solves

Without schema design, databases become:
  • Inconsistent (same data stored in 5 different formats)
  • Slow (no thought given to query patterns)
  • Impossible to maintain (changing one thing breaks everything)

Real-World Analogy

Think of schema design like architecting a building before construction.
You don't start laying bricks randomly. You draw the blueprint first — where walls go, where doors go, how rooms connect. The schema is that blueprint for your data.

What Professionals Actually Do

sql
QUBITS OF DPK
11. Understand the business domain
22. Identify what "things" (entities) need to be stored
33. Identify how these things relate to each other
44. Choose keys, constraints, and data types
55. Think about query patterns BEFORE finalizing tables
66. Iterateschema is never perfect on the first try

Key Principle

Schema design is not about tables. It's about modeling reality accurately, then optimizing for how the application reads and writes that reality.

Interview Q

"What is the first thing you do before designing a database schema?"
Answer: Understand the business domain and query patterns. A schema designed without knowing how data will be queried will almost always be wrong.

Concept 2 — Schema Design Process

The 5-Step Process

sql
QUBITS OF DPK
1Step 1 → Identify Entities          (What "things" exist in the system?)
2Step 2 → Identify Relationships     (How do things connect to each other?)
3Step 3 → Choose Keys                (How do we uniquely identify each row?)
4Step 4Apply Constraints          (What rules must the data obey?)
5Step 5 → Review Query Patterns      (Will our schema serve the actual queries?)

Real Example — Zomato

sql
QUBITS OF DPK
1Step 1 — Entities:
2  Customer, Restaurant, Order, Menu Item, Delivery Partner
3
4Step 2 — Relationships:
5  Customer places → Order
6  Order contains → Menu Items
7  Restaurant owns → Menu Items
8  Delivery Partner delivers → Order
9
10Step 3Keys:
11  Every table gets a surrogate PK (AUTO_INCREMENT id)
12  Business uniqueness enforced with UNIQUE constraints
13
14Step 4 — Constraints:
15  Orders must reference valid Customer and Restaurant
16  Price cannot be negative
17  Status can only be: PLACED, CONFIRMED, DELIVERED, CANCELLED
18
19Step 5 — Query Patterns:
20  "Fetch all orders by customer"index on customer_id in orders table
21  "Fetch menu for restaurant"index on restaurant_id in menu_items

Key Principle

Always think about Step 5 (query patterns) while doing Step 1–4. The best normalized schema is useless if it can't answer your queries efficiently.

Interview Q

"Walk me through how you would approach designing a schema for an e-commerce system from scratch."
Answer: Start with entities (User, Product, Order, OrderItem, Payment), map their relationships, choose surrogate PKs, add FK constraints, then validate that the schema can answer the 5 most common queries efficiently.

Concept 3 — Identifying Entities

What is an Entity?

An entity is a real-world "thing" that needs its own table because:
  • It has its own identity (can be uniquely identified)
  • It has its own attributes (has meaningful columns)
  • It has its own lifecycle (can be created, updated, deleted independently)

The User vs Customer Distinction

This is a classic interview trap.
Wrong design (merged):
sql
QUBITS OF DPK
1CREATE TABLE users (
2  id INT PRIMARY KEY,
3  email VARCHAR(255),
4  password_hash VARCHAR(255),
5  shipping_address VARCHAR(500),   -- ❌ not every user is a customer
6  loyalty_points INT               -- ❌ not every user has points
7);
Correct design (separated):
sql
QUBITS OF DPK
1CREATE TABLE users (
2  id INT AUTO_INCREMENT PRIMARY KEY,
3  email VARCHAR(255) NOT NULL UNIQUE,
4  password_hash VARCHAR(255) NOT NULL,
5  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
6);
7
8CREATE TABLE customers (
9  id INT AUTO_INCREMENT PRIMARY KEY,
10  user_id INT NOT NULL UNIQUE,        -- 1:1 link to users
11  shipping_address VARCHAR(500),
12  loyalty_points INT DEFAULT 0,
13  FOREIGN KEY (user_id) REFERENCES users(id)
14);
Why? A user becomes a customer only after their first purchase. Keeping them separate lets you:
  • Onboard users without requiring shipping info
  • Track "registered but never purchased" users separately
  • Evolve customer data without touching auth logic

Role-Based Entity Split

When one entity has fundamentally different behaviors depending on role, split it.
Example — Booking System (MakeMyTrip):
sql
QUBITS OF DPK
1-- Wrong: one messy table
2CREATE TABLE persons (
3  id INT PRIMARY KEY,
4  name VARCHAR(100),
5  passenger_pnr VARCHAR(20),       -- only passengers have this
6  pilot_license VARCHAR(20),       -- only pilots have this
7  airline_employee_id VARCHAR(20)  -- only staff have this
8);
9
10-- Correct: role-based split
11CREATE TABLE users (id, name, email, role ENUM('PASSENGER', 'PILOT', 'STAFF'));
12CREATE TABLE passengers (id, user_id, pnr, passport_number);
13CREATE TABLE pilots (id, user_id, license_number, certifications);

Key Principle

When in doubt, split. Merging two logically different entities into one table is one of the most common and most painful schema mistakes to fix later.

Interview Q

"In a food delivery app, should you store customers and delivery partners in the same table or separate tables? Why?"
Answer: Separate tables. They share very little — customers have addresses and order history, delivery partners have vehicle info, live location, and earnings. Merging them creates sparse columns and breaks Single Responsibility. They also have different lifecycles, different access patterns, and different compliance requirements.

Concept 4 — One-to-Many Relationships

The Rule

The Foreign Key always lives on the MANY side.

Why?

Because the MANY side is the one that "belongs to" the ONE side.
sql
QUBITS OF DPK
1One Restaurant → Many Menu Items
2∴ menu_items table holds restaurant_id (the FK)
If you put it the other way (array of menu_item_ids on the restaurant), you violate 1NF and create an unmaintainable mess.

Schema Example — Zomato

sql
QUBITS OF DPK
1CREATE TABLE restaurants (
2  id INT AUTO_INCREMENT PRIMARY KEY,
3  name VARCHAR(255) NOT NULL,
4  city VARCHAR(100),
5  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
6);
7
8CREATE TABLE menu_items (
9  id INT AUTO_INCREMENT PRIMARY KEY,
10  restaurant_id INT NOT NULL,        -- FK on the MANY side
11  name VARCHAR(255) NOT NULL,
12  price DECIMAL(10, 2) NOT NULL,
13  is_available BOOLEAN DEFAULT TRUE,
14  FOREIGN KEY (restaurant_id) REFERENCES restaurants(id) ON DELETE CASCADE
15);

Visual

sql
QUBITS OF DPK
1restaurants                    menu_items
2-----------                    ----------
3id = 1  "Dominos"  ─────────►  id=1, restaurant_id=1, "Margherita"
4                   ─────────►  id=2, restaurant_id=1, "Farmhouse"
5                   ─────────►  id=3, restaurant_id=1, "Garlic Bread"
6
7id = 2  "KFC"      ─────────►  id=4, restaurant_id=2, "Popcorn Chicken"

Query Pattern

sql
QUBITS OF DPK
1-- Fetch all menu items for a restaurant
2SELECT * FROM menu_items WHERE restaurant_id = 1;
3-- Always index restaurant_id for this to be fast

ON DELETE Options

Interview Q

"In a schema where one User can have many Orders, where does the foreign key go? What happens to orders if the user is deleted?"
Answer: FK user_id goes on the orders table (the MANY side). For deletion: use RESTRICT or soft-delete, not CASCADE — you never want to cascade-delete financial records. Keep the order; anonymize the user data if needed for compliance.

Concept 5 — Composite Key Trap

The Trap

New engineers often use composite PKs like (user_id, course_id) thinking "this combination is unique, so it's a natural PK."
This causes problems:
sql
QUBITS OF DPK
1-- Looks fine at first
2CREATE TABLE enrollments (
3  user_id INT,
4  course_id INT,
5  enrolled_at DATETIME,
6  PRIMARY KEY (user_id, course_id)   -- composite PK
7);
Problem 1 — Business rule changes:
What if a user can enroll in the same course twice (re-enroll after expiry)?
Now your PK is wrong. You need to migrate data.
Problem 2 — FK references become painful:
Any table that wants to reference an enrollment needs BOTH columns as a FK.
Problem 3 — Query performance:
Composite PKs create clustered index on both columns. Querying by just
course_id won't use the index efficiently.

The Correct Pattern

sql
QUBITS OF DPK
1CREATE TABLE enrollments (
2  id INT AUTO_INCREMENT PRIMARY KEY,        -- surrogate PK
3  user_id INT NOT NULL,
4  course_id INT NOT NULL,
5  enrolled_at DATETIME DEFAULT CURRENT_TIMESTAMP,
6  status ENUM('ACTIVE', 'COMPLETED', 'EXPIRED') DEFAULT 'ACTIVE',
7  UNIQUE KEY uq_user_course (user_id, course_id),  -- business rule as UNIQUE
8  FOREIGN KEY (user_id) REFERENCES users(id),
9  FOREIGN KEY (course_id) REFERENCES courses(id)
10);
Why this is better:
  • id is the stable, immutable identity of a row
  • Business uniqueness is enforced by UNIQUE, not the PK
  • If business rules change (allow re-enrollment), just drop the UNIQUE constraint — PK stays untouched
  • Any other table can reference enrollments with just enrollment_id

The Rule

Use a surrogate PK (AUTO_INCREMENT id) for the row's identity. Use a UNIQUE constraint to enforce business uniqueness.

When Composite PKs Are Acceptable

In pure junction tables where:
  • The combination is universally unique
  • No other table will ever reference individual rows
  • Business rules are truly fixed
Even then, most production engineers add a surrogate PK for consistency.

Interview Q

"A table tracks which students are enrolled in which courses. Should you use a composite PK of (student_id, course_id) or a surrogate key? Defend your choice."
Answer: Surrogate PK + UNIQUE constraint on (student_id, course_id). The composite PK bakes a business assumption into the structural identity of the row. When the business rule changes — and it always does — the structural identity shouldn't have to change with it.

Concept 6 — Many-to-Many + Junction Tables

The Problem

A many-to-many relationship cannot be stored in either parent table.
sql
QUBITS OF DPK
1Students ←──────────────────→ Courses
2One student takes many courses.
3One course has many students.
You can't store course_ids as an array in students (violates 1NF).
You can't store
student_ids as an array in courses (violates 1NF).
Solution: A junction table (also called bridge table, associative table, or linking table).

Schema

sql
QUBITS OF DPK
1CREATE TABLE students (
2  id INT AUTO_INCREMENT PRIMARY KEY,
3  name VARCHAR(100) NOT NULL,
4  email VARCHAR(255) UNIQUE
5);
6
7CREATE TABLE courses (
8  id INT AUTO_INCREMENT PRIMARY KEY,
9  title VARCHAR(255) NOT NULL,
10  instructor VARCHAR(100)
11);
12
13-- Junction table
14CREATE TABLE enrollments (
15  id INT AUTO_INCREMENT PRIMARY KEY,
16  student_id INT NOT NULL,
17  course_id INT NOT NULL,
18  enrolled_at DATETIME DEFAULT CURRENT_TIMESTAMP,
19  grade CHAR(2),                              -- attribute OF the relationship
20  UNIQUE KEY uq_enrollment (student_id, course_id),
21  FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
22  FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE
23);

Visual

sql
QUBITS OF DPK
1students          enrollments              courses
2--------          -----------              -------
3id=1 "Deepak" ──► student_id=1, course_id=1 ◄── id=1 "MySQL"
4              ──► student_id=1, course_id=2 ◄── id=2 "Java"
5id=2 "Arjun"  ──► student_id=2, course_id=1

Key Insight — Attributes of the Relationship

The junction table can hold attributes that belong to the relationship, not to either entity.
  • enrolled_at — when did THIS student enroll in THIS course?
  • grade — what grade did THIS student get in THIS course?
  • status — is this enrollment active or expired?
These don't belong on students (different per course) or courses (different per student). They belong on the junction row.

Queries

sql
QUBITS OF DPK
1-- All courses for a student
2SELECT c.title FROM courses c
3JOIN enrollments e ON e.course_id = c.id
4WHERE e.student_id = 1;
5
6-- All students in a course
7SELECT s.name FROM students s
8JOIN enrollments e ON e.student_id = s.id
9WHERE e.course_id = 1;

Interview Q

"Design a schema for a tagging system where posts can have many tags and tags can belong to many posts. What extra fields might the junction table need?"
Answer: Junction table post_tags(post_id, tag_id, tagged_at, tagged_by_user_id). The extra fields capture audit data — who tagged it and when. These are attributes of the tagging action, not of the post or the tag.

Concept 7 — First Normal Form (1NF)

The Rule

Every column must contain atomic (indivisible) values. No repeating groups. No arrays.

What Violates 1NF?

sql
QUBITS OF DPK
1-- ❌ Violates 1NF: multiple values in one column
2CREATE TABLE orders (
3  id INT PRIMARY KEY,
4  customer_name VARCHAR(100),
5  items VARCHAR(500)    -- "Pizza, Burger, Fries" ← NOT atomic
6);
7
8-- ❌ Violates 1NF: repeating columns
9CREATE TABLE orders (
10  id INT PRIMARY KEY,
11  item_1 VARCHAR(100),
12  item_2 VARCHAR(100),
13  item_3 VARCHAR(100)   -- What if order has 4 items?
14);

Why This Breaks Things

sql
QUBITS OF DPK
1-- How do you query: "Find all orders that contain Pizza"?
2SELECT * FROM orders WHERE items LIKE '%Pizza%';
3-- ❌ Full table scan. Can't index properly. Breaks if item name changes.

The Fix

sql
QUBITS OF DPK
1-- ✅ 1NF compliant: one value per row
2CREATE TABLE orders (
3  id INT AUTO_INCREMENT PRIMARY KEY,
4  customer_id INT NOT NULL,
5  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
6);
7
8CREATE TABLE order_items (
9  id INT AUTO_INCREMENT PRIMARY KEY,
10  order_id INT NOT NULL,
11  item_name VARCHAR(100) NOT NULL,
12  quantity INT NOT NULL,
13  price DECIMAL(10,2) NOT NULL,
14  FOREIGN KEY (order_id) REFERENCES orders(id)
15);

The Test for 1NF

Ask yourself about every column:
  1. #
    Could this column hold more than one value? → Violates 1NF
  2. #
    Do I have item_1, item_2, item_3 columns? → Violates 1NF
  3. #
    Can I directly index and query this column? → Passes 1NF if yes

Interview Q

"Your colleague stores phone numbers as a comma-separated string in a single column. What's wrong with this and how do you fix it?"
Answer: Violates 1NF — the column isn't atomic. You can't index it, can't query individual numbers efficiently, and can't enforce data integrity. Fix: create a user_phone_numbers table with (id, user_id, phone_number, type ENUM('mobile','home','work')). Each phone number gets its own row.

Concept 8 — Second Normal Form (2NF)

Prerequisite

2NF only applies when you have a composite primary key.
If your table has a single-column PK, it's automatically 2NF if it's 1NF.

The Rule

Every non-key column must depend on the WHOLE composite primary key, not just part of it.
A partial dependency = a non-key column depends on only one part of the composite PK.

The Violation

sql
QUBITS OF DPK
1-- Composite PK: (student_id, course_id)
2CREATE TABLE enrollments (
3  student_id INT,
4  course_id INT,
5  student_name VARCHAR(100),    -- ❌ depends only on student_id
6  course_title VARCHAR(100),    -- ❌ depends only on course_id
7  grade CHAR(2),                -- ✅ depends on BOTH (which student in which course)
8  PRIMARY KEY (student_id, course_id)
9);
student_name only needs student_id to be determined. It doesn't need course_id. That's a partial dependency — a 2NF violation.

Why This Hurts

  • Update anomaly: If a student changes their name, you must update it in EVERY enrollment row.
  • Redundancy: Student name stored repeatedly for every course they're enrolled in.
  • Inconsistency: What if one enrollment row has the old name and another has the new one?

The Fix

sql
QUBITS OF DPK
1-- Move attributes to their own table where they fully depend on that table's PK
2CREATE TABLE students (
3  id INT PRIMARY KEY,
4  name VARCHAR(100)    -- depends entirely on student.id ✅
5);
6
7CREATE TABLE courses (
8  id INT PRIMARY KEY,
9  title VARCHAR(100)   -- depends entirely on course.id ✅
10);
11
12CREATE TABLE enrollments (
13  student_id INT,
14  course_id INT,
15  grade CHAR(2),       -- depends on BOTH student_id AND course_id ✅
16  PRIMARY KEY (student_id, course_id)
17);

Memory Aid

Partial dependency = attribute that could live in a different, simpler table. Move it there.

Interview Q

"Explain what a partial dependency is and why it violates 2NF. Give an example."
Answer: A partial dependency is when a non-key column depends on only part of a composite primary key. Example: In (order_id, product_id, product_name)product_name depends only on product_id, not the full composite key. This creates update anomalies: change the product name and you must update every order row containing it. Fix: move product_name to a products table.

Concept 9 — Third Normal Form (3NF)

The Rule

No non-key column should depend on another non-key column. Every non-key column must depend directly on the primary key — nothing else.
A transitive dependency = Column A → Column B → Column C (C depends on B, B depends on A, but C doesn't directly depend on A's PK).

The Violation

sql
QUBITS OF DPK
1CREATE TABLE orders (
2  id INT PRIMARY KEY,
3  customer_id INT,
4  customer_city VARCHAR(100),     -- depends on customer_id, NOT on order.id ❌
5  customer_pincode VARCHAR(10),   -- depends on customer_id, NOT on order.id ❌
6  order_total DECIMAL(10,2)       -- depends directly on order.id ✅
7);
The chain: order.id → customer_id → customer_city
customer_city doesn't directly describe the order. It describes the customer. It's in the wrong table.

Why This Hurts

  • Customer moves to a new city → must update every order row
  • Inconsistency risk: two orders can disagree on the same customer's city
  • City data is duplicated across all orders for the same customer

The Fix

sql
QUBITS OF DPK
1CREATE TABLE customers (
2  id INT PRIMARY KEY,
3  name VARCHAR(100),
4  city VARCHAR(100),        -- directly describes the customer ✅
5  pincode VARCHAR(10)
6);
7
8CREATE TABLE orders (
9  id INT PRIMARY KEY,
10  customer_id INT NOT NULL,  -- FK to customers
11  order_total DECIMAL(10,2),
12  FOREIGN KEY (customer_id) REFERENCES customers(id)
13);
Now customer_city lives in exactly one place. Update it once, reflected everywhere.

The Test for 3NF

For every non-key column, ask:
"Does this column describe the entity this table is about, or does it describe something else?"
If it describes something else → it belongs in a different table.

Interview Q

"An orders table has columns: order_id, customer_id, customer_email, order_amount. Is this in 3NF? What's wrong and how do you fix it?"
Answer: Not in 3NF. customer_email has a transitive dependency: order_id → customer_id → customer_email. The email doesn't describe the order — it describes the customer. Move customer_email to the customers table. Reference via customer_id FK.

Concept 10 — Boyce-Codd Normal Form (BCNF)

Context

BCNF is a stricter version of 3NF. Most tables that are in 3NF are also in BCNF. The difference only shows up in specific edge cases with multiple overlapping candidate keys.

The Rule

For every functional dependency X → Y, X must be a superkey.
In plain English: Only keys are allowed to "determine" other columns.

The Violation — Classic Example

sql
QUBITS OF DPK
1Table: CourseTeacher
2Columns: (student, course, teacher)
3
4Business rules:
5  - Each teacher teaches only one course
6  - Each course can be taught by multiple teachers
7  - Each student is assigned one teacher per course
8
9Candidate keys:
10  - (student, course)
11  - (student, teacher)
12
13But: teacher → course   ← teacher determines course
14And teacher is NOT a superkey (it can't uniquely identify a row alone)
15∴ BCNF violation!
sql
QUBITS OF DPK
1-- Violates BCNF
2CREATE TABLE course_assignments (
3  student VARCHAR(100),
4  course VARCHAR(100),
5  teacher VARCHAR(100),
6  PRIMARY KEY (student, course)
7  -- Problem: teacher → course, but teacher is not a key
8);

The Fix

sql
QUBITS OF DPK
1-- Decompose into two tables
2CREATE TABLE teacher_courses (
3  teacher VARCHAR(100) PRIMARY KEY,
4  course VARCHAR(100) NOT NULL    -- teacher determines course — now teacher IS the key ✅
5);
6
7CREATE TABLE student_teachers (
8  student VARCHAR(100),
9  teacher VARCHAR(100),
10  PRIMARY KEY (student, teacher),
11  FOREIGN KEY (teacher) REFERENCES teacher_courses(teacher)
12);

Practical Reality

BCNF violations are rare in practice. The classic 3NF is sufficient for 95% of production systems. But in interviews, being able to explain the distinction shows depth.

BCNF vs 3NF — The Difference

Interview Q

"What is BCNF and how does it differ from 3NF? When would you need BCNF in practice?"
Answer: BCNF requires that every determinant (left side of a functional dependency) is a superkey. 3NF is slightly looser — it allows non-superkeys to determine key columns if they're not transitive through non-key columns. In practice, BCNF matters when you have multiple overlapping candidate keys. Most production schemas don't need BCNF explicitly — 3NF is the practical target.

Concept 11 — Denormalization

What is Denormalization?

Intentionally adding redundancy to a normalized schema to improve read performance.
Denormalization is not bad design. It is a deliberate trade-off: accept data redundancy in exchange for faster queries.

The Problem Normalization Creates at Scale

sql
QUBITS OF DPK
1-- Fully normalized: to show an order summary
2SELECT
3  o.id,
4  c.name AS customer_name,
5  r.name AS restaurant_name,
6  SUM(oi.quantity * oi.price) AS total
7FROM orders o
8JOIN customers c ON o.customer_id = c.id
9JOIN restaurants r ON o.restaurant_id = r.id
10JOIN order_items oi ON oi.order_id = o.id
11GROUP BY o.id, c.name, r.name;
12
13-- 4-table JOIN. At 10M orders, this is slow even with indexes.

The Two Denormalization Strategies

Strategy 1 — Snapshot Data (Copy at Time of Event)

Store the value as it was at the time the event happened, not a reference to the current value.
sql
QUBITS OF DPK
1CREATE TABLE orders (
2  id INT AUTO_INCREMENT PRIMARY KEY,
3  customer_id INT NOT NULL,
4  customer_name VARCHAR(100) NOT NULL,       -- snapshot of name at order time
5  restaurant_id INT NOT NULL,
6  restaurant_name VARCHAR(255) NOT NULL,     -- snapshot of name at order time
7  delivery_address TEXT NOT NULL,            -- snapshot of address at order time
8  subtotal DECIMAL(10,2) NOT NULL,           -- precomputed, not JOINed
9  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
10);
Why snapshot?
  • Customer might change their name after placing an order
  • Restaurant might change its name
  • The order receipt must reflect what was true WHEN the order was placed
  • Legal and audit requirement
Real example: Amazon order history shows the exact product name, price, and shipping address at time of purchase — not the current ones.

Strategy 2 — Precomputed / Cached Data (Live Reference)

Store a derived value alongside the source data to avoid recomputing it.
sql
QUBITS OF DPK
1CREATE TABLE products (
2  id INT AUTO_INCREMENT PRIMARY KEY,
3  name VARCHAR(255),
4  price DECIMAL(10,2),
5  total_reviews INT DEFAULT 0,              -- denormalized count
6  average_rating DECIMAL(3,2) DEFAULT 0.00  -- denormalized average
7);
8
9-- Actual review data lives here (source of truth)
10CREATE TABLE reviews (
11  id INT AUTO_INCREMENT PRIMARY KEY,
12  product_id INT NOT NULL,
13  rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
14  review_text TEXT,
15  FOREIGN KEY (product_id) REFERENCES products(id)
16);
The cost: On every INSERT into reviews, you must also UPDATE products.total_reviews and products.average_rating. This is a write-time cost to gain read-time speed.
Used in: Product listings, dashboards, leaderboards, feeds — any place where aggregated data is read far more than it's written.

When to Denormalize

When NOT to Denormalize

  • Financial data (account balances, transaction amounts) — must always be computed from source
  • Data that changes frequently — redundant copies go stale
  • Small datasets — JOIN cost is negligible

The Golden Rule

Normalize first. Denormalize only when you have a proven performance problem or a clear audit/snapshot requirement. Premature denormalization is a design smell.

Interview Q

"Your product listing page is slow because it's JOINing 5 tables on every request. Your tech lead says to denormalize. What would you denormalize, what are the risks, and how do you keep the data consistent?"
Answer: Denormalize the most-read, least-changed attributes — product name, category name, average rating, review count — onto the products table or a dedicated read-model table. Risks: stale data, write complexity. Mitigation: update denormalized fields synchronously in a transaction (for critical data) or asynchronously via event/queue (for approximate data like ratings). For product name/price on order history: always snapshot at time of purchase — never reference live data.

Quick Interview Cheat Sheet

sql
QUBITS OF DPK
1Q: Where does FK go in 1:M?
2A: MANY side. Always.
3
4Q: What's a junction table?
5A: A table that resolves M:M by holding FKs to both parent tables + its own PK.
6
7Q: When is denormalization justified?
8A: Proven read bottleneck with JOINs, OR audit/snapshot requirement.
9
10Q: Composite PK vs Surrogate + UNIQUE?
11A: Always prefer Surrogate PK. Composite PKs bake business rules into structure.
12
13Q: 1NF vs 2NF vs 3NF in one sentence each?
14A: 1NF = atomic values.
15   2NF = no partial dependencies (all columns need the full composite key).
16   3NF = no transitive dependencies (all columns depend directly on PK).
17
18Q: BCNF in plain English?
19A: Only keys are allowed to determine other columns.
20
21Q: Snapshot vs Live denormalization?
22A: Snapshot = copy the value at the time of the event (order history, receipts).
23   Live = cache an aggregated/computed value and keep it updated on writes.