SQL 11 : Schema Design I
D
Qubits of DPK
April 10, 2026
Core DBMS
Concept 1 — What is Schema Design?
Simple Definition
Schema design is the blueprint of how your data is stored in a database.
Before writing a single table, a professional engineer asks:
- What data do I need to store?
- How does this data relate to other data?
- How will this data be queried?
- How will this data grow over time?
The Problem It Solves
Without schema design, databases become:
- Inconsistent (same data stored in 5 different formats)
- Slow (no thought given to query patterns)
- Impossible to maintain (changing one thing breaks everything)
Real-World Analogy
Think of schema design like architecting a building before construction.
You don't start laying bricks randomly. You draw the blueprint first — where walls go, where doors go, how rooms connect. The schema is that blueprint for your data.
You don't start laying bricks randomly. You draw the blueprint first — where walls go, where doors go, how rooms connect. The schema is that blueprint for your data.
What Professionals Actually Do
sql
QUBITS OF DPK
Key Principle
Schema design is not about tables. It's about modeling reality accurately, then optimizing for how the application reads and writes that reality.
Interview Q
"What is the first thing you do before designing a database schema?"
Answer: Understand the business domain and query patterns. A schema designed without knowing how data will be queried will almost always be wrong.
Concept 2 — Schema Design Process
The 5-Step Process
sql
QUBITS OF DPK
Real Example — Zomato
sql
QUBITS OF DPK
Key Principle
Always think about Step 5 (query patterns) while doing Step 1–4. The best normalized schema is useless if it can't answer your queries efficiently.
Interview Q
"Walk me through how you would approach designing a schema for an e-commerce system from scratch."
Answer: Start with entities (User, Product, Order, OrderItem, Payment), map their relationships, choose surrogate PKs, add FK constraints, then validate that the schema can answer the 5 most common queries efficiently.
Concept 3 — Identifying Entities
What is an Entity?
An entity is a real-world "thing" that needs its own table because:
- It has its own identity (can be uniquely identified)
- It has its own attributes (has meaningful columns)
- It has its own lifecycle (can be created, updated, deleted independently)
The User vs Customer Distinction
This is a classic interview trap.
Wrong design (merged):
sql
QUBITS OF DPK
Correct design (separated):
sql
QUBITS OF DPK
Why? A user becomes a customer only after their first purchase. Keeping them separate lets you:
- Onboard users without requiring shipping info
- Track "registered but never purchased" users separately
- Evolve customer data without touching auth logic
Role-Based Entity Split
When one entity has fundamentally different behaviors depending on role, split it.
Example — Booking System (MakeMyTrip):
sql
QUBITS OF DPK
Key Principle
When in doubt, split. Merging two logically different entities into one table is one of the most common and most painful schema mistakes to fix later.
Interview Q
"In a food delivery app, should you store customers and delivery partners in the same table or separate tables? Why?"
Answer: Separate tables. They share very little — customers have addresses and order history, delivery partners have vehicle info, live location, and earnings. Merging them creates sparse columns and breaks Single Responsibility. They also have different lifecycles, different access patterns, and different compliance requirements.
Concept 4 — One-to-Many Relationships
The Rule
The Foreign Key always lives on the MANY side.
Why?
Because the MANY side is the one that "belongs to" the ONE side.
sql
QUBITS OF DPK
If you put it the other way (array of menu_item_ids on the restaurant), you violate 1NF and create an unmaintainable mess.
Schema Example — Zomato
sql
QUBITS OF DPK
Visual
sql
QUBITS OF DPK
Query Pattern
sql
QUBITS OF DPK
ON DELETE Options
Interview Q
"In a schema where one User can have many Orders, where does the foreign key go? What happens to orders if the user is deleted?"
Answer: FK user_id goes on the orders table (the MANY side). For deletion: use RESTRICT or soft-delete, not CASCADE — you never want to cascade-delete financial records. Keep the order; anonymize the user data if needed for compliance.
Concept 5 — Composite Key Trap
The Trap
New engineers often use composite PKs like (user_id, course_id) thinking "this combination is unique, so it's a natural PK."
This causes problems:
sql
QUBITS OF DPK
Problem 1 — Business rule changes:
What if a user can enroll in the same course twice (re-enroll after expiry)?
Now your PK is wrong. You need to migrate data.
What if a user can enroll in the same course twice (re-enroll after expiry)?
Now your PK is wrong. You need to migrate data.
Problem 2 — FK references become painful:
Any table that wants to reference an enrollment needs BOTH columns as a FK.
Any table that wants to reference an enrollment needs BOTH columns as a FK.
Problem 3 — Query performance:
Composite PKs create clustered index on both columns. Querying by just course_id won't use the index efficiently.
Composite PKs create clustered index on both columns. Querying by just course_id won't use the index efficiently.
The Correct Pattern
sql
QUBITS OF DPK
Why this is better:
- id is the stable, immutable identity of a row
- Business uniqueness is enforced by UNIQUE, not the PK
- If business rules change (allow re-enrollment), just drop the UNIQUE constraint — PK stays untouched
- Any other table can reference enrollments with just enrollment_id
The Rule
Use a surrogate PK (AUTO_INCREMENT id) for the row's identity. Use a UNIQUE constraint to enforce business uniqueness.
When Composite PKs Are Acceptable
In pure junction tables where:
- The combination is universally unique
- No other table will ever reference individual rows
- Business rules are truly fixed
Even then, most production engineers add a surrogate PK for consistency.
Interview Q
"A table tracks which students are enrolled in which courses. Should you use a composite PK of (student_id, course_id) or a surrogate key? Defend your choice."
Answer: Surrogate PK + UNIQUE constraint on (student_id, course_id). The composite PK bakes a business assumption into the structural identity of the row. When the business rule changes — and it always does — the structural identity shouldn't have to change with it.
Concept 6 — Many-to-Many + Junction Tables
The Problem
A many-to-many relationship cannot be stored in either parent table.
sql
QUBITS OF DPK
You can't store course_ids as an array in students (violates 1NF).
You can't store student_ids as an array in courses (violates 1NF).
You can't store student_ids as an array in courses (violates 1NF).
Solution: A junction table (also called bridge table, associative table, or linking table).
Schema
sql
QUBITS OF DPK
Visual
sql
QUBITS OF DPK
Key Insight — Attributes of the Relationship
The junction table can hold attributes that belong to the relationship, not to either entity.
- enrolled_at — when did THIS student enroll in THIS course?
- grade — what grade did THIS student get in THIS course?
- status — is this enrollment active or expired?
These don't belong on students (different per course) or courses (different per student). They belong on the junction row.
Queries
sql
QUBITS OF DPK
Interview Q
"Design a schema for a tagging system where posts can have many tags and tags can belong to many posts. What extra fields might the junction table need?"
Answer: Junction table post_tags(post_id, tag_id, tagged_at, tagged_by_user_id). The extra fields capture audit data — who tagged it and when. These are attributes of the tagging action, not of the post or the tag.
Concept 7 — First Normal Form (1NF)
The Rule
Every column must contain atomic (indivisible) values. No repeating groups. No arrays.
What Violates 1NF?
sql
QUBITS OF DPK
Why This Breaks Things
sql
QUBITS OF DPK
The Fix
sql
QUBITS OF DPK
The Test for 1NF
Ask yourself about every column:
- #Could this column hold more than one value? → Violates 1NF
- #Do I have item_1, item_2, item_3 columns? → Violates 1NF
- #Can I directly index and query this column? → Passes 1NF if yes
Interview Q
"Your colleague stores phone numbers as a comma-separated string in a single column. What's wrong with this and how do you fix it?"
Answer: Violates 1NF — the column isn't atomic. You can't index it, can't query individual numbers efficiently, and can't enforce data integrity. Fix: create a user_phone_numbers table with (id, user_id, phone_number, type ENUM('mobile','home','work')). Each phone number gets its own row.
Concept 8 — Second Normal Form (2NF)
Prerequisite
2NF only applies when you have a composite primary key.
If your table has a single-column PK, it's automatically 2NF if it's 1NF.
If your table has a single-column PK, it's automatically 2NF if it's 1NF.
The Rule
Every non-key column must depend on the WHOLE composite primary key, not just part of it.
A partial dependency = a non-key column depends on only one part of the composite PK.
The Violation
sql
QUBITS OF DPK
student_name only needs student_id to be determined. It doesn't need course_id. That's a partial dependency — a 2NF violation.
Why This Hurts
- Update anomaly: If a student changes their name, you must update it in EVERY enrollment row.
- Redundancy: Student name stored repeatedly for every course they're enrolled in.
- Inconsistency: What if one enrollment row has the old name and another has the new one?
The Fix
sql
QUBITS OF DPK
Memory Aid
Partial dependency = attribute that could live in a different, simpler table. Move it there.
Interview Q
"Explain what a partial dependency is and why it violates 2NF. Give an example."
Answer: A partial dependency is when a non-key column depends on only part of a composite primary key. Example: In (order_id, product_id, product_name) — product_name depends only on product_id, not the full composite key. This creates update anomalies: change the product name and you must update every order row containing it. Fix: move product_name to a products table.
Concept 9 — Third Normal Form (3NF)
The Rule
No non-key column should depend on another non-key column. Every non-key column must depend directly on the primary key — nothing else.
A transitive dependency = Column A → Column B → Column C (C depends on B, B depends on A, but C doesn't directly depend on A's PK).
The Violation
sql
QUBITS OF DPK
The chain: order.id → customer_id → customer_city
customer_city doesn't directly describe the order. It describes the customer. It's in the wrong table.
Why This Hurts
- Customer moves to a new city → must update every order row
- Inconsistency risk: two orders can disagree on the same customer's city
- City data is duplicated across all orders for the same customer
The Fix
sql
QUBITS OF DPK
Now customer_city lives in exactly one place. Update it once, reflected everywhere.
The Test for 3NF
For every non-key column, ask:
"Does this column describe the entity this table is about, or does it describe something else?"
If it describes something else → it belongs in a different table.
Interview Q
"An orders table has columns: order_id, customer_id, customer_email, order_amount. Is this in 3NF? What's wrong and how do you fix it?"
Answer: Not in 3NF. customer_email has a transitive dependency: order_id → customer_id → customer_email. The email doesn't describe the order — it describes the customer. Move customer_email to the customers table. Reference via customer_id FK.
Concept 10 — Boyce-Codd Normal Form (BCNF)
Context
BCNF is a stricter version of 3NF. Most tables that are in 3NF are also in BCNF. The difference only shows up in specific edge cases with multiple overlapping candidate keys.
The Rule
For every functional dependency X → Y, X must be a superkey.
In plain English: Only keys are allowed to "determine" other columns.
The Violation — Classic Example
sql
QUBITS OF DPK
sql
QUBITS OF DPK
The Fix
sql
QUBITS OF DPK
Practical Reality
BCNF violations are rare in practice. The classic 3NF is sufficient for 95% of production systems. But in interviews, being able to explain the distinction shows depth.
BCNF vs 3NF — The Difference
Interview Q
"What is BCNF and how does it differ from 3NF? When would you need BCNF in practice?"
Answer: BCNF requires that every determinant (left side of a functional dependency) is a superkey. 3NF is slightly looser — it allows non-superkeys to determine key columns if they're not transitive through non-key columns. In practice, BCNF matters when you have multiple overlapping candidate keys. Most production schemas don't need BCNF explicitly — 3NF is the practical target.
Concept 11 — Denormalization
What is Denormalization?
Intentionally adding redundancy to a normalized schema to improve read performance.
Denormalization is not bad design. It is a deliberate trade-off: accept data redundancy in exchange for faster queries.
The Problem Normalization Creates at Scale
sql
QUBITS OF DPK
The Two Denormalization Strategies
Strategy 1 — Snapshot Data (Copy at Time of Event)
Store the value as it was at the time the event happened, not a reference to the current value.
sql
QUBITS OF DPK
Why snapshot?
- Customer might change their name after placing an order
- Restaurant might change its name
- The order receipt must reflect what was true WHEN the order was placed
- Legal and audit requirement
Real example: Amazon order history shows the exact product name, price, and shipping address at time of purchase — not the current ones.
Strategy 2 — Precomputed / Cached Data (Live Reference)
Store a derived value alongside the source data to avoid recomputing it.
sql
QUBITS OF DPK
The cost: On every INSERT into reviews, you must also UPDATE products.total_reviews and products.average_rating. This is a write-time cost to gain read-time speed.
Used in: Product listings, dashboards, leaderboards, feeds — any place where aggregated data is read far more than it's written.
When to Denormalize
When NOT to Denormalize
- Financial data (account balances, transaction amounts) — must always be computed from source
- Data that changes frequently — redundant copies go stale
- Small datasets — JOIN cost is negligible
The Golden Rule
Normalize first. Denormalize only when you have a proven performance problem or a clear audit/snapshot requirement. Premature denormalization is a design smell.
Interview Q
"Your product listing page is slow because it's JOINing 5 tables on every request. Your tech lead says to denormalize. What would you denormalize, what are the risks, and how do you keep the data consistent?"
Answer: Denormalize the most-read, least-changed attributes — product name, category name, average rating, review count — onto the products table or a dedicated read-model table. Risks: stale data, write complexity. Mitigation: update denormalized fields synchronously in a transaction (for critical data) or asynchronously via event/queue (for approximate data like ratings). For product name/price on order history: always snapshot at time of purchase — never reference live data.
Quick Interview Cheat Sheet
sql
QUBITS OF DPK