SQL 12: Schema Design II
Qubits of DPK
April 9, 2026
Core DBMS
12. Indexes
What is it?
A B-tree data structure the database maintains to make lookups faster.
- Without index → full table scan O(n)
- With index → O(log n)
Types
Simple Index
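A minimal sketch, assuming a MySQL `orders` table (index name illustrative):

```sql
-- Speeds up lookups like: SELECT * FROM orders WHERE customer_id = 45;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```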
Composite Index
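A sketch of a two-column composite index (names illustrative):

```sql
-- One index serving filters on (customer_id) alone
-- or on (customer_id, status) together
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
```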
Leftmost Prefix Rule:
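For example, given the composite index `(customer_id, status)` above:

```sql
SELECT * FROM orders WHERE customer_id = 45;                -- uses index
SELECT * FROM orders WHERE customer_id = 45 AND status = 2; -- uses index
SELECT * FROM orders WHERE status = 2;                      -- index NOT used:
                                                            -- leading column skipped
```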
Partial Index
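A sketch in PostgreSQL syntax (MySQL does not support partial indexes; names illustrative):

```sql
-- Index only the rows most queries actually touch:
-- smaller index, cheaper writes, same read speed for the hot path
CREATE INDEX idx_orders_active ON orders (customer_id)
WHERE deleted_at IS NULL;
```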
Trade-offs
- Reads get faster; every INSERT/UPDATE/DELETE gets slower, because each index must be maintained on write
- Each index consumes additional disk space
- The optimizer may still skip an index it judges unhelpful for a given query
When NOT to Use
- Low cardinality columns — is_deleted has only 2 values; index is useless
- Write-heavy, rarely-read tables — maintenance cost outweighs benefit
- Small tables — full scan is faster than index overhead
🃏 ANKI
Q: Why does column order matter in a composite index?
A: MySQL uses the leftmost prefix rule — the index is only used if the query filters on the leading column first. Skipping it makes the index unusable.
Interview Q
You have 500M rows. Query WHERE customer_id = 45 AND status = 2 is slow. Fix it.
Answer: CREATE INDEX idx_customer_status ON orders (customer_id, status) — customer_id first because it has higher cardinality and eliminates more rows at step one.
13. Partitioning and Sharding
Partitioning — Split Within One Database
Table is physically split into pieces on the same server. App sees one table.
Range Partitioning
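A MySQL sketch (note MySQL requires the partition key to be part of the primary key):

```sql
CREATE TABLE orders (
    id         BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    -- ... business columns ...
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

A query with `WHERE created_at >= '2025-01-01'` then scans only `p2025` and `pmax` (partition pruning).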
Hash Partitioning
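A MySQL sketch — rows are spread evenly by hashing the key:

```sql
CREATE TABLE orders (
    id          BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    -- ... business columns ...
    PRIMARY KEY (id, customer_id)
)
PARTITION BY HASH (customer_id)
PARTITIONS 8;   -- each row lands in one of 8 partitions
```

Hash partitioning balances load well but gives up range pruning — good when queries always filter by the hashed key.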
Sharding — Split Across Multiple Databases
Each server owns a subset of data.
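A sketch of application-level shard routing (shard hostnames and count are illustrative):

```javascript
// Each shard is a separate database server owning a subset of customers.
const SHARDS = [
  "db-shard-0.internal", // customer_id % 4 === 0
  "db-shard-1.internal", // customer_id % 4 === 1
  "db-shard-2.internal", // customer_id % 4 === 2
  "db-shard-3.internal", // customer_id % 4 === 3
];

// The application (or a proxy) must route every query to the owning shard.
function shardFor(customerId) {
  return SHARDS[customerId % SHARDS.length];
}

// e.g. const db = connect(shardFor(45));
//      db.query("SELECT * FROM orders WHERE customer_id = 45");
```

Plain modulo forces mass data movement when a shard is added; production systems typically use consistent hashing instead.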
Comparison
- Partitioning: one server, transparent to the application, managed by the database itself
- Sharding: many servers, routing logic lives in the application (or a proxy), cross-shard queries and transactions become hard
When NOT to Shard
Sharding is a last resort. Exhaust this order first: better indexes and queries → caching → read replicas → vertical scaling (bigger machine) → partitioning → only then sharding.
🃏 ANKI
Q: What is partition pruning?
A: When a query filters on the partition key, the DB skips irrelevant partitions entirely — dramatically reduces I/O.
Interview Q
Zomato orders table: 10 billion rows, slow despite indexes. What do you do?
Answer: Partition by created_at (range) first — most queries filter recent dates. If the server itself is the bottleneck, shard by customer_id with consistent hashing.
14. Soft Deletes
What is it?
Mark rows as deleted instead of physically removing them. Row stays; queries filter it out.
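A minimal sketch on an `orders` table:

```sql
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL;

-- "Delete" = mark, don't remove
UPDATE orders SET deleted_at = NOW() WHERE id = 42;

-- Every read must now filter the marked rows out
SELECT * FROM orders WHERE deleted_at IS NULL;
```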
Why It Exists
Hard deletes are unrecoverable. Soft deletes enable:
- Data recovery
- Audit trails
- Referential integrity preservation
The Hidden Danger
Every query in your codebase must include the soft delete filter. Miss it once → deleted data reappears.
Fix: ORM-level global filter — @Where in Hibernate, default scope in Spring Data.
When NOT to Use
- Transient data (sessions, OTPs) — physical delete is fine
- GDPR right-to-erasure requests — soft delete is insufficient, hard delete required
🃏 ANKI
Q: What is the biggest operational risk of soft deletes?
A: Missing the deleted_at IS NULL filter in a query — deleted records reappear. Mitigated by ORM-level global filters.
Interview Q
A GDPR erasure request arrives. Your system uses soft deletes. What do you do?
Answer: Soft delete alone doesn't satisfy GDPR. Run a hard delete pipeline: anonymize PII fields → physically remove rows with no legal retention obligation → log the erasure event for compliance audit.
15. Audit Fields
What is it?
Standard metadata columns on every production table tracking when and who.
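A typical set of audit columns in MySQL (column sizes illustrative):

```sql
CREATE TABLE orders (
    id         BIGINT PRIMARY KEY,
    -- ... business columns ...
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                        ON UPDATE CURRENT_TIMESTAMP,  -- DB-maintained
    created_by VARCHAR(100),   -- user or service that inserted the row
    updated_by VARCHAR(100)    -- user or service that last changed it
);
```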
DATETIME vs TIMESTAMP
Always use DATETIME for long-lived business data. In MySQL, TIMESTAMP is stored as a 32-bit Unix epoch and overflows on Jan 19, 2038.
The updated_at Rule
Never trust application code to set updated_at. Use ON UPDATE CURRENT_TIMESTAMP — the database maintains it automatically. App bugs forget; the database never does.
🃏 ANKI
Q: Why use DATETIME instead of TIMESTAMP for audit fields?
A: TIMESTAMP is a 32-bit Unix epoch integer — it overflows on Jan 19 2038. DATETIME supports up to year 9999 with no such limit.
Interview Q
A customer says their order total changed unexpectedly. How do audit fields help?
Answer: Query updated_at to find when it changed, updated_by to find which service made the change, cross-reference with application logs at that timestamp to find root cause.
16. Cascading Deletes and Constraints
The Four Options
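The four ON DELETE behaviors, sketched as FK fragments (table names illustrative):

```sql
-- What happens to child orders when their parent restaurant row is deleted:
FOREIGN KEY (restaurant_id) REFERENCES restaurants(id)
    ON DELETE CASCADE;    -- child rows silently deleted too
FOREIGN KEY (restaurant_id) REFERENCES restaurants(id)
    ON DELETE SET NULL;   -- child FK column set to NULL (must be nullable)
FOREIGN KEY (restaurant_id) REFERENCES restaurants(id)
    ON DELETE RESTRICT;   -- parent delete blocked while children exist
FOREIGN KEY (restaurant_id) REFERENCES restaurants(id)
    ON DELETE NO ACTION;  -- effectively the same as RESTRICT in MySQL
```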
Production Rule
Never use CASCADE on business-critical data. One wrong DELETE with a bad WHERE clause silently wipes thousands of rows.
Safest pattern:
1. Soft delete the parent
2. Application explicitly handles child records with full logging
3. RESTRICT as FK safety net to catch bugs
🃏 ANKI
Q: What is the safest ON DELETE strategy for production FKs and why?
A: RESTRICT — blocks the delete if children exist, forces the application to handle cleanup explicitly. Prevents silent mass deletions.
Interview Q
A junior engineer adds ON DELETE CASCADE to orders → restaurants FK. Why is this dangerous?
Answer: Accidentally deleting a restaurant row (wrong WHERE clause) would silently delete every order ever placed there. With millions of rows affected, recovery requires a full backup restore. Use RESTRICT instead.
17. Schema Evolution and Migrations
What is it?
The discipline of changing schemas safely over time — without downtime, with full version history.
Migration Tools
Flyway and Liquibase (JVM), Alembic (Python), and Rails Active Record migrations all solve the same problem: every schema change is a versioned, ordered script checked into source control and replayed identically in every environment.
The Expand-Contract Pattern (Zero Downtime)
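A sketch of the four phases, assuming a hypothetical rename of `first_name`/`last_name` into `full_name`:

```sql
-- 1. EXPAND: add the new column; old code simply ignores it
ALTER TABLE users ADD COLUMN full_name VARCHAR(255) NULL;

-- 2. BACKFILL: copy data in small batches (repeat until no rows match)
UPDATE users
SET full_name = CONCAT(first_name, ' ', last_name)
WHERE full_name IS NULL
LIMIT 10000;

-- 3. MIGRATE: deploy app code that reads and writes full_name

-- 4. CONTRACT: only after every running instance is migrated
ALTER TABLE users DROP COLUMN first_name, DROP COLUMN last_name;
```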
Large Table ALTER — Never Lock Production
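The usual approach is an online schema change tool (pt-online-schema-change, gh-ost) that copies the table in the background; for plain backfills, batch the writes so replication and other queries can breathe. A sketch (`new_col` is a hypothetical column):

```sql
-- NEVER: a direct ALTER that rewrites a huge table can lock it for hours.
-- Instead, batch the work and loop, sleeping briefly between iterations:
UPDATE orders
SET new_col = 0
WHERE new_col IS NULL
LIMIT 10000;   -- run in a loop until 0 rows affected
```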
🃏 ANKI
Q: What is the expand-contract pattern?
A: Add new column (expand) → backfill data → migrate app code → drop old column (contract). Never rename or drop directly — it breaks running instances immediately.
Interview Q
Add a NOT NULL column to a 1 billion row production table. Walk through it safely.
Answer: Add the column as NULL first (instant, no rewrite). Backfill in batches of 10,000 rows with a SLEEP between batches. Deploy app code that writes the new column. Add the NOT NULL constraint only after all rows are populated. Be careful adding NOT NULL with a DEFAULT in one step: on older MySQL and PostgreSQL versions it rewrites and locks every row (MySQL 8.0+ and PostgreSQL 11+ can often do it as a metadata-only change — verify on your version first).
18. Multi-Tenant Schema Design
What is it?
One application serves multiple customers (tenants). Core challenge: isolate each tenant's data from others.
Three Approaches
Approach 1 — Shared Table
All tenants in the same tables. Every table has tenant_id.
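A sketch (index name illustrative; note tenant_id leads every index):

```sql
CREATE TABLE orders (
    id        BIGINT PRIMARY KEY,
    tenant_id BIGINT NOT NULL,          -- every table carries the tenant
    status    TINYINT NOT NULL,
    -- ... business columns ...
    INDEX idx_tenant (tenant_id, id)
);

-- Every query must filter by tenant:
SELECT * FROM orders WHERE tenant_id = 7 AND status = 2;
```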
- ✔ Simple, cheap, one schema to maintain
- ✘ Noisy neighbor — one heavy tenant slows everyone
- ✘ Data leak risk — one missing WHERE tenant_id = ? exposes all tenants
Approach 2 — Separate Schema Per Tenant
Each tenant gets their own schema on the same server.
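A sketch of the layout (tenant names illustrative):

```sql
-- Same server, one schema per tenant
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.orders (id BIGINT PRIMARY KEY /* ... */);

CREATE SCHEMA tenant_globex;
CREATE TABLE tenant_globex.orders (id BIGINT PRIMARY KEY /* ... */);

-- The app selects the schema from the authenticated tenant at connect time.
```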
- ✔ Strong isolation, easy per-tenant backup
- ✘ Migrations must run N times (once per tenant)
Approach 3 — Separate Database Per Tenant
Each tenant gets a dedicated DB server.
- ✔ Maximum isolation, compliance-friendly
- ✘ Very expensive, high operational complexity
Production Hybrid (Industry Standard)
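A hypothetical tenant router for the hybrid model (all hostnames and tenant names illustrative): most tenants live in one shared database, while large or regulated tenants get a dedicated one.

```javascript
// Enterprise/compliance tenants mapped to their own databases
const DEDICATED = new Map([
  ["acme-corp", "db-acme.internal"],
  ["big-bank",  "db-bigbank.internal"],
]);

// Everyone else shares one database, isolated by tenant_id filtering
const SHARED = "db-shared.internal";

function databaseFor(tenantId) {
  return DEDICATED.get(tenantId) ?? SHARED;
}
```

New small tenants cost nothing to onboard; a tenant that grows (or signs a compliance contract) is migrated to a dedicated database without changing the app.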
Tenant-Aware Query Interceptor
Automatically injects tenant_id into every query at the framework level — prevents data leaks from missing filters.
🃏 ANKI
Q: What is the biggest risk of the shared table multi-tenant approach?
A: Data leak — a query missing WHERE tenant_id = ? returns all tenants' data. Mitigated by a query interceptor that auto-injects tenant filtering.
Interview Q
A new enterprise client needs guaranteed data isolation for compliance. Which approach?
Answer: Separate database (Approach 3). Row-level isolation is insufficient for compliance — a single bug can expose cross-tenant data. Physical isolation satisfies SOC2/HIPAA/GDPR auditors.
19. Event-Driven Schema Design
What is it?
Store both current state (state table) and every transition that led to it (event log). Never update the event log — only append.
Schema
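A minimal sketch of the two tables (column sizes illustrative):

```sql
-- Current state: one row per order, updated in place
CREATE TABLE orders (
    id         BIGINT PRIMARY KEY,
    status     VARCHAR(20) NOT NULL,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                        ON UPDATE CURRENT_TIMESTAMP
);

-- Event log: append-only; never UPDATE or DELETE
CREATE TABLE order_events (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    order_id   BIGINT NOT NULL,
    status     VARCHAR(20) NOT NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_order (order_id, created_at)
);
```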
Write Pattern — Always One Transaction
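State update and event append commit together or not at all:

```sql
START TRANSACTION;
UPDATE orders SET status = 'DELIVERED' WHERE id = 42;
INSERT INTO order_events (order_id, status) VALUES (42, 'DELIVERED');
COMMIT;   -- both writes succeed together, keeping state and log consistent
```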
Why You Can't Drop the orders Table
Deriving current status from events alone requires:
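What "just read the event log" costs on every single read:

```sql
-- Scan + sort over an ever-growing table, per read:
SELECT status
FROM order_events
WHERE order_id = 42
ORDER BY created_at DESC
LIMIT 1;

-- versus an O(1) primary-key lookup:
SELECT status FROM orders WHERE id = 42;
```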
The orders table is a materialized current state — precomputed O(1) answer to the most frequent query.
Computing Stage Durations — No Schema Change Needed
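One way to sketch this with a window function (MySQL 8+): each stage's duration is the gap until the next event.

```sql
SELECT
    status,
    TIMESTAMPDIFF(
        SECOND,
        created_at,
        LEAD(created_at) OVER (PARTITION BY order_id ORDER BY created_at)
    ) AS seconds_in_stage   -- NULL for the final (current) stage
FROM order_events
WHERE order_id = 42;
```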
🃏 ANKI
Q: Why maintain both orders and order_events instead of just the event log?
A: orders gives O(1) current-state reads via PK. Event log alone requires aggregation over millions of rows per read — unacceptable at scale. Both written in same transaction to stay consistent.
Interview Q
Senior engineer: "Drop the orders table. Derive current status from the latest event in order_events. Fewer tables, simpler schema." How do you respond?
Answer: Classic read performance trap. Deriving current state from the event log requires ORDER BY + LIMIT 1 aggregation over an unbounded table on every read. At scale this kills performance. orders is a materialized current state — same principle as denormalization. Small write complexity traded for O(1) read performance.
20. OLAP vs OLTP
Core Difference
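The difference in one pair of queries (schema illustrative):

```sql
-- OLTP: tiny, indexed, millisecond point read
SELECT * FROM orders WHERE id = 42;

-- OLAP: scans months of history, runs for seconds to minutes
SELECT restaurant_id, COUNT(*) AS orders, AVG(total) AS avg_order
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY restaurant_id;
```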
Running both on the same database: analytical scans starve CPU/IO from transactions. Production systems die.
Comparison
- OLTP: many small transactions, millisecond latency, current data, row-oriented storage
- OLAP: few huge scans and aggregations, seconds-to-minutes latency, historical data, column-oriented storage
Architecture — ETL Pipeline
OLTP database → Extract (dump or change-data-capture) → Transform (clean, aggregate, reshape into facts and dimensions) → Load into the OLAP warehouse, typically as a nightly batch or a streaming pipeline.
OLAP Star Schema
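A sketch of a star schema for the same orders domain (all names illustrative): a central fact table of keys and measures, surrounded by descriptive dimension tables.

```sql
-- Fact table: one row per order, only foreign keys and numeric measures
CREATE TABLE fact_orders (
    order_id       BIGINT,
    date_key       INT,
    customer_key   INT,
    restaurant_key INT,
    total          DECIMAL(10,2)
);

-- Dimension tables: descriptive attributes, joined on demand
CREATE TABLE dim_date       (date_key INT PRIMARY KEY, full_date DATE,
                             month INT, year INT);
CREATE TABLE dim_customer   (customer_key INT PRIMARY KEY,
                             name VARCHAR(100), city VARCHAR(50));
CREATE TABLE dim_restaurant (restaurant_key INT PRIMARY KEY,
                             name VARCHAR(100), cuisine VARCHAR(50));
```

Analysts filter and group by dimension attributes ("orders per cuisine per month") while aggregating measures from the fact table.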
When to Separate
Signal: analytical queries start causing p99 latency spikes on transactional queries.
Escalation path: read replica for analysts first (cheap, buys time) → dedicated warehouse (e.g. BigQuery, Redshift, Snowflake) fed by an ETL/CDC pipeline once replica lag or query complexity becomes the problem.
Never rely on "quiet hours" alone. One analyst running a query without WHERE at 2pm on Tuesday can take down production.
🃏 ANKI
Q: Why must OLTP and OLAP be separate systems at scale?
A: Analytical queries consume massive CPU/IO scanning billions of rows. Running them on the same server as transactions destroys millisecond response times. Decoupled via ETL — OLTP handles real-time writes, OLAP handles historical reads.
Interview Q
Startup: "We're small — 10,000 orders/day. We'll run analyst queries on prod at night." Reasonable?
Answer: Reasonable today. Instrument p99 latency from day one. When analytical queries cause measurable latency spikes → introduce read replica first. Graduate to full OLAP when complexity demands it. "Quiet hours" is not a safety guarantee — one bad query during business hours can take down prod without architectural separation.