SQL 15: Interview Q&A


Qubits of DPK

April 11, 2026

Core DBMS

1. ACID Properties

Q1. What are ACID properties? Explain each with an example.
Atomicity — A transaction is all-or-nothing. If any step fails, the entire transaction rolls back.
Example: Transfer ₹500 from A to B. Debit A succeeds but credit B fails → entire transaction rolls back. A is not debited.
Consistency — A transaction moves the database from one valid state to another. All constraints, rules, and invariants must hold before and after.
Example: A bank balance cannot go negative. If a transfer would violate this, the transaction is rejected.
Isolation — Concurrent transactions execute as if they were serial. Intermediate states are invisible to other transactions.
Example: Two users booking the last seat simultaneously — only one should succeed. The other sees the updated state after the first commits.
Durability — Once committed, data survives crashes. Guaranteed by write-ahead logging (WAL) and disk persistence.
Example: After a payment confirms, even if the server crashes immediately after, the payment record is not lost.
Q2. Can a database be consistent without being isolated?
No. Without isolation, dirty reads and non-repeatable reads can violate consistency constraints even when individual transactions are internally consistent. Isolation is what prevents concurrent transactions from corrupting each other's invariants.
Q3. How does a database guarantee Atomicity and Durability?
  • Write-Ahead Log (WAL): Every change is written to a log on disk before the actual data pages are modified. On crash recovery, the DB replays committed transactions and rolls back incomplete ones.
  • REDO log: Replays committed changes that hadn't been flushed to disk.
  • UNDO log: Rolls back uncommitted transactions that were in-flight during the crash.

2. Isolation Levels

Q4. What are the four isolation levels? What problem does each solve?
The three read phenomena, from most to least severe:
Dirty Read — Reading uncommitted data from another transaction.
Non-Repeatable Read — Reading the same row twice in one transaction and getting different values.
Phantom Read — A re-executed range query returns different rows because another transaction inserted or deleted rows.
The four levels, each stricter than the last:
READ UNCOMMITTED — allows all three phenomena.
READ COMMITTED — prevents dirty reads.
REPEATABLE READ — prevents dirty and non-repeatable reads.
SERIALIZABLE — prevents all three, including phantoms.
MySQL InnoDB default: REPEATABLE READ.
Q5. Two transactions run concurrently. T1 reads a row. T2 updates and commits it. T1 reads the same row again and gets a different value. What phenomenon is this? How do you prevent it?
This is a Non-Repeatable Read. Prevented by using REPEATABLE READ isolation level, which takes a snapshot at the start of the transaction and all reads within it see that snapshot (MVCC).
Q6. What is MVCC? How does it enable high concurrency?
Multi-Version Concurrency Control — instead of locking rows for reads, the database maintains multiple versions of each row. Readers see a consistent snapshot of the data at their transaction start time. Writers create new versions without blocking readers.
Result: readers never block writers, writers never block readers. Only writer-writer conflicts require locking.

3. Transactions and Locks

Q7. What is a deadlock? How does MySQL detect and resolve it?
A deadlock occurs when two transactions each hold a lock the other needs, and both wait indefinitely.
sql
T1: LOCK A → wait for B
T2: LOCK B → wait for A
MySQL InnoDB uses a wait-for graph. When a cycle is detected, InnoDB automatically rolls back the transaction with the least undo log (cheapest to abort) and returns error 1213. The application must retry.
Prevention strategy: Always acquire locks in the same order across all transactions.
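Since InnoDB resolves the deadlock by aborting one transaction, the application retry is mandatory. A minimal retry-wrapper sketch; the `DeadlockError` class stands in for the driver's real error (e.g. an `OperationalError` carrying errno 1213), and the backoff parameters are illustrative:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver's 'deadlock found' error (MySQL errno 1213)."""

def run_with_retry(txn_fn, max_attempts=3, base_delay=0.01):
    """Run a transaction function, retrying on deadlock with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter reduces repeated collisions.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Simulate a transaction that deadlocks twice, then succeeds.
attempts = {"n": 0}
def flaky_transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"

result = run_with_retry(flaky_transfer)
print(result)  # committed
```

The transaction function must be safe to re-run from the top, since the rolled-back attempt left no changes behind.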
Q8. What is the difference between optimistic and pessimistic locking?
Pessimistic Locking — Assume conflicts will happen. Lock the row before reading it.
sql
SELECT * FROM orders WHERE order_id = 1 FOR UPDATE;
Blocks other writers until transaction commits. Safe but reduces throughput.
Optimistic Locking — Assume conflicts are rare. Don't lock on read. On update, check if the row changed since you read it via a version column.
sql
-- Read: version = 5
UPDATE orders SET status = 2, version = 6
WHERE order_id = 1 AND version = 5;
-- If 0 rows updated → conflict, retry
Higher throughput for read-heavy systems. Requires retry logic.
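The conflict check is just an affected-row count. A runnable sketch of the version-column pattern on SQLite; table and column names follow the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "status INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (1, 1, 5)")

def update_status(read_version, new_status):
    # Succeeds only if nobody bumped the version since we read it.
    cur = conn.execute(
        "UPDATE orders SET status = ?, version = version + 1 "
        "WHERE order_id = 1 AND version = ?",
        (new_status, read_version),
    )
    return cur.rowcount == 1  # False → conflict, caller should re-read and retry

ok_first = update_status(5, 2)  # version matches → succeeds, version becomes 6
ok_stale = update_status(5, 3)  # stale version → 0 rows updated → conflict
print(ok_first, ok_stale)  # True False
```

No row lock is ever held between the read and the write, which is where the throughput win comes from.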
Q9. What are savepoints? When would you use them?
A savepoint is a named checkpoint within a transaction that you can roll back to without rolling back the entire transaction.
sql
BEGIN;
  INSERT INTO orders ...;
  SAVEPOINT after_order;
  INSERT INTO payments ...;    -- fails
ROLLBACK TO SAVEPOINT after_order;  -- only payment step is undone
  -- retry payment with different logic
COMMIT;
Useful in complex business flows where partial failures should not discard all prior work.
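The flow above can be run end to end on SQLite, which supports the same SAVEPOINT syntax; the tables are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, order_id INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES (1)")
conn.execute("SAVEPOINT after_order")
conn.execute("INSERT INTO payments VALUES (1, 1)")  # pretend this step failed
conn.execute("ROLLBACK TO SAVEPOINT after_order")   # undo only the payment
conn.execute("COMMIT")                              # order insert survives

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
payments = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(orders, payments)  # 1 0
```

The order insert commits while the payment insert is undone — the partial failure did not discard prior work.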
Q10. What is the difference between a shared lock and an exclusive lock?
Shared (read) lock — Multiple transactions can hold shared locks on the same row simultaneously. Shared locks block writers but not other readers. Acquired with SELECT ... FOR SHARE.
Exclusive (write) lock — Only one transaction can hold it; it blocks all other lock requests, shared or exclusive. Acquired with SELECT ... FOR UPDATE or by any UPDATE/DELETE.

4. Indexing

Q11. How does a B+ tree index work? Why B+ tree and not B tree?
A B+ tree is a self-balancing tree where all data lives in leaf nodes only. Internal nodes store keys for routing. Leaf nodes are linked in a doubly linked list — enabling efficient range scans.
Range queries like WHERE created_at BETWEEN x AND y are O(log n + k) — tree traversal to find the start, then linear walk through the linked leaf list.
Q12. What is a composite index? Explain the leftmost prefix rule.
A composite index on (a, b, c) can only be used when the query filters on a leftmost prefix of its columns — a alone, (a, b), or (a, b, c). Skipping the leading column makes the index unusable.
sql
CREATE INDEX idx ON orders(status, created_at, customer_id);

-- ✅ Uses index
WHERE status = 1 AND created_at > '2024-01-01'

-- ✅ Uses index (partial — first column only)
WHERE status = 1

-- ❌ Does NOT use index — skips status
WHERE created_at > '2024-01-01'
Column order rule: highest cardinality first eliminates the most rows at each step.
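The leftmost-prefix rule can be checked against a real planner. A sketch using SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output reads differently, but the index-usage rule is the same); the schema mirrors the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status INTEGER, "
    "created_at TEXT, customer_id INTEGER, total_amount REAL)"
)
conn.execute("CREATE INDEX idx ON orders(status, created_at, customer_id)")

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the readable plan detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

uses = plan("SELECT * FROM orders "
            "WHERE status = 1 AND created_at > '2024-01-01'")
skips = plan("SELECT * FROM orders WHERE created_at > '2024-01-01'")
print(uses)   # SEARCH ... USING INDEX idx — leftmost column present
print(skips)  # SCAN ... — leading column skipped, index unusable
```

The first query is a SEARCH through the index; the second degrades to a full scan because status, the leading column, is absent.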
Q13. What is the difference between a clustered and non-clustered index?
Clustered Index — The table rows are physically stored in the order of the index. There can only be one. In MySQL InnoDB, the Primary Key is always the clustered index.
Non-Clustered Index — A separate structure that stores the index key + a pointer to the actual row. Multiple non-clustered indexes can exist per table.
sql
Clustered:     leaf node = actual row data
Non-Clustered: leaf node = index key + pointer to row (extra lookup)
Non-clustered index lookups require a double lookup: find the row pointer in the index, then fetch the actual row. This is called a bookmark lookup.
Q14. What is a covering index?
A covering index is one where the index itself contains all the columns the query needs — no table row lookup required.
sql
-- Query needs: customer_id, status, total_amount
CREATE INDEX idx_covering ON orders(customer_id, status, total_amount);

SELECT status, total_amount FROM orders WHERE customer_id = 45;
-- Entire answer lives in the index. Table never touched.
Eliminates the double lookup. Major performance win for high-frequency read queries.
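A planner will report this explicitly. A sketch on SQLite, where EXPLAIN QUERY PLAN labels the access "COVERING INDEX" when the index satisfies the whole query; the schema mirrors the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status INTEGER, total_amount REAL)"
)
conn.execute(
    "CREATE INDEX idx_covering ON orders(customer_id, status, total_amount)"
)

rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT status, total_amount FROM orders WHERE customer_id = 45"
).fetchall()
detail = " ".join(r[-1] for r in rows)  # readable plan text
print(detail)  # SEARCH ... USING COVERING INDEX idx_covering (customer_id=?)
```

In MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN.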
Q15. When should you NOT create an index?
  • Low cardinality columns — is_deleted, gender, status with 2-3 values. Index rarely filters enough rows to be worth the overhead.
  • Write-heavy tables — Every INSERT/UPDATE/DELETE must update the index. High write tables pay a constant overhead cost.
  • Small tables — Full scan on 1,000 rows is faster than index overhead.
  • Columns never used in WHERE, JOIN, or ORDER BY — Dead weight.

5. Normalization

Q16. What is normalization? Why does it exist?
Normalization is the process of organizing a schema to eliminate redundancy and prevent anomalies.
Three anomalies normalization solves:
  • Insert anomaly — Can't insert data without unrelated data existing first.
  • Update anomaly — Changing one fact requires updating it in multiple rows. Missing one creates inconsistency.
  • Delete anomaly — Deleting a row unintentionally destroys other unrelated facts.
Q17. Explain 1NF, 2NF, 3NF with examples.
1NF — Atomic values, no repeating groups
Every column must contain a single indivisible value. No arrays, no comma-separated lists.
sql
❌ orders: order_id | items = "Pizza, Burger, Coke"
✅ order_items: order_id | item_name  (one row per item)
2NF — No partial dependency on composite key
Every non-key column must depend on the entire composite key, not just part of it.
sql
❌ order_items: (order_id, item_id) | item_name | customer_name
   customer_name depends only on order_id, not on item_id
✅ Split: order_items(order_id, item_id, quantity) + orders(order_id, customer_name)
3NF — No transitive dependency
Non-key columns must depend on the primary key directly, not on another non-key column.
sql
❌ orders: order_id | customer_id | customer_city
   customer_city depends on customer_id, not order_id
✅ Split: orders(order_id, customer_id) + customers(customer_id, customer_city)
Q18. What is BCNF? How is it different from 3NF?
Boyce-Codd Normal Form — A stricter version of 3NF. Every determinant must be a candidate key.
A table can be in 3NF but violate BCNF when a determinant that is not a candidate key determines a prime attribute — part of some candidate key. This typically arises with multiple overlapping candidate keys.
3NF allows some anomalies when multiple overlapping candidate keys exist. BCNF eliminates them all. BCNF is the production-standard target for transactional schemas.
Q19. When do you intentionally denormalize?
When read performance outweighs the cost of write overhead and redundancy.
sql
-- Normalized: JOIN orders + restaurants on every read
-- Denormalized: store restaurant_name directly in orders table
ALTER TABLE orders ADD COLUMN restaurant_name VARCHAR(100);
When to denormalize:
  • Dashboards or analytics showing data that never changes after the fact
  • Read-to-write ratio is 100:1 or higher
  • JOIN cost on hot tables is measurable in p99 latency
Cost: Every update to the source must propagate to the denormalized copy. Risk of staleness.

6. Keys and Constraints

Q20. What is the difference between Primary Key, Foreign Key, Candidate Key, and Composite Key?
Candidate Key — Any column or set of columns that could uniquely identify a row. A table can have multiple.
Primary Key — The chosen candidate key. One per table. Cannot be NULL. Clustered index in InnoDB.
Foreign Key — A column referencing the Primary Key of another table. Enforces referential integrity. Can be NULL. Can have duplicates.
Composite Key — A key made of two or more columns. Used when no single column uniquely identifies a row.
Q21. Can a Foreign Key be NULL? Can it have duplicate values?
Yes to both. A NULL FK means "no relationship exists for this row" — not a violation. Duplicates are allowed because many rows can reference the same parent row (1:M relationship). Only Primary Keys must be unique and non-null.
Q22. What is referential integrity? How do Foreign Keys enforce it?
Referential integrity means every FK value must either be NULL or match an existing PK value in the referenced table. Foreign Keys enforce this at the database level — the DB rejects any INSERT or UPDATE that would create a dangling reference, and any DELETE of a parent row that has children (with RESTRICT).

7. Joins

Q23. What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
INNER JOIN — Returns only rows that have a match in both tables.
LEFT JOIN — Returns every row from the left table; right-side columns are NULL where no match exists.
FULL OUTER JOIN — Returns every row from both tables, with NULLs on whichever side has no match. MySQL does not support it natively — emulate with a LEFT JOIN UNION a RIGHT JOIN.
Q24. What is the difference between WHERE and ON in a LEFT JOIN? This is a very common interview trap.
sql
-- Query A: filter in ON clause
SELECT * FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id AND c.city = 'Mumbai';

-- Query B: filter in WHERE clause
SELECT * FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.city = 'Mumbai';
Query A — The filter applies during the join. All orders are returned. If the customer is not from Mumbai, the customer columns are NULL — but the order row is still present.
Query B — The filter applies after the join. Orders whose customers are not from Mumbai are eliminated. This effectively converts the LEFT JOIN into an INNER JOIN.
Rule: Filtering on the right table's columns in WHERE destroys the LEFT JOIN behavior.
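The difference is easy to demonstrate with a few rows. A runnable sketch on SQLite with illustrative data — three orders, only one belonging to a Mumbai customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Mumbai'), (2, 'Delhi');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
""")

# Query A: filter in ON — every order survives; non-Mumbai customers are NULL
a = conn.execute("""
    SELECT o.order_id, c.city FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
                         AND c.city = 'Mumbai'
""").fetchall()

# Query B: filter in WHERE — NULL rows are eliminated, acting like INNER JOIN
b = conn.execute("""
    SELECT o.order_id, c.city FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.city = 'Mumbai'
""").fetchall()

print(len(a), len(b))  # 3 1
```

Query A keeps all three orders (two with NULL city); Query B silently drops the Delhi orders.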
Q25. What is a self join? When would you use it?
A self join joins a table with itself. Used when rows in a table have a relationship to other rows in the same table.
sql
-- Find all employees and their managers (both in the same table)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
Also used for: follower/following relationships, parent/child category hierarchies, org charts.
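The employee/manager query above runs unchanged on SQLite; the sample names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY,
                        name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'Asha', NULL),   -- top of the org chart
                             (2, 'Ravi', 1),
                             (3, 'Meena', 1);
""")

# LEFT JOIN keeps the top-level employee whose manager_id is NULL.
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()
print(rows)  # [('Asha', None), ('Ravi', 'Asha'), ('Meena', 'Asha')]
```

Note the LEFT JOIN: with an INNER JOIN, the root employee with no manager would vanish from the result.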

8. Stored Procedures, Views, and CTEs

Q26. What is the difference between a View, CTE, and Subquery?
View — A named, stored query that behaves like a virtual table. Persists in the schema and is reusable across queries.
CTE — A named temporary result set (WITH ... AS) scoped to a single statement. Supports recursion.
Subquery — An unnamed query nested inline inside another query.
Use CTE for query readability and recursion. Use View for reusability across multiple queries. Use Subquery only when a CTE would be overkill for a simple inline filter.
Q27. What is a stored procedure? Advantages and disadvantages?
A stored procedure is precompiled SQL logic stored in the database, callable by name.
sql
CALL GetOrdersByCustomer(45);
Advantages:
  • Reduced network round trips — logic executes server-side
  • Precompiled — faster execution plan
  • Centralized business logic, enforced consistently
Disadvantages:
  • Hard to version control alongside application code
  • Difficult to test and debug
  • Creates tight coupling between application and DB schema
  • Not portable across database engines
Interview position: In modern microservices architectures, stored procedures are generally avoided. Business logic belongs in the application layer where it can be versioned, tested, and deployed independently.

9. Schema Design

Q28. A junior engineer puts ON DELETE CASCADE on an orders → restaurants foreign key. What is wrong with this?
Accidentally deleting a restaurant row with a bad WHERE clause would silently cascade and delete every order ever associated with that restaurant — potentially millions of rows. Recovery requires a full backup restore.
Correct approach: Use ON DELETE RESTRICT. It blocks the parent delete if children exist, forcing the application to handle cleanup explicitly with full logging and control.
Q29. When would you use a soft delete vs a hard delete?
Soft delete — Set deleted_at = NOW(). Row stays in the table. Used when you need audit trails, data recovery, or referential integrity preservation.
Hard delete — Physically remove the row. Used for transient data (OTPs, sessions) and required for GDPR right-to-erasure compliance — soft delete alone does not satisfy GDPR.
Hidden danger of soft deletes: Every query must include WHERE deleted_at IS NULL. Miss it once and deleted data reappears. Mitigate with ORM-level global filters.
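One way to mitigate outside an ORM is a single query helper that always appends the filter. A minimal sketch on SQLite; the helper name and schema are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, deleted_at TEXT);
INSERT INTO orders VALUES (1, NULL), (2, '2024-05-01 10:00:00');
""")

def select_live(table, where="1=1", params=()):
    # Centralizing the deleted_at filter mimics an ORM-level global filter:
    # callers cannot forget it, because they never write it.
    sql = f"SELECT * FROM {table} WHERE deleted_at IS NULL AND ({where})"
    return conn.execute(sql, params).fetchall()

live = select_live("orders")
print(live)  # [(1, None)] — the soft-deleted row never reappears
```

Every read path goes through one function, so the filter lives in exactly one place.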
Q30. Design the schema for an order tracking system where you need to track every status change.
sql
-- State table: O(1) current state lookup
CREATE TABLE orders (
  order_id       INT PRIMARY KEY AUTO_INCREMENT,
  customer_id    INT NOT NULL,
  restaurant_id  INT NOT NULL,
  current_status TINYINT NOT NULL DEFAULT 1,
  created_at     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- Event log: append-only, never updated
CREATE TABLE order_events (
  event_id    INT PRIMARY KEY AUTO_INCREMENT,
  order_id    INT NOT NULL,
  from_status TINYINT,
  to_status   TINYINT NOT NULL,
  changed_by  INT,
  changed_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE RESTRICT
);
Both tables are written in a single transaction — if either write fails, both roll back. Never derive current state from the event log alone; ORDER BY + LIMIT 1 over millions of events is unacceptably slow.
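The single-transaction rule can be sketched as a status-change function, here on SQLite with a trimmed version of the schema above: either both writes land or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     current_status INTEGER NOT NULL);
CREATE TABLE order_events (event_id INTEGER PRIMARY KEY,
                           order_id INTEGER NOT NULL,
                           from_status INTEGER,
                           to_status INTEGER NOT NULL);
INSERT INTO orders VALUES (1, 1);
""")

def change_status(order_id, to_status):
    conn.execute("BEGIN")
    try:
        old = conn.execute(
            "SELECT current_status FROM orders WHERE order_id = ?",
            (order_id,)).fetchone()[0]
        conn.execute("UPDATE orders SET current_status = ? WHERE order_id = ?",
                     (to_status, order_id))
        conn.execute("INSERT INTO order_events (order_id, from_status, to_status) "
                     "VALUES (?, ?, ?)", (order_id, old, to_status))
        conn.execute("COMMIT")   # state + event become visible together
    except Exception:
        conn.execute("ROLLBACK")  # neither write survives a failure
        raise

change_status(1, 2)
status = conn.execute(
    "SELECT current_status FROM orders WHERE order_id = 1").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM order_events").fetchone()[0]
print(status, events)  # 2 1
```

Current state stays an O(1) lookup on orders, while the full history accumulates in order_events.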

10. Performance and Scaling

Q31. An orders table has 500M rows. Queries are slow despite indexes. What is your escalation path?
Step 1: Verify indexes exist and are actually being used (EXPLAIN)
Step 2: Check query — missing WHERE clause, functions on indexed columns, wrong column order in composite index
Step 3: Add or fix indexes
Step 4: Partition the table by created_at (range partition — most queries filter recent data)
Step 5: Add read replicas — offload read traffic from primary
Step 6: Introduce caching (Redis) for hot read paths
Step 7: Archive old data to a separate table or OLAP system
Step 8: Shard (last resort — only when a single server cannot handle load)
Never jump to sharding without exhausting every earlier step. Sharding is irreversible and operationally expensive.
Q32. What is query optimization? How does EXPLAIN work?
The query optimizer generates an execution plan — the sequence of operations (index scan, full scan, join order, etc.) the DB will use. EXPLAIN shows that plan.
sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 45 AND status = 2;
Key columns to examine:
  • type — access method; ALL means full table scan, while const, ref, and range are index-backed
  • key — which index was actually chosen (NULL = none)
  • rows — estimated rows examined; large estimates flag trouble
  • Extra — warnings such as "Using filesort" or "Using temporary"
Q33. What is the N+1 query problem? How do you fix it?
The N+1 problem occurs when code runs 1 query to fetch N parent records, then runs 1 additional query per parent to fetch its children — total N+1 queries.
sql
-- N+1: 1 query for 100 orders + 100 queries for each order's customer = 101 queries
-- Fix: JOIN or eager load in one query
SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
In ORMs: use .join(), fetch = EAGER, or @EntityGraph in JPA to force the join. Never lazy-load in a loop.
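The query-count difference is easy to make visible by wrapping execute. A sketch on SQLite with illustrative data (an ORM would hide the counting, but the shape is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

queries = {"n": 0}
def run(sql, params=()):
    queries["n"] += 1          # count every round trip to the database
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the orders, then one per order for its customer
queries["n"] = 0
orders = run("SELECT order_id, customer_id FROM orders")
for _, cid in orders:
    run("SELECT name FROM customers WHERE customer_id = ?", (cid,))
n_plus_1 = queries["n"]

# Fix: a single JOIN fetches everything at once
queries["n"] = 0
run("SELECT o.order_id, c.name FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id")
joined = queries["n"]

print(n_plus_1, joined)  # 4 1
```

Three orders cost four queries the lazy way and one query with the join; at a hundred orders the gap is 101 to 1.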
Q34. What is the difference between partitioning and sharding?
Partitioning — Splits one table into physical pieces on the same server. The application sees one table. Low complexity. Use when a single table exceeds ~100M rows.
Sharding — Splits data across multiple servers. Each shard is an independent database. Very high complexity — cross-shard joins are nearly impossible, distributed transactions are hard. Use only when a single server cannot handle the load.
Exhaust partitioning, read replicas, and caching before considering sharding.

11. OLAP vs OLTP

Q35. What is the difference between OLTP and OLAP systems?
OLTP (Online Transaction Processing) — Many small, short reads and writes touching individual rows. Row-oriented storage, normalized schema, strict ACID. Examples: MySQL, PostgreSQL.
OLAP (Online Analytical Processing) — Large scans and aggregations over millions of rows. Column-oriented storage, denormalized schemas (star/snowflake), optimized for reads. Examples: BigQuery, Redshift, ClickHouse.
Q36. A data analyst's query is causing p99 latency spikes on your production MySQL database. What do you do?
Immediate: Kill the query if it's currently running
Short-term: Add a read replica. Route all analyst queries to the replica.
Medium-term: Set up an ETL pipeline to load data into a dedicated OLAP system (BigQuery / Redshift)
Long-term: All analytics run on OLAP. Production DB is never touched for analytical workloads.
Never allow full-scan analytical queries on the OLTP primary. One query without a WHERE clause can take down production.

12. Schema Evolution

Q37. How do you rename a column in a production database with zero downtime?
Never do this directly — it breaks running application instances immediately:
sql
-- ❌ Instant downtime
ALTER TABLE orders RENAME COLUMN user_id TO customer_id;
Correct approach — Expand-Contract pattern:
sql
-- Step 1 EXPAND: Add new column as nullable
ALTER TABLE orders ADD COLUMN customer_id INT NULL;

-- Step 2 BACKFILL: Copy data in batches (with sleep between batches to reduce I/O pressure)
UPDATE orders SET customer_id = user_id
WHERE customer_id IS NULL LIMIT 10000;
-- Repeat until 0 rows affected

-- Step 3 MIGRATE: Deploy new app code that writes to customer_id

-- Step 4 CONTRACT: Drop old column only after confirming zero reads
ALTER TABLE orders DROP COLUMN user_id;
Q38. How do you add a NOT NULL column to a billion-row table safely?
sql
-- Step 1: Add as NULL first — instant, no table rewrite
ALTER TABLE orders ADD COLUMN delivery_fee DECIMAL(8,2) NULL;

-- Step 2: Backfill in small batches with sleep (avoids I/O spike)
UPDATE orders SET delivery_fee = 0.00
WHERE delivery_fee IS NULL LIMIT 10000;
-- Repeat until complete

-- Step 3: Deploy app code that always writes delivery_fee

-- Step 4: Add NOT NULL constraint after all rows are populated
ALTER TABLE orders MODIFY COLUMN delivery_fee DECIMAL(8,2) NOT NULL;
Never add NOT NULL DEFAULT value in one step on large tables — MySQL rewrites every row, locks the table for hours. Use pt-online-schema-change or gh-ost for large tables.
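The batched backfill loop can be sketched end to end. This runs on SQLite, which lacks UPDATE ... LIMIT in standard builds, so the batch is selected by rowid in a subquery; on MySQL the LIMIT form shown above works directly. Row counts and batch size are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, delivery_fee REAL)"
)
conn.executemany(
    "INSERT INTO orders (delivery_fee) VALUES (?)", [(None,)] * 25
)

batches = 0
while True:
    cur = conn.execute(
        "UPDATE orders SET delivery_fee = 0.0 WHERE rowid IN "
        "(SELECT rowid FROM orders WHERE delivery_fee IS NULL LIMIT 10)"
    )
    if cur.rowcount == 0:
        break            # nothing left to backfill — loop is done
    batches += 1
    # time.sleep(...) here in production to ease I/O pressure between batches

remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE delivery_fee IS NULL").fetchone()[0]
print(batches, remaining)  # 3 0
```

The loop's exit condition is the affected-row count reaching zero, which is exactly the "repeat until complete" step in the migration plan.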

13. Multi-Tenancy

Q39. What are the three approaches to multi-tenant schema design? When do you use each?
Approach 1 — Shared Table
All tenants share the same tables. Every table has a tenant_id column.
✅ Simple, cheap, one migration to maintain
⚠️ Noisy neighbor risk. Data leak if WHERE tenant_id = ? is ever missing.
Approach 2 — Separate Schema Per Tenant
Each tenant gets their own schema on the same DB server.
✅ Strong isolation, easy per-tenant backup
⚠️ Migrations must run N times — one per tenant.
Approach 3 — Separate Database Per Tenant
Each tenant gets a dedicated server.
✅ Maximum isolation, satisfies SOC2/HIPAA/GDPR auditors
⚠️ Very expensive, complex to operate.
Industry hybrid:
Free/small tier   → Shared table
Mid-tier          → Separate schema
Enterprise        → Separate database

Quick-Fire Round — Most Asked in Interviews