SQL 15 : Interview Q&A
D
Qubits of DPK
April 11, 2026
Core DBMS
1. ACID Properties
Q1. What are ACID properties? Explain each with an example.
Atomicity — A transaction is all-or-nothing. If any step fails, the entire transaction rolls back.
Example: Transfer ₹500 from A to B. Debit A succeeds but credit B fails → entire transaction rolls back. A is not debited.
Example: Transfer ₹500 from A to B. Debit A succeeds but credit B fails → entire transaction rolls back. A is not debited.
Consistency — A transaction moves the database from one valid state to another. All constraints, rules, and invariants must hold before and after.
Example: A bank balance cannot go negative. If a transfer would violate this, the transaction is rejected.
Example: A bank balance cannot go negative. If a transfer would violate this, the transaction is rejected.
Isolation — Concurrent transactions execute as if they were serial. Intermediate states are invisible to other transactions.
Example: Two users booking the last seat simultaneously — only one should succeed. The other sees the updated state after the first commits.
Example: Two users booking the last seat simultaneously — only one should succeed. The other sees the updated state after the first commits.
Durability — Once committed, data survives crashes. Guaranteed by write-ahead logging (WAL) and disk persistence.
Example: After a payment confirms, even if the server crashes immediately after, the payment record is not lost.
Example: After a payment confirms, even if the server crashes immediately after, the payment record is not lost.
Q2. Can a database be consistent without being isolated?
No. Without isolation, dirty reads and non-repeatable reads can violate consistency constraints even when individual transactions are internally consistent. Isolation is what prevents concurrent transactions from corrupting each other's invariants.
Q3. How does a database guarantee Atomicity and Durability?
- Write-Ahead Log (WAL): Every change is written to a log on disk before the actual data pages are modified. On crash recovery, the DB replays committed transactions and rolls back incomplete ones.
- REDO log: Replays committed changes that hadn't been flushed to disk.
- UNDO log: Rolls back uncommitted transactions that were in-flight during the crash.
2. Isolation Levels
Q4. What are the four isolation levels? What problem does each solve?
Dirty Read — Reading uncommitted data from another transaction.
Non-Repeatable Read — Reading the same row twice in one transaction and getting different values.
Phantom Read — A re-executed range query returns different rows because another transaction inserted or deleted rows.
Non-Repeatable Read — Reading the same row twice in one transaction and getting different values.
Phantom Read — A re-executed range query returns different rows because another transaction inserted or deleted rows.
MySQL InnoDB default: REPEATABLE READ.
Q5. Two transactions run concurrently. T1 reads a row. T2 updates and commits it. T1 reads the same row again and gets a different value. What phenomenon is this? How do you prevent it?
This is a Non-Repeatable Read. Prevented by using REPEATABLE READ isolation level, which takes a snapshot at the start of the transaction and all reads within it see that snapshot (MVCC).
Q6. What is MVCC? How does it enable high concurrency?
Multi-Version Concurrency Control — instead of locking rows for reads, the database maintains multiple versions of each row. Readers see a consistent snapshot of the data at their transaction start time. Writers create new versions without blocking readers.
Result: readers never block writers, writers never block readers. Only writer-writer conflicts require locking.
3. Transactions and Locks
Q7. What is a deadlock? How does MySQL detect and resolve it?
A deadlock occurs when two transactions each hold a lock the other needs, and both wait indefinitely.
sql
QUBITS OF DPK
MySQL InnoDB uses a wait-for graph. When a cycle is detected, InnoDB automatically rolls back the transaction with the least undo log (cheapest to abort) and returns error 1213. The application must retry.
Prevention strategy: Always acquire locks in the same order across all transactions.
Q8. What is the difference between optimistic and pessimistic locking?
Pessimistic Locking — Assume conflicts will happen. Lock the row before reading it.
sql
QUBITS OF DPK
Blocks other writers until transaction commits. Safe but reduces throughput.
Optimistic Locking — Assume conflicts are rare. Don't lock on read. On update, check if the row changed since you read it via a version column.
sql
QUBITS OF DPK
Higher throughput for read-heavy systems. Requires retry logic.
Q9. What are savepoints? When would you use them?
A savepoint is a named checkpoint within a transaction that you can roll back to without rolling back the entire transaction.
sql
QUBITS OF DPK
Useful in complex business flows where partial failures should not discard all prior work.
Q10. What is the difference between a shared lock and an exclusive lock?
Multiple transactions can hold shared locks simultaneously. An exclusive lock blocks everyone.
4. Indexing
Q11. How does a B+ tree index work? Why B+ tree and not B tree?
A B+ tree is a self-balancing tree where all data lives in leaf nodes only. Internal nodes store keys for routing. Leaf nodes are linked in a doubly linked list — enabling efficient range scans.
Range queries like WHERE created_at BETWEEN x AND y are O(log n + k) — tree traversal to find the start, then linear walk through the linked leaf list.
Q12. What is a composite index? Explain the leftmost prefix rule.
A composite index on (a, b, c) is only used when the query filters on the leftmost column first.
sql
QUBITS OF DPK
Column order rule: highest cardinality first eliminates the most rows at each step.
Q13. What is the difference between a clustered and non-clustered index?
Clustered Index — The table rows are physically stored in the order of the index. There can only be one. In MySQL InnoDB, the Primary Key is always the clustered index.
Non-Clustered Index — A separate structure that stores the index key + a pointer to the actual row. Multiple non-clustered indexes can exist per table.
sql
QUBITS OF DPK
Non-clustered index lookups require a double lookup: find the row pointer in the index, then fetch the actual row. This is called a bookmark lookup.
Q14. What is a covering index?
A covering index is one where the index itself contains all the columns the query needs — no table row lookup required.
sql
QUBITS OF DPK
Eliminates the double lookup. Major performance win for high-frequency read queries.
Q15. When should you NOT create an index?
- Low cardinality columns — is_deleted, gender, status with 2-3 values. Index rarely filters enough rows to be worth the overhead.
- Write-heavy tables — Every INSERT/UPDATE/DELETE must update the index. High write tables pay a constant overhead cost.
- Small tables — Full scan on 1,000 rows is faster than index overhead.
- Columns never used in WHERE, JOIN, or ORDER BY — Dead weight.
5. Normalization
Q16. What is normalization? Why does it exist?
Normalization is the process of organizing a schema to eliminate redundancy and prevent anomalies.
Three anomalies normalization solves:
- Insert anomaly — Can't insert data without unrelated data existing first.
- Update anomaly — Changing one fact requires updating it in multiple rows. Missing one creates inconsistency.
- Delete anomaly — Deleting a row unintentionally destroys other unrelated facts.
Q17. Explain 1NF, 2NF, 3NF with examples.
1NF — Atomic values, no repeating groups
Every column must contain a single indivisible value. No arrays, no comma-separated lists.
Every column must contain a single indivisible value. No arrays, no comma-separated lists.
sql
QUBITS OF DPK
2NF — No partial dependency on composite key
Every non-key column must depend on the entire composite key, not just part of it.
Every non-key column must depend on the entire composite key, not just part of it.
sql
QUBITS OF DPK
3NF — No transitive dependency
Non-key columns must depend on the primary key directly, not on another non-key column.
Non-key columns must depend on the primary key directly, not on another non-key column.
sql
QUBITS OF DPK
Q18. What is BCNF? How is it different from 3NF?
Boyce-Codd Normal Form — A stricter version of 3NF. Every determinant must be a candidate key.
A table can be in 3NF but violate BCNF when a non-candidate-key column determines part of a candidate key.
3NF allows some anomalies when multiple overlapping candidate keys exist. BCNF eliminates them all. BCNF is the production-standard target for transactional schemas.
Q19. When do you intentionally denormalize?
When read performance outweighs the cost of write overhead and redundancy.
sql
QUBITS OF DPK
When to denormalize:
- Dashboards or analytics showing data that never changes after the fact
- Read-to-write ratio is 100:1 or higher
- JOIN cost on hot tables is measurable in p99 latency
Cost: Every update to the source must propagate to the denormalized copy. Risk of staleness.
6. Keys and Constraints
Q20. What is the difference between Primary Key, Foreign Key, Candidate Key, and Composite Key?
Candidate Key — Any column or set of columns that could uniquely identify a row. A table can have multiple.
Primary Key — The chosen candidate key. One per table. Cannot be NULL. Clustered index in InnoDB.
Foreign Key — A column referencing the Primary Key of another table. Enforces referential integrity. Can be NULL. Can have duplicates.Composite Key — A key made of two or more columns. Used when no single column uniquely identifies a row.
Primary Key — The chosen candidate key. One per table. Cannot be NULL. Clustered index in InnoDB.
Foreign Key — A column referencing the Primary Key of another table. Enforces referential integrity. Can be NULL. Can have duplicates.Composite Key — A key made of two or more columns. Used when no single column uniquely identifies a row.
Q21. Can a Foreign Key be NULL? Can it have duplicate values?
Yes to both. A NULL FK means "no relationship exists for this row" — not a violation. Duplicates are allowed because many rows can reference the same parent row (1:M relationship). Only Primary Keys must be unique and non-null.
Q22. What is referential integrity? How do Foreign Keys enforce it?
Referential integrity means every FK value must either be NULL or match an existing PK value in the referenced table. Foreign Keys enforce this at the database level — the DB rejects any INSERT or UPDATE that would create a dangling reference, and any DELETE of a parent row that has children (with RESTRICT).
7. Joins
Q23. What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
Q24. What is the difference between WHERE and ON in a LEFT JOIN? This is a very common interview trap.
sql
QUBITS OF DPK
Query A — The filter applies during the join. All orders are returned. If the customer is not from Mumbai, the customer columns are NULL — but the order row is still present.
Query B — The filter applies after the join. Orders whose customers are not from Mumbai are eliminated. This effectively converts the LEFT JOIN into an INNER JOIN.
Rule: Filtering on the right table's columns in WHERE destroys the LEFT JOIN behavior.
Q25. What is a self join? When would you use it?
A self join joins a table with itself. Used when rows in a table have a relationship to other rows in the same table.
sql
QUBITS OF DPK
Also used for: follower/following relationships, parent/child category hierarchies, org charts.
8. Stored Procedures, Views, and CTEs
Q26. What is the difference between a View, CTE, and Subquery?
Use CTE for query readability and recursion. Use View for reusability across multiple queries. Use Subquery only when a CTE would be overkill for a simple inline filter.
Q27. What is a stored procedure? Advantages and disadvantages?
A stored procedure is precompiled SQL logic stored in the database, callable by name.
sql
QUBITS OF DPK
Advantages:
- Reduced network round trips — logic executes server-side
- Precompiled — faster execution plan
- Centralized business logic, enforced consistently
Disadvantages:
- Hard to version control alongside application code
- Difficult to test and debug
- Creates tight coupling between application and DB schema
- Not portable across database engines
Interview position: In modern microservices architectures, stored procedures are generally avoided. Business logic belongs in the application layer where it can be versioned, tested, and deployed independently.
9. Schema Design
Q28. A junior engineer puts ON DELETE CASCADE on a orders → restaurants foreign key. What is wrong with this?
Accidentally deleting a restaurant row with a bad WHERE clause would silently cascade and delete every order ever associated with that restaurant — potentially millions of rows. Recovery requires a full backup restore.
Correct approach: Use ON DELETE RESTRICT. It blocks the parent delete if children exist, forcing the application to handle cleanup explicitly with full logging and control.
Q29. When would you use a soft delete vs a hard delete?
Soft delete — Set deleted_at = NOW(). Row stays in the table. Used when you need audit trails, data recovery, or referential integrity preservation.
Hard delete — Physically remove the row. Used for transient data (OTPs, sessions) and required for GDPR right-to-erasure compliance — soft delete alone does not satisfy GDPR.
Hidden danger of soft deletes: Every query must include WHERE deleted_at IS NULL. Miss it once and deleted data reappears. Mitigate with ORM-level global filters.
Q30. Design the schema for an order tracking system where you need to track every status change.
sql
QUBITS OF DPK
Both tables are written in a single transaction — if either write fails, both roll back. Never derive current state from the event log alone; ORDER BY + LIMIT 1 over millions of events is unacceptably slow.
10. Performance and Scaling
Q31. An orders table has 500M rows. Queries are slow despite indexes. What is your escalation path?
sql
QUBITS OF DPK
Never jump to sharding without exhausting every earlier step. Sharding is irreversible and operationally expensive.
Q32. What is query optimization? How does EXPLAIN work?
The query optimizer generates an execution plan — the sequence of operations (index scan, full scan, join order, etc.) the DB will use. EXPLAIN shows that plan.
sql
QUBITS OF DPK
Key columns to examine:
Q33. What is the N+1 query problem? How do you fix it?
The N+1 problem occurs when code runs 1 query to fetch N parent records, then runs 1 additional query per parent to fetch its children — total N+1 queries.
sql
QUBITS OF DPK
In ORMs: use .join(), fetch = EAGER, or @EntityGraph in JPA to force the join. Never lazy-load in a loop.
Q34. What is the difference between partitioning and sharding?
Partitioning — Splits one table into physical pieces on the same server. The application sees one table. Low complexity. Use when a single table exceeds ~100M rows.
Sharding — Splits data across multiple servers. Each shard is an independent database. Very high complexity — cross-shard joins are nearly impossible, distributed transactions are hard. Use only when a single server cannot handle the load.
Exhaust partitioning, read replicas, and caching before considering sharding.
11. OLAP vs OLTP
Q35. What is the difference between OLTP and OLAP systems?
Q36. A data analyst's query is causing p99 latency spikes on your production MySQL database. What do you do?
sql
QUBITS OF DPK
Never allow full-scan analytical queries on the OLTP primary. One query without a WHERE clause can take down production.
12. Schema Evolution
Q37. How do you rename a column in a production database with zero downtime?
Never do this directly — it breaks running application instances immediately:
sql
QUBITS OF DPK
Correct approach — Expand-Contract pattern:
sql
QUBITS OF DPK
Q38. How do you add a NOT NULL column to a billion-row table safely?
sql
QUBITS OF DPK
Never add NOT NULL DEFAULT value in one step on large tables — MySQL rewrites every row, locks the table for hours. Use pt-online-schema-change or gh-ost for large tables.
13. Multi-Tenancy
Q39. What are the three approaches to multi-tenant schema design? When do you use each?
Approach 1 — Shared Table
All tenants share the same tables. Every table has a tenant_id column.
Simple, cheap, one migration to maintain
️ Noisy neighbor risk. Data leak if WHERE tenant_id = ? is ever missing.
All tenants share the same tables. Every table has a tenant_id column.
Simple, cheap, one migration to maintain
️ Noisy neighbor risk. Data leak if WHERE tenant_id = ? is ever missing.
Approach 2 — Separate Schema Per Tenant
Each tenant gets their own schema on the same DB server.
Strong isolation, easy per-tenant backup
️ Migrations must run N times — one per tenant.
Each tenant gets their own schema on the same DB server.
Strong isolation, easy per-tenant backup
️ Migrations must run N times — one per tenant.
Approach 3 — Separate Database Per Tenant
Each tenant gets a dedicated server.
Maximum isolation, satisfies SOC2/HIPAA/GDPR auditors
️ Very expensive, complex to operate.
Each tenant gets a dedicated server.
Maximum isolation, satisfies SOC2/HIPAA/GDPR auditors
️ Very expensive, complex to operate.
Industry hybrid:
sql
QUBITS OF DPK