SQL 1 : Introduction to DBMS & Keys
D
Qubits of DPK
April 11, 2026
Core DBMS
1. Data is King — But How Do We Store It?
Every modern software system runs on data. Instagram stores users, posts, and likes. Amazon stores products, orders, and payments. Netflix stores users, movies, and watch history. Without data, an application simply cannot function — and as systems grow to serve millions of users, the question of how to store and manage that data safely and efficiently becomes one of the most critical engineering challenges you will face.
This is where databases come in.
2. The Old Approach — Storing Data in Files
Before databases existed, developers stored data in plain files. A simple system might have a customers.txt, an orders.txt, and a payments.txt, each containing comma-separated values like:
sql
QUBITS OF DPK
This works perfectly fine for tiny systems, but it breaks down badly as applications scale. Understanding why it breaks down is important — these are exactly the problems that databases were invented to solve.
3. Problems With File Storage
Data Duplication is the first problem. The same data ends up copied across multiple files. John's name might appear in both customers.txt and orders.txt. The moment John changes his email address, every file containing that email must be updated. Miss even one file, and you now have conflicting information about the same person.
Data Inconsistency follows directly from duplication. After a partial update, customers.txt might show john_new@gmail.com while orders.txt still shows john_old@gmail.com. Now the system has two different versions of the truth about the same user — and there is no automatic way to know which one is correct.
Difficult Data Retrieval becomes a real problem the moment queries span multiple files. Answering something as straightforward as "show me all orders placed by customers from India" requires your code to read the customers file, filter by country, read the orders file, and manually match the IDs together. This is slow, error-prone, and gets exponentially harder as the number of files grows.
No Data Integrity is perhaps the most dangerous problem. Nothing in a file system prevents you from creating an order that references customer_id = 999 when no customer with that ID exists. The file will happily accept this invalid data, and your system will silently become corrupted.
4. What is a Database?
A database is an organized collection of structured data. Instead of scattered text files, data is stored in well-defined tables — think of each table like a spreadsheet with rows and columns, but with strict rules about what data is allowed and how tables relate to each other.
sql
QUBITS OF DPK
This structure is not just cosmetic — it enables the database software to answer complex queries in milliseconds, enforce data validity automatically, and handle thousands of simultaneous users without corrupting anything.
5. What is a DBMS?
A DBMS (Database Management System) is the software layer that sits between your application and the raw data. It is responsible for storing data to disk, retrieving it efficiently, managing relationships between tables, handling multiple users writing at the same time (concurrency), and enforcing rules that keep data valid and consistent.
The most widely used relational database systems are MySQL (by Oracle), PostgreSQL (open source), Oracle DB, and Microsoft SQL Server. For this learning path, we work with MySQL.
6. Relational vs NoSQL Databases
A Relational Database (RDBMS) stores data in tables that relate to each other through shared keys. The vast majority of backend systems — e-commerce platforms, banking systems, booking systems, SaaS products — are built on relational databases. SQL is the language used to query them.
NoSQL databases take a different approach, sacrificing some relational structure in exchange for horizontal scalability and flexibility. MongoDB stores data as JSON-like documents, Redis stores key-value pairs in memory, and Cassandra stores data in wide columns optimized for distributed systems. Companies like Netflix, Facebook, and Uber use NoSQL for specific high-scale components — but even they typically use relational databases for core transactional data.
For interviews and production backend engineering, relational databases and SQL are non-negotiable foundational knowledge.
7. What Are Keys?
Keys are how relational databases uniquely identify rows. Think of them the way you think of real-world identifiers — a passport number uniquely identifies a person internationally, an Aadhaar number uniquely identifies an Indian citizen, and an employee ID uniquely identifies someone within a company. Database keys serve exactly the same purpose: they give every row an unambiguous identity.
Consider this table:
Notice that employee_id, email, and phone could all potentially identify a row uniquely. This observation leads us to the different types of keys, which build on each other logically.
8. Super Key
A Super Key is any column or combination of columns that uniquely identifies a row. From the table above, all of the following are super keys:
sql
QUBITS OF DPK
All of these would uniquely identify a row — but many of them carry redundant information. If employee_id alone uniquely identifies a row, then employee_id + email does too, but the email column adds nothing to the uniqueness. Super keys can be bloated with unnecessary columns.
9. Candidate Key
A Candidate Key is a minimal super key — one where you cannot remove any column without losing the ability to uniquely identify rows. From the example above, the candidate keys are:
sql
QUBITS OF DPK
But employee_id + email is not a candidate key, because employee_id alone is already sufficient — the email is redundant. The key distinction is:
A table can have multiple candidate keys. In the example above, employee_id, email, and phone are all candidate keys — any one of them could serve as the definitive row identifier.
10. Primary Key
A Primary Key is the candidate key you choose as the official row identifier. You pick one from among the candidate keys and declare it as the primary key. The rules are strict: the value must be unique across every row, it can never be NULL, and there can only be one primary key per table.
sql
QUBITS OF DPK
In practice, most production systems use an auto-incrementing integer as the primary key rather than a natural attribute like email — we will discuss why shortly.
11. Composite Key
Sometimes no single column uniquely identifies a row, and the combination of two columns together is what creates uniqueness. This is called a Composite Key.
The classic example is a student enrollment table. A student can enroll in many courses, and the same course can have many students — so neither student_id nor course_id alone is unique. But the combination of both is unique (a student can only enroll in the same course once):
sql
QUBITS OF DPK
Composite keys appear naturally in many-to-many relationships — a pattern you will encounter constantly in schema design.
12. Foreign Key
A Foreign Key is what creates a relationship between two tables. It is a column in one table that references the Primary Key of another table, and its purpose is to enforce referential integrity — meaning the database will reject any data that references a row that does not exist.
sql
QUBITS OF DPK
With this constraint in place, if you try to insert an order with customer_id = 999 and no customer with id 999 exists, MySQL will reject the insert entirely. Without foreign keys, your database can silently accumulate orphaned records that point to nothing — a data integrity nightmare that is very difficult to clean up later.
13. Unique Key
A Unique Key is similar to a Primary Key in that it enforces uniqueness, but with one important difference: it allows one NULL value. Primary Keys cannot be NULL at all. You use a Unique Key when a column must not have duplicates but is not the primary identifier of the row.
sql
QUBITS OF DPK
This ensures no two users can share the same email address, while still allowing a row where email is not yet known.
14. Surrogate Keys vs Natural Keys
This distinction comes up often in interviews and in real schema design discussions.
A Natural Key is a real-world identifier that already exists in your domain — a passport number, an Aadhaar number, an email address, a PAN number. These have meaningful business value outside the database.
A Surrogate Key is an artificial identifier that exists purely for the database's use — typically an auto-incrementing integer with no real-world meaning:
sql
QUBITS OF DPK
Most production systems strongly prefer surrogate keys. Natural keys can change (people change email addresses, countries reissue passport numbers), and they can be long strings that make joins expensive. A simple integer id is immutable, compact, and fast to index and join on. The email or passport number can still be stored in the table as a Unique Key — it just is not used as the primary identifier.
15. Interview Questions
What problems do databases solve compared to file storage? File systems suffer from data redundancy (the same data duplicated across files), data inconsistency (different files having different versions of the same information), lack of data integrity (nothing prevents invalid references), and complex retrieval (you have to write custom code to join data across files). A DBMS solves all of these through structured tables, SQL queries, constraints, and enforced relationships.
What is the difference between a Super Key and a Candidate Key? A Super Key is any set of columns that uniquely identifies a row — it can include extra, unnecessary columns. A Candidate Key is a minimal Super Key — every column in it is necessary for uniqueness, and removing any one column would break that uniqueness. All Candidate Keys are Super Keys, but not all Super Keys are Candidate Keys.
Can a table have multiple Candidate Keys? Yes. If employee_id, email, and phone all independently uniquely identify rows, then all three are Candidate Keys. The database designer then picks one of them to serve as the Primary Key.
16. Quick Revision
sql
QUBITS OF DPK