SQL 1 : Introduction to DBMS & Keys

Qubits of DPK

April 11, 2026

Core DBMS

1. Data is King — But How Do We Store It?

Every modern software system runs on data. Instagram stores users, posts, and likes. Amazon stores products, orders, and payments. Netflix stores users, movies, and watch history. Without data, an application simply cannot function — and as systems grow to serve millions of users, the question of how to store and manage that data safely and efficiently becomes one of the most critical engineering challenges you will face.

This is where databases come in.

2. The Old Approach — Storing Data in Files

Before databases existed, developers stored data in plain files. A simple system might have a customers.txt, an orders.txt, and a payments.txt, each containing comma-separated values like:

sql

QUBITS OF DPK

1customer_id,name,email
2101,John,john@gmail.com
3102,Alice,alice@gmail.com

This works perfectly fine for tiny systems, but it breaks down badly as applications scale. Understanding why it breaks down is important — these are exactly the problems that databases were invented to solve.

3. Problems With File Storage

Data Duplication is the first problem. The same data ends up copied across multiple files. John's name might appear in both customers.txt and orders.txt. The moment John changes his email address, every file containing that email must be updated. Miss even one file, and you now have conflicting information about the same person.

Data Inconsistency follows directly from duplication. After a partial update, customers.txt might show john_new@gmail.com while orders.txt still shows john_old@gmail.com. Now the system has two different versions of the truth about the same user — and there is no automatic way to know which one is correct.

Difficult Data Retrieval becomes a real problem the moment queries span multiple files. Answering something as straightforward as "show me all orders placed by customers from India" requires your code to read the customers file, filter by country, read the orders file, and manually match the IDs together. This is slow, error-prone, and gets exponentially harder as the number of files grows.

No Data Integrity is perhaps the most dangerous problem. Nothing in a file system prevents you from creating an order that references customer_id = 999 when no customer with that ID exists. The file will happily accept this invalid data, and your system will silently become corrupted.

4. What is a Database?

A database is an organized collection of structured data. Instead of scattered text files, data is stored in well-defined tables — think of each table like a spreadsheet with rows and columns, but with strict rules about what data is allowed and how tables relate to each other.

sql

QUBITS OF DPK

1Customers Table           Orders Table
2─────────────────         ──────────────────────
3customer_id | name        order_id | customer_id | amount

This structure is not just cosmetic — it enables the database software to answer complex queries in milliseconds, enforce data validity automatically, and handle thousands of simultaneous users without corrupting anything.

5. What is a DBMS?

A DBMS (Database Management System) is the software layer that sits between your application and the raw data. It is responsible for storing data to disk, retrieving it efficiently, managing relationships between tables, handling multiple users writing at the same time (concurrency), and enforcing rules that keep data valid and consistent.

The most widely used relational database systems are MySQL (by Oracle), PostgreSQL (open source), Oracle DB, and Microsoft SQL Server. For this learning path, we work with MySQL.

6. Relational vs NoSQL Databases

A Relational Database (RDBMS) stores data in tables that relate to each other through shared keys. The vast majority of backend systems — e-commerce platforms, banking systems, booking systems, SaaS products — are built on relational databases. SQL is the language used to query them.

NoSQL databases take a different approach, sacrificing some relational structure in exchange for horizontal scalability and flexibility. MongoDB stores data as JSON-like documents, Redis stores key-value pairs in memory, and Cassandra stores data in wide columns optimized for distributed systems. Companies like Netflix, Facebook, and Uber use NoSQL for specific high-scale components — but even they typically use relational databases for core transactional data.

For interviews and production backend engineering, relational databases and SQL are non-negotiable foundational knowledge.

7. What Are Keys?

Keys are how relational databases uniquely identify rows. Think of them the way you think of real-world identifiers — a passport number uniquely identifies a person internationally, an Aadhaar number uniquely identifies an Indian citizen, and an employee ID uniquely identifies someone within a company. Database keys serve exactly the same purpose: they give every row an unambiguous identity.

Consider this table:

Notice that employee_id, email, and phone could all potentially identify a row uniquely. This observation leads us to the different types of keys, which build on each other logically.

8. Super Key

A Super Key is any column or combination of columns that uniquely identifies a row. From the table above, all of the following are super keys:

sql

QUBITS OF DPK

1employee_id
2email
3phone
4employee_id + email
5employee_id + phone
6employee_id + email + phone

All of these would uniquely identify a row — but many of them carry redundant information. If employee_id alone uniquely identifies a row, then employee_id + email does too, but the email column adds nothing to the uniqueness. Super keys can be bloated with unnecessary columns.

9. Candidate Key

A Candidate Key is a minimal super key — one where you cannot remove any column without losing the ability to uniquely identify rows. From the example above, the candidate keys are:

sql

QUBITS OF DPK

1employee_id   ← removing this breaks uniqueness, so it's minimal
2email         ← removing this breaks uniqueness, so it's minimal
3phone         ← same reasoning

But employee_id + email is not a candidate key, because employee_id alone is already sufficient — the email is redundant. The key distinction is:

A table can have multiple candidate keys. In the example above, employee_id, email, and phone are all candidate keys — any one of them could serve as the definitive row identifier.

10. Primary Key

A Primary Key is the candidate key you choose as the official row identifier. You pick one from among the candidate keys and declare it as the primary key. The rules are strict: the value must be unique across every row, it can never be NULL, and there can only be one primary key per table.

sql

QUBITS OF DPK

1CREATE TABLE employees (
2    employee_id INT PRIMARY KEY,
3    name        VARCHAR(100)
4);

In practice, most production systems use an auto-incrementing integer as the primary key rather than a natural attribute like email — we will discuss why shortly.

11. Composite Key

Sometimes no single column uniquely identifies a row, and the combination of two columns together is what creates uniqueness. This is called a Composite Key.

The classic example is a student enrollment table. A student can enroll in many courses, and the same course can have many students — so neither student_id nor course_id alone is unique. But the combination of both is unique (a student can only enroll in the same course once):

sql

QUBITS OF DPK

1CREATE TABLE enrollments (
2    student_id INT,
3    course_id  INT,
4    PRIMARY KEY (student_id, course_id)
5);

Composite keys appear naturally in many-to-many relationships — a pattern you will encounter constantly in schema design.

12. Foreign Key

A Foreign Key is what creates a relationship between two tables. It is a column in one table that references the Primary Key of another table, and its purpose is to enforce referential integrity — meaning the database will reject any data that references a row that does not exist.

sql

QUBITS OF DPK

1CREATE TABLE orders (
2    order_id    INT PRIMARY KEY,
3    customer_id INT,
4    FOREIGN KEY (customer_id) REFERENCES customers(id)
5);

With this constraint in place, if you try to insert an order with customer_id = 999 and no customer with id 999 exists, MySQL will reject the insert entirely. Without foreign keys, your database can silently accumulate orphaned records that point to nothing — a data integrity nightmare that is very difficult to clean up later.

13. Unique Key

A Unique Key is similar to a Primary Key in that it enforces uniqueness, but with one important difference: it allows one NULL value. Primary Keys cannot be NULL at all. You use a Unique Key when a column must not have duplicates but is not the primary identifier of the row.

sql

QUBITS OF DPK

1email VARCHAR(100) UNIQUE

This ensures no two users can share the same email address, while still allowing a row where email is not yet known.

14. Surrogate Keys vs Natural Keys

This distinction comes up often in interviews and in real schema design discussions.

A Natural Key is a real-world identifier that already exists in your domain — a passport number, an Aadhaar number, an email address, a PAN number. These have meaningful business value outside the database.

A Surrogate Key is an artificial identifier that exists purely for the database's use — typically an auto-incrementing integer with no real-world meaning:

sql

QUBITS OF DPK

1id INT AUTO_INCREMENT PRIMARY KEY

Most production systems strongly prefer surrogate keys. Natural keys can change (people change email addresses, countries reissue passport numbers), and they can be long strings that make joins expensive. A simple integer id is immutable, compact, and fast to index and join on. The email or passport number can still be stored in the table as a Unique Key — it just is not used as the primary identifier.

15. Interview Questions

What problems do databases solve compared to file storage? File systems suffer from data redundancy (the same data duplicated across files), data inconsistency (different files having different versions of the same information), lack of data integrity (nothing prevents invalid references), and complex retrieval (you have to write custom code to join data across files). A DBMS solves all of these through structured tables, SQL queries, constraints, and enforced relationships.

What is the difference between a Super Key and a Candidate Key? A Super Key is any set of columns that uniquely identifies a row — it can include extra, unnecessary columns. A Candidate Key is a minimal Super Key — every column in it is necessary for uniqueness, and removing any one column would break that uniqueness. All Candidate Keys are Super Keys, but not all Super Keys are Candidate Keys.

Can a table have multiple Candidate Keys? Yes. If employee_id, email, and phone all independently uniquely identify rows, then all three are Candidate Keys. The database designer then picks one of them to serve as the Primary Key.

16. Quick Revision

sql

QUBITS OF DPK

1Database      → organized, structured data storage
2DBMS          → software that manages the database
3RDBMS         → relational database — data in linked tables
4SQL           → language to query a relational database
5
6Super Key     → any column(s) that uniquely identify a row
7Candidate Key → minimal super key (no redundant columns)
8Primary Key   → the chosen candidate key for a table
9Composite Key → primary key made of two or more columns
10Foreign Key   → enforces relationships and referential integrity
11Unique Key    → uniqueness without the NOT NULL requirement of PK
12Surrogate Key → artificial auto-increment ID (preferred in production)
13Natural Key   → real-world identifier (email, passport number, etc.)