SQL 8 : Indexing

Qubits of DPK

April 11, 2026

Core DBMS

1. Recap of SQL 7

Previously we learned subqueries, correlated subqueries, EXISTS / NOT EXISTS, IN / ANY / ALL, and Views. Now we move to something completely different. Instead of writing queries, we learn how to make queries faster. That concept is called Indexing.

2. Why Indexing Exists

Imagine a table with 10 rows — finding a record is easy. But what about 50 million rows?

sql

QUBITS OF DPK

1SELECT *
2FROM transactions
3WHERE transaction_id = 9834729;

Without indexing, the database must check every single row from 1 to 50,000,000. This is called a Full Table Scan, and it is very slow.

3. Real World Analogy

Think about a book. If you want to find the word "Database", you could read every page from the beginning — or you could check the index page:

sql

QUBITS OF DPK

1Database → Page 342

You immediately jump to the right page. Database indexes work exactly like this.

4. What is an Index?

An index is a data structure that helps the database locate rows faster. If we frequently search by email, we create an index on that column:

sql

QUBITS OF DPK

1CREATE INDEX idx_email
2ON users(email);

Now the database can locate email values without scanning the entire table.

5. Full Table Scan vs Index Lookup

Without an index, a query like this:

sql

QUBITS OF DPK

1SELECT *
2FROM users
3WHERE email = 'john@gmail.com';

forces the database to scan every row — time complexity O(N).

With an index, the database looks into the index structure first and jumps directly to the matching row — time complexity O(log N). That is a massive performance improvement at scale.

6. Data Structures Used for Indexing

Most relational databases use B+ Trees as their index structure. Some databases also support hash indexes, but B+ Trees are by far the most common.

Why not use Hash Maps? Hash maps allow O(1) lookups, which sounds faster. However, they cannot support range queries, sorting, or partial scans. A query like:

sql

QUBITS OF DPK

1SELECT *
2FROM orders
3WHERE amount BETWEEN 100 AND 500;

cannot be served efficiently by a hash index. B+ Trees handle this naturally because values are stored in sorted order.

7. How B+ Trees Work

A B+ Tree organizes values in a sorted tree structure:

sql

QUBITS OF DPK

1        40
2      /    \
3   20       60
4  /  \     /  \
510   30   50   70

Searching for 50 takes only a few steps — go right at 40, go left at 60, found at 50. This is why B+ Trees scale well even to billions of rows.

8. Clustered vs Non-Clustered Index

A clustered index determines the physical order of rows on disk. Most databases automatically create a clustered index on the primary key, so rows are stored in primary key order. There can only be one clustered index per table.

A non-clustered index is a separate structure that stores pointers back to the actual rows:

sql

QUBITS OF DPK

1Email Index
2
3alice@gmail.com → row 10
4bob@gmail.com   → row 5
5john@gmail.com  → row 1

You can have many non-clustered indexes on a single table.

9. Composite Index

When queries filter on multiple columns together, a composite index covers them in one structure:

sql

QUBITS OF DPK

1CREATE INDEX idx_customer_date
2ON orders(customer_id, order_date);

The Left Prefix Rule — this is critical to understand. The index (customer_id, order_date) works for queries filtering on customer_id alone, or customer_id + order_date together. It does not work efficiently for order_date alone, because the left prefix is missing.

sql

QUBITS OF DPK

1WHERE customer_id = 5                              -- uses index
2WHERE customer_id = 5 AND order_date = '2024-01-01' -- uses index
3WHERE order_date = '2024-01-01'                    -- does NOT use index

10. Multi-Column Index Example

sql

QUBITS OF DPK

1CREATE INDEX idx_user_status_date
2ON orders(user_id, status, created_at);

This optimizes all of the following queries because they all begin from the leftmost column:

sql

QUBITS OF DPK

1WHERE user_id = 5
2WHERE user_id = 5 AND status = 'PAID'
3WHERE user_id = 5 AND status = 'PAID' AND created_at > '2024-01-01'

11. Downsides of Indexing

Indexes are powerful but come with real trade-offs that matter in production.

Extra storage — indexes consume significant disk space. A 100 GB table may have 40 GB of associated indexes on top of it.

Slower writes — every INSERT, UPDATE, or DELETE must also update all relevant indexes. The more indexes you have, the heavier each write operation becomes.

Maintenance overhead — too many indexes slow down writes noticeably and increase the complexity of database maintenance.

12. Indexing Strings

Indexing very long text columns is inefficient. For a column like product_description TEXT, prefix indexing is a better approach — index only the first N characters:

sql

QUBITS OF DPK

1CREATE INDEX idx_title
2ON articles(title(20));

This indexes only the first 20 characters of the title, which is usually enough to narrow down results significantly.

13. When Indexes Are Not Useful

Not every column is a good index candidate. Consider this query:

sql

QUBITS OF DPK

1SELECT COUNT(*)
2FROM transactions
3WHERE transaction_type = 'CREDIT';

If 95% of rows are CREDIT, the database still has to scan most of the table even with an index — so the index provides no real benefit.

The key concept here is selectivity — the ratio of unique values in a column. High selectivity columns like email or user_id are excellent candidates. Low selectivity columns like gender, status, or boolean flags generally are not.

14. How to Identify Good Index Candidates

Columns that frequently appear in WHERE, JOIN, ORDER BY, or GROUP BY clauses are strong candidates. For example:

sql

QUBITS OF DPK

1SELECT *
2FROM orders
3WHERE customer_id = 10;

sql

QUBITS OF DPK

1CREATE INDEX idx_customer_id
2ON orders(customer_id);

15. Interview Questions

Why not index every column? Because indexes increase storage usage, write latency, and maintenance cost. Each write must update every index on the table.

When is indexing not useful? When the column has low selectivity — meaning most rows share the same value, so the database ends up scanning most of the table anyway.

What is the Left Prefix Rule? A composite index (A, B, C) can only be used by queries that filter starting from the leftmost column. A query filtering only on B or C will not benefit from this index.

16. Quick Revision

sql

QUBITS OF DPK

1Index             → improves query speed by avoiding full table scans
2Full table scan   → O(N) — checks every row
3Indexed lookup    → O(log N) — follows B+ Tree path
4
5B+ Tree           → sorted tree, supports range queries and sorting
6Hash index        → O(1) lookup but no range query support
7
8Clustered index   → determines physical row order (one per table)
9Non-clustered     → separate pointer structure (many per table)
10
11Composite index   → covers multiple columns in one index
12Left Prefix Rule  → index must be used from the leftmost column
13Selectivity       → ratio of unique values — high = good index candidate