SQL 7 : Subqueries & Views

D

Qubits of DPK

April 11, 2026

Core DBMS

Quick Recap (SQL 6)

You learned JOINs + GROUP BY + HAVING.
sql
QUBITS OF DPK
1SELECT department, AVG(salary)
2FROM employees
3GROUP BY department
4HAVING AVG(salary) > 60000;
Problem: What if the filter condition itself needs to be computed first? That's where subqueries come in.

1. What Is a Subquery?

Layman: A subquery is a question inside a question.
sql
QUBITS OF DPK
1SELECT name, salary
2FROM employees
3WHERE salary > (
4    SELECT AVG(salary)
5    FROM employees
6);
Execution order:
  1. #
    Inner query runs → returns 60000
  2. #
    Outer query uses that: WHERE salary > 60000

2. Types of Subqueries — The Full Map

3. Subquery in WHERE (Scalar Subquery)

Use when: you need one computed value to filter against.
Real-world: Spotify filters songs with above-average stream counts for a featured playlist.
sql
QUBITS OF DPK
1SELECT name, salary
2FROM employees
3WHERE salary > (
4    SELECT AVG(salary)        -- runs first, returns one number
5    FROM employees
6);
Build-up:
sql
QUBITS OF DPK
1-- Step 1: What's the average?
2SELECT AVG(salary) FROM employees;
3-- Result: 60000
4
5-- Step 2: Filter using that value
6SELECT name, salary
7FROM employees
8WHERE salary > 60000;          -- subquery replaces the hardcoded value

4. Subquery with IN (Multi-Value)

Use when: the inner query returns a list of values.
Real-world: Amazon shows orders only from warehouses located in a specific region.
sql
QUBITS OF DPK
1-- Problem: Find employees in Seattle departments
2
3-- Step 1: Which departments are in Seattle?
4SELECT department_id
5FROM departments
6WHERE location = 'Seattle';
7-- Returns: [3, 7, 12]
8
9-- Step 2: Find employees in those departments
10SELECT employee_id, first_name
11FROM employees
12WHERE department_id IN (
13    SELECT department_id
14    FROM departments
15    WHERE location = 'Seattle'
16);
Interview tip: IN is clean and readable. But if the subquery returns thousands of rows, EXISTS is faster. Know the trade-off.

5. Subquery in FROM (Derived Table)

Use when: you need to aggregate first, then filter on that aggregation.
Why can't you just use HAVING?
HAVING filters groups. But what if you want to treat the grouped result as a table and do more operations on it? That's when you move the aggregation into a FROM subquery.
Real-world: A BI dashboard at Swiggy computes per-restaurant average order value, then surfaces only the top performers.
sql
QUBITS OF DPK
1-- Problem: Departments where avg salary > 60000
2
3-- Step 1: Create a "virtual table" of dept averages
4SELECT department_id, AVG(salary) AS avg_salary
5FROM employees
6GROUP BY department_id;
7
8-- Step 2: Filter that virtual table
9SELECT *
10FROM (
11    SELECT department_id, AVG(salary) AS avg_salary
12    FROM employees
13    GROUP BY department_id
14) dept_salary                        -- alias is MANDATORY here
15WHERE avg_salary > 60000;
Build-up visual:
Outer query filters on this virtual table.

6. Correlated Subquery

The most interview-heavy subquery type.
Layman: A correlated subquery is like an employee review process where each employee is compared specifically to their own team — not the whole company.
Real-world: Flipkart highlights sellers whose product prices are below the average for their own category.
sql
QUBITS OF DPK
1-- Problem: Employees earning more than their own department's average
2
3SELECT e.name, e.salary, e.department
4FROM employees e
5WHERE e.salary > (
6    SELECT AVG(salary)
7    FROM employees
8    WHERE department = e.department    -- ← references outer query's row
9);
Execution trace (for 3 employees):
sql
QUBITS OF DPK
1Row 1: e.department = 'Engineering'
2Inner: SELECT AVG(salary) FROM employees WHERE department = 'Engineering'
3Returns: 75000
4Check: Is 90000 > 75000? YES → include
5
6Row 2: e.department = 'Marketing'
7Inner: SELECT AVG(salary) FROM employees WHERE department = 'Marketing'
8Returns: 50000
9Check: Is 45000 > 50000? NO → exclude
10
11Row 3: e.department = 'Engineering'
12Inner runs again for Engineering → Returns: 75000
13Check: Is 70000 > 75000? NO → exclude
Performance Warning: Correlated subqueries run N times (once per outer row). At scale, this becomes expensive. For MAANG interviews — always mention this and suggest replacing with a window function when possible.
Optimized alternative using window function:
sql
QUBITS OF DPK
1SELECT name, salary, department
2FROM (
3    SELECT name, salary, department,
4           AVG(salary) OVER (PARTITION BY department) AS dept_avg
5    FROM employees
6) ranked
7WHERE salary > dept_avg;

7. EXISTS and NOT EXISTS

EXISTS = "Does at least one matching row exist?"
Layman: Think of it like a bouncer checking a guest list. The bouncer doesn't need to read the whole list — they just need to find your name once and wave you in.
SQL stops scanning the moment it finds the first matching row. That's why EXISTS is fast.
Real-world: Netflix checks if a user has watched anything before showing a "Continue Watching" section.
sql
QUBITS OF DPK
1-- Find customers who placed at least one order
2SELECT *
3FROM customers c
4WHERE EXISTS (
5    SELECT 1                          -- the value doesn't matter; only existence matters
6    FROM orders o
7    WHERE o.customer_id = c.customer_id
8);
Why SELECT 1? EXISTS only cares whether rows exist — not what they contain. SELECT 1 is a convention that signals "I'm only checking existence, not fetching data."
NOT EXISTS = opposite — find rows with no matching record.
Real-world: Zomato identifies users who signed up but never placed an order (for a re-engagement campaign).
sql
QUBITS OF DPK
1SELECT *
2FROM customers c
3WHERE NOT EXISTS (
4    SELECT 1
5    FROM orders o
6    WHERE o.customer_id = c.customer_id
7);

EXISTS vs IN — Know the Difference

8. ANY and ALL

Both compare a value against a set returned by a subquery.
ANY = "Is my value greater than at least one value in the set?"
sql
QUBITS OF DPK
1SELECT name
2FROM employees
3WHERE salary > ANY (
4    SELECT salary
5    FROM employees
6    WHERE department = 'Marketing'
7);
8-- Passes if salary > the MINIMUM marketing salary
ALL = "Is my value greater than every value in the set?"
sql
QUBITS OF DPK
1SELECT name
2FROM employees
3WHERE salary > ALL (
4    SELECT salary
5    FROM employees
6    WHERE department = 'Marketing'
7);
8-- Passes if salary > the MAXIMUM marketing salary
Memory trick: ANY = pass the easiest bar. ALL = pass the hardest bar.

9. Views — Virtual Tables

Layman: A view is like a saved search on your phone. You don't store the results — you store the query. Every time you open it, it re-runs and gives you fresh results.
sql
QUBITS OF DPK
1-- Create a view once
2CREATE VIEW high_salary_employees AS
3SELECT name, salary
4FROM employees
5WHERE salary > 60000;
6
7-- Query it like a regular table
8SELECT * FROM high_salary_employees;
Why views exist in production:
Real-world: At Paytm, the analytics team queries a active_users_monthly view. The underlying join across 4 tables is hidden inside the view — analysts just write SELECT * FROM active_users_monthly WHERE region = 'South'.

30-Second Interview Explanation — Subqueries

"A subquery is a query nested inside another query. The inner query runs first, and its result is used by the outer query. There are several types: scalar subqueries return a single value used in WHERE, IN subqueries return a list, FROM subqueries create a derived temporary table, and correlated subqueries reference the outer query's current row and re-execute for each row. EXISTS is a boolean check — it stops at the first match, making it faster than IN for large result sets. When a correlated subquery gets expensive, I replace it with a window function, which processes all rows in a single pass."

Mental Model Summary

sql
QUBITS OF DPK
1Subquery Types:
2├── WHERE (scalar)      → one value filter          → most common
3├── IN                  → list filter                → watch for NULLs
4├── FROM (derived)      → aggregate then filter      → when HAVING isn't enough
5├── Correlated          → per-row computation        → replace with window fn at scale
6├── EXISTSboolean existence check    → faster than IN for large sets
7└── ANY / ALLset comparison             → ANY=min bar, ALL=max bar
8
9Views:
10└── Saved query → virtual tableno data stored → re-executes on every call