SQL 6 : Joins and Aggregate Queries

Qubits of DPK

April 11, 2026

Core DBMS

1. Recap of SQL 5

Previously we covered Joins (INNER JOIN, LEFT JOIN, RIGHT JOIN, SELF JOIN) for combining tables, aggregate functions (COUNT, SUM, AVG, MIN, MAX) for summarizing data, GROUP BY for aggregating per category, and HAVING for filtering grouped results.

2. Real Problem Solving Strategy

When you face a SQL problem involving joins and aggregation, follow this five-step mental framework rather than jumping straight to writing a query.

Step 1 — Identify the tables involved in the problem (e.g. customers, orders).

Step 2 — Identify the relationship between them (e.g. orders.customer_id → customers.id).

Step 3 — Join the tables on the correct key.

Step 4 — Apply the aggregation you need (e.g. SUM(order_amount)).

Step 5 — Group the results by the right column (e.g. GROUP BY customers.id).

This structured approach prevents the most common mistakes before you even write a line of SQL.

3. Problem 1 — Total Order Amount Per Customer

Tables:

Customers

Orders

Solution:

sql

QUBITS OF DPK

1SELECT
2    c.name,
3    SUM(o.amount) AS total_spent
4FROM customers c
5JOIN orders o ON c.id = o.customer_id
6GROUP BY c.name;

Result:

4. Problem 2 — Customers With More Than 2 Orders

This is where HAVING becomes essential. You cannot filter on an aggregate result using WHERE — HAVING was designed specifically for this.

sql

QUBITS OF DPK

1SELECT
2    customer_id,
3    COUNT(order_id) AS order_count
4FROM orders
5GROUP BY customer_id
6HAVING COUNT(order_id) > 2;

The query groups all orders by customer, counts them, then filters out anyone with 2 or fewer orders.

5. Problem 3 — Department With the Highest Average Salary

Employees table:

sql

QUBITS OF DPK

1SELECT
2    department,
3    AVG(salary) AS avg_salary
4FROM employees
5GROUP BY department
6ORDER BY avg_salary DESC
7LIMIT 1;

ORDER BY ... DESC sorts departments from highest to lowest average salary, and LIMIT 1 picks the top result. Simple and efficient.

6. Problem 4 — Employee Count Per Department

sql

QUBITS OF DPK

1SELECT
2    department,
3    COUNT(*) AS employee_count
4FROM employees
5GROUP BY department;

COUNT(*) counts all rows in each group, including those with NULL values — which is usually what you want when counting headcount.

7. Problem 5 — Total Sales Per Product (Join + Aggregation)

Tables:

Products

Sales

sql

QUBITS OF DPK

1SELECT
2    p.name,
3    SUM(s.amount) AS total_sales
4FROM products p
5JOIN sales s ON p.product_id = s.product_id
6GROUP BY p.name;

Result:

8. Problem 6 — All Departments Including Those With Zero Employees

This is a classic LEFT JOIN use case. A regular INNER JOIN would silently drop departments with no employees — which is wrong when the business wants a complete picture.

sql

QUBITS OF DPK

1SELECT
2    d.department_name,
3    COUNT(e.employee_id) AS employee_count
4FROM departments d
5LEFT JOIN employees e ON d.department_id = e.department_id
6GROUP BY d.department_name;

LEFT JOIN keeps all rows from the departments table even when there is no matching row in employees. For empty departments, COUNT(e.employee_id) returns 0 because it counts non-NULL values only — which is exactly the right behaviour here.

9. Problem 7 — Top 3 Customers by Total Spending

sql

QUBITS OF DPK

1SELECT
2    customer_id,
3    SUM(amount) AS total_spent
4FROM orders
5GROUP BY customer_id
6ORDER BY total_spent DESC
7LIMIT 3;

Aggregate first, order second, then limit. This pattern — group → sort → limit — is the foundation of almost every "top N" query you will encounter in interviews.

10. Common Mistakes to Avoid

Mistake 1 — Missing GROUP BY. If you use an aggregate function alongside a non-aggregated column, you must group by that column. Without GROUP BY, the database doesn't know how to collapse the rows.

sql

QUBITS OF DPK

1-- Wrong
2SELECT customer_id, SUM(amount) FROM orders;
3
4-- Correct
5SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

Mistake 2 — Using WHERE instead of HAVING for aggregate conditions. WHERE filters rows before grouping. HAVING filters groups after aggregation. Using WHERE with an aggregate function will throw an error.

sql

QUBITS OF DPK

1-- Wrong
2SELECT department, AVG(salary)
3FROM employees
4WHERE AVG(salary) > 60000
5GROUP BY department;
6
7-- Correct
8SELECT department, AVG(salary)
9FROM employees
10GROUP BY department
11HAVING AVG(salary) > 60000;

Mistake 3 — Wrong join condition. Always join on the foreign key relationship, not on coincidental column name matches.

sql

QUBITS OF DPK

1-- Wrong
2ON customers.id = orders.id
3
4-- Correct
5ON customers.id = orders.customer_id

11. Real Production Example — Amazon Revenue Per Category

In a real e-commerce system, revenue analysis typically spans multiple joined tables. Here is how you would calculate revenue per product category:

sql

QUBITS OF DPK

1SELECT
2    c.category_name,
3    SUM(oi.quantity * oi.price) AS revenue
4FROM order_items oi
5JOIN products p ON oi.product_id = p.product_id
6JOIN categories c ON p.category_id = c.category_id
7GROUP BY c.category_name;

Notice the chain of joins — order_items connects to products, which connects to categories. This multi-level join is very common in production analytics queries.

12. Interview-Level Query — Customers Who Spent More Than Average

This is a classic MAANG-style problem that combines aggregation with a subquery. The inner subquery calculates each customer's total spend, the middle layer averages those totals, and the outer query filters customers who exceed that average.

sql

QUBITS OF DPK

1SELECT
2    customer_id,
3    SUM(amount) AS total_spent
4FROM orders
5GROUP BY customer_id
6HAVING SUM(amount) > (
7    SELECT AVG(total_spent)
8    FROM (
9        SELECT SUM(amount) AS total_spent
10        FROM orders
11        GROUP BY customer_id
12    ) t
13);

The key insight here is that you cannot directly compare a grouped result to an average of grouped results in one pass — you need the subquery to pre-aggregate, then average that result.

13. Quick Revision

sql

QUBITS OF DPK

1JOIN          → combine rows from multiple tables
2
3Aggregates:
4  COUNT()     → count rows
5  SUM()       → total
6  AVG()       → average
7  MIN/MAX()   → range
8
9GROUP BY      → collapse rows into groups
10HAVING        → filter groups after aggregation (not WHERE)
11ORDER BY      → sort results
12LIMIT         → restrict number of rows returned
13
14Top-N pattern → GROUP BY → ORDER BY DESC → LIMIT N