SQL 5 : Aggregate Queries

Qubits of DPK

April 11, 2026

Core DBMS

1. Recap of SQL 4

Previously we learned Joins, which combine data from multiple tables using a shared key. Now we shift focus from combining data to analyzing it — specifically, how to summarize large sets of rows into meaningful insights using Aggregate Functions.

2. What Are Aggregate Functions?

An aggregate function takes multiple rows as input and collapses them into a single result value. Think of it as a compression — you feed in many rows, and the function distills them into one number that tells you something meaningful about the group.

sql

QUBITS OF DPK

1SELECT AVG(salary)
2FROM employees;
3-- Many rows → one result: 60000

This is the core idea behind all analytics queries. Rather than reading every row individually, you ask the database to summarize them for you.

3. The Five Core Aggregate Functions

4. COUNT()

COUNT(*) counts every row in the table, including those with NULL values in some columns.

sql

QUBITS OF DPK

1SELECT COUNT(*)
2FROM employees;
3-- Returns: 3

COUNT(column) is subtly different and important to understand — it counts only non-NULL values in that specific column. If a column has NULLs, those rows are silently skipped.

sql

QUBITS OF DPK

1SELECT COUNT(manager)
2FROM employees;
3-- Returns: 2  (because one employee has NULL manager)

This distinction matters in interviews. Always ask yourself: do you want to count rows, or count populated values in a specific column?

5. SUM() and AVG()

SUM() adds all values in a column together. AVG() divides that total by the count of non-NULL rows.

sql

QUBITS OF DPK

1SELECT SUM(salary), AVG(salary)
2FROM employees;
3-- Returns: 180000 | 60000

One thing to watch: AVG() ignores NULL values in its calculation, which can sometimes produce a higher average than you expect if many rows are NULL.

6. MIN() and MAX()

These find the boundary values in a column — the smallest and largest.

sql

QUBITS OF DPK

1SELECT MIN(salary), MAX(salary)
2FROM employees;
3-- Returns: 50000 | 70000

These work on numbers, dates, and strings alike. On strings, MIN and MAX follow alphabetical ordering.

7. GROUP BY

Aggregate functions become truly powerful when combined with GROUP BY. Instead of summarizing the entire table into one row, GROUP BY lets you summarize per category — one result row per group.

sql

QUBITS OF DPK

1SELECT department, AVG(salary)
2FROM employees
3GROUP BY department;

Think of it this way: GROUP BY department splits the table into separate buckets — one for Engineering, one for Marketing — and then runs the aggregate function independently inside each bucket.

The golden rule of GROUP BY — every column in your SELECT clause must either appear in the GROUP BY clause, or be wrapped inside an aggregate function. If you try to select salary alongside GROUP BY department, the database has no idea which salary to show for the group (there are many). It will either throw an error or return unpredictable results.

sql

QUBITS OF DPK

1-- Wrong: database doesn't know which salary to pick
2SELECT department, salary FROM employees GROUP BY department;
3
4-- Correct: tell the database HOW to reduce salaries to one value
5SELECT department, AVG(salary) FROM employees GROUP BY department;

8. HAVING

HAVING is to groups what WHERE is to rows. It lets you filter after grouping — keeping only the groups that meet a certain condition.

sql

QUBITS OF DPK

1SELECT department, AVG(salary)
2FROM employees
3GROUP BY department
4HAVING AVG(salary) > 65000;

You cannot use WHERE AVG(salary) > 65000 here because WHERE runs before grouping — the aggregate values don't exist yet at that point. HAVING runs after grouping, when those computed values are available.

You can use both in the same query. WHERE narrows down the rows first, then GROUP BY groups what remains, then HAVING filters those groups.

sql

QUBITS OF DPK

1SELECT department, AVG(salary)
2FROM employees
3WHERE salary > 50000        -- filter rows first
4GROUP BY department
5HAVING AVG(salary) > 65000; -- then filter groups

9. Cross Join

A cross join returns every possible combination of rows from two tables. If Table A has 2 rows and Table B has 2 rows, the result has 4 rows.

sql

QUBITS OF DPK

1SELECT *
2FROM colors
3CROSS JOIN sizes;

Cross joins are rarely used in day-to-day queries, but they appear in scenarios like generating all possible combinations of product variants (size × color × material).

10. Implicit Join (Older Syntax)

Before the JOIN keyword was standardized, tables were joined by listing them in FROM and filtering with WHERE. You may see this in legacy codebases.

sql

QUBITS OF DPK

1-- Old implicit syntax
2SELECT * FROM customers, orders
3WHERE customers.id = orders.customer_id;
4
5-- Modern explicit syntax (preferred)
6SELECT * FROM customers
7JOIN orders ON customers.id = orders.customer_id;

Always use the explicit JOIN syntax in production and interviews. It is clearer, easier to read, and makes the intent of the join obvious.

11. ON vs WHERE in Joins

These two clauses serve different purposes and should not be confused.

sql

QUBITS OF DPK

1SELECT *
2FROM customers
3JOIN orders ON customers.id = orders.customer_id  -- defines the join
4WHERE orders.amount > 500;                         -- filters the result

ON defines the relationship between the two tables. WHERE filters the result of the join. Mixing them up — especially with LEFT JOIN — can produce subtly wrong results that are difficult to debug.

12. Real Production Example — E-commerce Analytics

Finding total spending per customer is one of the most common analytics queries in any e-commerce system:

sql

QUBITS OF DPK

1SELECT customer_id, SUM(amount) AS total_spent
2FROM orders
3GROUP BY customer_id;

Adding ORDER BY and LIMIT turns it into a top-spenders query — something you would see on every business dashboard:

sql

QUBITS OF DPK

1SELECT customer_id, SUM(amount) AS total_spent
2FROM orders
3GROUP BY customer_id
4ORDER BY total_spent DESC
5LIMIT 10;

13. Interview Questions

What is the difference between COUNT(*) and COUNT(column)? COUNT(*) counts every row including those with NULLs. COUNT(column) counts only rows where that specific column is not NULL. This matters whenever your data has missing values.

Why can't you select a non-grouped column without an aggregate? Because a group contains multiple rows, and the database has no rule for which single value to return for that column. You must either include it in GROUP BY (so each unique value forms its own group) or wrap it in an aggregate function that reduces multiple values to one.

What is the difference between WHERE and HAVING? WHERE filters individual rows before any grouping happens. HAVING filters groups after aggregation is complete. You use HAVING whenever your filter condition involves an aggregate value like AVG(salary) or COUNT(*).

14. Quick Revision

sql

QUBITS OF DPK

1COUNT(*)        → count all rows
2COUNT(column)   → count non-NULL values only
3SUM()           → total
4AVG()           → average (ignores NULLs)
5MIN() / MAX()   → boundary values
6
7GROUP BY        → split rows into groups, aggregate each group
8HAVING          → filter groups after aggregation
9WHERE           → filter rows before aggregation
10
11Rule: every SELECT column must be in GROUP BY or inside an aggregate