SQL 5 : Aggregate Queries
D
Qubits of DPK
April 11, 2026
Core DBMS
1. Recap of SQL 4
Previously we learned Joins, which combine data from multiple tables using a shared key. Now we shift focus from combining data to analyzing it — specifically, how to summarize large sets of rows into meaningful insights using Aggregate Functions.
2. What Are Aggregate Functions?
An aggregate function takes multiple rows as input and collapses them into a single result value. Think of it as a compression — you feed in many rows, and the function distills them into one number that tells you something meaningful about the group.
sql
QUBITS OF DPK
This is the core idea behind all analytics queries. Rather than reading every row individually, you ask the database to summarize them for you.
3. The Five Core Aggregate Functions
4. COUNT()
COUNT(*) counts every row in the table, including those with NULL values in some columns.
sql
QUBITS OF DPK
COUNT(column) is subtly different and important to understand — it counts only non-NULL values in that specific column. If a column has NULLs, those rows are silently skipped.
sql
QUBITS OF DPK
This distinction matters in interviews. Always ask yourself: do you want to count rows, or count populated values in a specific column?
5. SUM() and AVG()
SUM() adds all values in a column together. AVG() divides that total by the count of non-NULL rows.
sql
QUBITS OF DPK
One thing to watch: AVG() ignores NULL values in its calculation, which can sometimes produce a higher average than you expect if many rows are NULL.
6. MIN() and MAX()
These find the boundary values in a column — the smallest and largest.
sql
QUBITS OF DPK
These work on numbers, dates, and strings alike. On strings, MIN and MAX follow alphabetical ordering.
7. GROUP BY
Aggregate functions become truly powerful when combined with GROUP BY. Instead of summarizing the entire table into one row, GROUP BY lets you summarize per category — one result row per group.
sql
QUBITS OF DPK
Think of it this way: GROUP BY department splits the table into separate buckets — one for Engineering, one for Marketing — and then runs the aggregate function independently inside each bucket.
The golden rule of GROUP BY — every column in your SELECT clause must either appear in the GROUP BY clause, or be wrapped inside an aggregate function. If you try to select salary alongside GROUP BY department, the database has no idea which salary to show for the group (there are many). It will either throw an error or return unpredictable results.
sql
QUBITS OF DPK
8. HAVING
HAVING is to groups what WHERE is to rows. It lets you filter after grouping — keeping only the groups that meet a certain condition.
sql
QUBITS OF DPK
You cannot use WHERE AVG(salary) > 65000 here because WHERE runs before grouping — the aggregate values don't exist yet at that point. HAVING runs after grouping, when those computed values are available.
You can use both in the same query. WHERE narrows down the rows first, then GROUP BY groups what remains, then HAVING filters those groups.
sql
QUBITS OF DPK
9. Cross Join
A cross join returns every possible combination of rows from two tables. If Table A has 2 rows and Table B has 2 rows, the result has 4 rows.
sql
QUBITS OF DPK
Cross joins are rarely used in day-to-day queries, but they appear in scenarios like generating all possible combinations of product variants (size × color × material).
10. Implicit Join (Older Syntax)
Before the JOIN keyword was standardized, tables were joined by listing them in FROM and filtering with WHERE. You may see this in legacy codebases.
sql
QUBITS OF DPK
Always use the explicit JOIN syntax in production and interviews. It is clearer, easier to read, and makes the intent of the join obvious.
11. ON vs WHERE in Joins
These two clauses serve different purposes and should not be confused.
sql
QUBITS OF DPK
ON defines the relationship between the two tables. WHERE filters the result of the join. Mixing them up — especially with LEFT JOIN — can produce subtly wrong results that are difficult to debug.
12. Real Production Example — E-commerce Analytics
Finding total spending per customer is one of the most common analytics queries in any e-commerce system:
sql
QUBITS OF DPK
Adding ORDER BY and LIMIT turns it into a top-spenders query — something you would see on every business dashboard:
sql
QUBITS OF DPK
13. Interview Questions
What is the difference between COUNT(*) and COUNT(column)? COUNT(*) counts every row including those with NULLs. COUNT(column) counts only rows where that specific column is not NULL. This matters whenever your data has missing values.
Why can't you select a non-grouped column without an aggregate? Because a group contains multiple rows, and the database has no rule for which single value to return for that column. You must either include it in GROUP BY (so each unique value forms its own group) or wrap it in an aggregate function that reduces multiple values to one.
What is the difference between WHERE and HAVING? WHERE filters individual rows before any grouping happens. HAVING filters groups after aggregation is complete. You use HAVING whenever your filter condition involves an aggregate value like AVG(salary) or COUNT(*).
14. Quick Revision
sql
QUBITS OF DPK