GROUP BY: WHERE YOUR DATA BELONGS

By now, you might be wondering how you would go about selecting data based on the aggregated values. For instance, how do you find all publishers whose average book price is above $20? Surely, you could use the trusty WHERE clause, right? Uh, no. Try executing a query like this:

SELECT bk_publisher 
FROM books 
    WHERE AVG(bk_price) > 20;

You get an error message that in the context of Microsoft SQL Server 2008 reads as follows:

An aggregate cannot appear in the WHERE clause unless it is in subquery contained
in a HAVING clause or a select list, and the column being aggregated in an outer
reference.

Yet there is a way to ask the database engine such a question, and get a result, by using the GROUP BY clause (with the HAVING clause not far behind), but let's get the basics nailed down first.

Aggregating results by rolling them up to a single value is useful, as is having all the data stored in the individual records, but sometimes a middle ground is needed. Using the WHERE clause with an aggregate function allows us to aggregate a subset of the results. For example, we can count books published by a particular publisher. To do the same for all publishers, we would have to run as many queries as there are distinct publishers in our table; moreover, we would need to know the publishers by name to ask for them, and we may not know the names of all the publishers. If only we could ask the database for aggregated information grouped by some criterion ...

Get Discovering SQL: A Hands-On Guide for Beginners now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.