Chapter 4. Querying Basics

A query is a nickname for a SELECT statement, which consists of six main clauses. Each section of this chapter covers a clause in detail:

  1. SELECT

  2. FROM

  3. WHERE

  4. GROUP BY

  5. HAVING

  6. ORDER BY

The last section of this chapter covers the LIMIT clause, which is supported by MySQL, PostgreSQL, and SQLite.

The code examples in this chapter reference four tables:

waterfall

waterfalls in Michigan’s Upper Peninsula

owner

owners of the waterfalls

county

counties where the waterfalls are located

tour

tours that consist of multiple waterfall stops

Here is a sample query that uses the six main clauses. It is followed by the query results, which are also known as the result set.

-- Tours with 2 or more public waterfalls
SELECT   t.name AS tour_name,
         COUNT(*) AS num_waterfalls
FROM     tour t LEFT JOIN waterfall w
         ON t.stop = w.id
WHERE    w.open_to_public = 'y'
GROUP BY t.name
HAVING   COUNT(*) >= 2
ORDER BY tour_name;

tour_name  num_waterfalls
---------- ---------------
M-28                     6
Munising                 6
US-2                     4

To query a database means to retrieve data from a database, typically from a table or multiple tables.

Note

It is also possible to query a view instead of a table. Views look like tables and are derived from tables, but they themselves do not hold any data. More on views can be found in “Views” in Chapter 5.

The SELECT Clause

The SELECT clause specifies the columns that you want a statement to return.

In the SELECT clause, the SELECT keyword is followed by a list of column names and/or expressions that are separated by commas. Each column name and/or expression then becomes a column in the results.

Selecting Columns

The simplest SELECT clause lists one or more column names from the tables in the FROM clause:

SELECT id, name
FROM owner;

id    name
----- ----------------
    1 Pictured Rocks
    2 Michigan Nature
    3 AF LLC
    4 MI DNR
    5 Horseshoe Falls

Selecting All Columns

To return all columns from a table, you can use a single asterisk rather than write out each column name:

SELECT *
FROM owner;

id    name             phone         type
----- ---------------- ------------- --------
    1 Pictured Rocks   906.387.2607  public
    2 Michigan Nature  517.655.5655  private
    3 AF LLC                         private
    4 MI DNR           906.228.6561  public
    5 Horseshoe Falls  906.387.2635  private
Warning

The asterisk is a helpful shortcut when testing out queries because it can save you quite a bit of typing. However, it is risky to use the asterisk in production code because the columns in a table may change over time, causing your code to fail when there are fewer or more columns than expected.

Selecting Expressions

In addition to simply listing columns, you can also list more complex expressions within the SELECT clause to return as columns in the results.

The following statement includes an expression to calculate a 10% drop in population, rounded to zero decimal places:

SELECT name, ROUND(population * 0.9, 0)
FROM county;

name       ROUND(population * 0.9, 0)
---------- ---------------------------
Alger                             8876
Baraga                            7871
Ontonagon                         7036
...

Selecting Functions

Expressions in the SELECT list typically refer to columns in the tables that you are pulling from, but there are exceptions. For example, a common function that doesn’t refer to any tables is the one to return the current date:

SELECT CURRENT_DATE;

CURRENT_DATE
-------------
2021-12-01

The preceding code works in MySQL, PostgreSQL, and SQLite. Equivalent code that works in other RDBMSs can be found in “Datetime Functions” in Chapter 7.

Note

The majority of queries include both a SELECT and a FROM clause, but only the SELECT clause is required when using particular database functions, such as CURRENT_DATE.

It is also possible to include expressions within the SELECT clause that are subqueries (a query nested inside another query). More details can be found in “Selecting Subqueries”.

Aliasing Columns

The purpose of a column alias is to give a temporary name to any column or expression listed in the SELECT clause. That temporary name, or column alias, is then displayed as a column name in the results.

Note that this is not a permanent name change because the column names in the original tables remain unchanged. The alias only exists within the query.

This code displays three columns.

SELECT id, name,
       ROUND(population * 0.9, 0)
FROM county;

id    name       ROUND(population * 0.9, 0)
----- ---------- ---------------------------
    2 Alger                             8876
    6 Baraga                            7871
    7 Ontonagon                         7036
...

Let’s say we want to rename the column names in the results. id is too ambigious and we’d like to give it a more descriptive name. ROUND(population * 0.9, 0) is too long and we’d like to give it a simpler name.

To create a column alias, you follow a column name or expression with either (1) an alias name or (2) the AS keyword and an alias name.

-- alias_name
SELECT id county_id, name,
       ROUND(population * 0.9, 0) estimated_pop
FROM county;

or:

-- AS alias_name
SELECT id AS county_id, name,
       ROUND(population * 0.90, 0) AS estimated_pop
FROM county;

county_id  name       estimated_pop
---------- ---------- --------------
         2 Alger                8876
         6 Baraga               7871
         7 Ontonagon            7036
...

Both options are used in practice when creating aliases. Within the SELECT clause, the second option is more popular because the AS keyword makes it visually easier to differentiate column names and aliases among a long list of column names.

Note

Older versions of PostgreSQL require the use of AS when creating a column alias.

Although column aliases are not required, they are highly recommended when working with expressions to give sensible names to the columns in the results.

Aliases with case sensitivity and punctuation

As can be seen with the column aliases county_id and estimated_pop, the convention is to use lowercase letters with underscores in place of spaces when naming column aliases.

You can also create aliases containing uppercase letters, spaces, and punctuation using the double quote syntax, as shown in this example:

SELECT id AS "Waterfall #",
  name AS "Waterfall Name"
FROM waterfall;

Waterfall #  Waterfall Name
------------ ---------------
           1 Munising Falls
           2 Tannery Falls
           3 Alger Falls
...

Qualifying Columns

Let’s say you write a query that pulls data from two tables and they both contain a column called name. If you were to just include name in the SELECT clause, the code wouldn’t know which table you were referring to.

To solve this problem, you can qualify a column name by its table name. In other words, you can give a column a prefix to specify which table it belongs to using dot notation, as in table_name.column_name.

The following example queries a single table, so while it isn’t necessary to qualify the columns here, this is shown for demonstration’s sake. This is how you would qualify a column by its table name:

SELECT owner.id, owner.name
FROM owner;
Tip

If you get an error in SQL referencing an ambiguous column name, it means that multiple tables in your query have a column of the same name and you haven’t specified which table/column combination you are referring to. You can resolve the error by qualifying the column name.

Qualifying tables

If you qualify a column name by its table name, you can also qualify that table name by its database or schema name. The following query retrieves data specifically from the owner table within the sqlbook schema:

SELECT sqlbook.owner.id, sqlbook.owner.name
FROM sqlbook.owner;

The preceding code is lengthy since sqlbook.owner is repeated multiple times. To save on typing, you can provide a table alias. The following example gives the alias o to the table owner:

SELECT o.id, o.name
FROM sqlbook.owner o;

or:

SELECT o.id, o.name
FROM owner o;

Selecting Subqueries

A subquery is a query that is nested inside another query. Subqueries can be located within various clauses, including the SELECT clause.

In the following example, in addition to the id, name, and population, let’s say we also want to see the average population of all the counties. By including a subquery, we are creating a new column in the results for the average population.

SELECT id, name, population,
       (SELECT AVG(population) FROM county)
       AS average_pop
FROM county;

id    name       population  average_pop
----- ---------- ----------- ------------
    2 Alger             9862        18298
    6 Baraga            8746        18298
    7 Ontonagon         7818        18298
...

A few things to note here:

  • A subquery must be surrounded by parentheses.

  • When writing a subquery within the SELECT clause, it is highly recommended that you specify a column alias, which in this case was average_pop. That way, the column has a simple name in the results.

  • There is only one value in the average_pop column that is repeated across all rows. When including a subquery within the SELECT clause, the result of the subquery must return a single column and either zero or one row, as shown in the following subquery to calculate the average population.

SELECT AVG(population) FROM county;

AVG(population)
----------------
           18298
  • If the subquery returned zero rows, then the new column would be filled with NULL values.

Performance issues with correlated subqueries

The following query returns the number of waterfalls for each owner. Note the o.id = w.owner_id step in the subquery references the owner table in the outer query, making it a correlated subquery.

SELECT o.id, o.name,
       (SELECT COUNT(*) FROM waterfall w
       WHERE o.id = w.owner_id) AS num_waterfalls
FROM owner o;

id    name             num_waterfalls
----- ---------------- ---------------
    1 Pictured Rocks                 3
    2 Michigan Nature                3
    3 AF LLC                         1
    4 MI DNR                         1
    5 Horseshoe Falls                0

A better approach would be to rewrite the query with a JOIN. That way, the tables are first joined together and then the rest of the query is run, which is much faster than rerunning a subquery for each row of data. More on joins can be found in “Joining Tables” in Chapter 9.

SELECT   o.id, o.name,
         COUNT(w.id) AS num_waterfalls
FROM     owner o LEFT JOIN waterfalls w
         ON o.id = w.owner_id
GROUP BY o.id, o.name

id    name             num_waterfalls
----- ---------------- ---------------
    1 Pictured Rocks                 3
    2 Michigan Nature                3
    3 AF LLC                         1
    4 MI DNR                         1
    5 Horseshoe Falls                0

DISTINCT

When a column is listed in the SELECT clause, by default, all of the rows are returned. To be more explicit, you can include the ALL keyword, but it is purely optional. The following queries return each type/open_to_public combination.

SELECT o.type, w.open_to_public
FROM owner o
JOIN waterfall w ON o.id = w.owner_id;

or:

SELECT ALL o.type, w.open_to_public
FROM owner o
JOIN waterfall w ON o.id = w.owner_id;

type     open_to_public
-------- ---------------
public   y
public   y
public   y
private  y
private  y
private  y
private  y
public   y

If you want to remove duplicate rows from the results, you can use the DISTINCT keyword. The following query returns a list of unique type/open_to_public combinations.

SELECT DISTINCT o.type, w.open_to_public
FROM owner o
JOIN waterfall w ON o.id = w.owner_id;

type     open_to_public
-------- ---------------
public   y
private  y

COUNT and DISTINCT

To count the number of unique values within a single column, you can combine the COUNT and DISTINCT keywords within the SELECT clause. The following query returns the number of unique type values.

SELECT COUNT(DISTINCT type) AS unique
FROM owner;

unique
-------
      2

To count the number of unique combinations of multiple columns, you can wrap a DISTINCT query up as a subquery, and then do a COUNT on the subquery. The following query returns the number of unique type/open_to_public combinations.

SELECT COUNT(*) AS num_unique
FROM (SELECT DISTINCT o.type, w.open_to_public
      FROM owner o JOIN waterfall w
      ON o.id = w.owner_id) my_subquery;

num_unique
-----------
         2

MySQL and PostgreSQL support the use of the COUNT(DISTINCT) syntax on multiple columns. The following two queries are equivalent to the preceding query, without needing a subquery:

-- MySQL equivalent
SELECT COUNT(DISTINCT o.type, w.open_to_public)
       AS num_unique
       FROM owner o JOIN waterfall w
            ON o.id = w.owner_id;

-- PostgreSQL equivalent
SELECT COUNT(DISTINCT (o.type, w.open_to_public))
       AS num_unique
       FROM owner o JOIN waterfall w
            ON o.id = w.owner_id;

num_unique
-----------
          2

The FROM Clause

The FROM clause is used to specify the source of the data you want to retrieve. The simplest case is to name a single table or view in the FROM clause of query.

SELECT name
FROM waterfall;

You can qualify a table or view with either a database or schema name using the dot notation. The following query retrieves data specifically from the waterfall table within the sqlbook schema:

SELECT name
FROM sqlbook.waterfall;

From Multiple Tables

Instead of retrieving data from one table, you’ll often want to pull together data from multiple tables. The most common way to do this is using a JOIN clause within the FROM clause. The following query retrieves data from both the waterfall and tour tables and displays a single results table.

SELECT *
FROM waterfall w JOIN tour t
     ON w.id = t.stop;

id    name            ... name      stop  ...
----- ---------------     --------- -----
    1 Munising Falls      M-28          1
    1 Munising Falls      Munising      1
    2 Tannery Falls       Munising      2
    3 Alger Falls         M-28          3
    3 Alger Falls         Munising      3
...

Let’s break down each part of the code block.

Table aliases

waterfall w JOIN tour t

The waterfall and tour tables are given table aliases w and t, which are temporary names for the tables within the query. Table aliases are not required in a JOIN clause, but they are very helpful for shortening table names that need to be referenced within the ON and SELECT clauses.

JOIN … ON …

waterfall w JOIN tour t
ON w.id = t.stop

These two tables are pulled together with the JOIN keyword. A JOIN clause is always followed by an ON clause, which specifies how the tables should be linked together. In this case, the id of the waterfall in the waterfall table must match the stop with the waterfall in the tour table.

Note

You may see the FROM, JOIN, and ON clauses on different lines or indented. This is not required, but helpful for readability’s sake, especially when you are joining many tables together.

Results table

A query always results in a single table. The waterfall table has 12 columns and the tour table has 3 columns. After joining these tables together, the results table has 15 columns.

id    name            ... name      stop  ...
----- ---------------     --------- -----
    1 Munising Falls      M-28          1
    1 Munising Falls      Munising      1
    2 Tannery Falls       Munising      2
    3 Alger Falls         M-28          3
    3 Alger Falls         Munising      3
...

You’ll notice that there are two columns called name in the results table. The first is from the waterfall table, and the second is from the tour table. To refer to them in the SELECT clause, you would need to qualify the column names.

SELECT w.name, t.name
FROM waterfall w JOIN tour t
     ON w.id = t.stop;

name            name
--------------- ---------
Munising Falls  M-28
Munising Falls  Munising
Tannery Falls   Munising
...

To differentiate the two columns, you would also want to alias the column names.

SELECT w.name AS waterfall_name,
       t.name AS tour_name
FROM waterfall w JOIN tour t
     ON w.id = t.stop;

waterfall_name  tour_name
--------------- ----------
Munising Falls  M-28
Munising Falls  Munising
Tannery Falls   Munising
Alger Falls     M-28
Alger Falls     Munising
...

JOIN variations

In the preceding example, if a waterfall isn’t listed in any tour, then it would not appear in the results table. If you wanted to see all waterfalls in the results, you would need to use a different type of join.

There are a variety of join types used in SQL, which are covered in more detail in “Joining Tables” in Chapter 9.

From Subqueries

A subquery is a query that is nested inside another query. Subqueries within the FROM clause should be standalone SELECT statements, meaning that they do not reference the outer query at all and can be run on their own.

Note

A subquery within the FROM clause is also known as a derived table because the subquery ends up essentially acting like a table for the duration of the query.

The following query lists all publicly owned waterfalls, with the subquery portion bolded.

SELECT w.name AS waterfall_name,
       o.name AS owner_name
FROM (SELECT * FROM owner WHERE type = 'public') o
     JOIN waterfall w
     ON o.id = w.owner_id;

waterfall_name  owner_name
--------------- ---------------
Little Miners   Pictured Rocks
Miners Falls    Pictured Rocks
Munising Falls  Pictured Rocks
Wagner Falls    MI DNR

It is important to understand the order in which the query is executed.

Step 1: Execute the subquery

The contents of the subquery are first executed. You can see that this results in a table of only public owners:

SELECT * FROM owner WHERE type = 'public';

id    name            phone         type
----- --------------- ------------- -------
    1 Pictured Rocks  906.387.2607  public
    4 MI DNR          906.228.6561  public

Going back to the original query, you’ll notice that the subquery is immediately followed by the letter o. This is the temporary name, or alias, that we are assigning to the results of the subquery.

Note

Aliases are required for subqueries within the FROM clause in MySQL, PostgreSQL, and SQL Server, but not in Oracle and SQLite.

Step 2: Execute the entire query

Next, you can think of the letter o taking the place of the subquery. The query is now executed as usual.

SELECT w.name AS waterfall_name,
       o.name AS owner_name
FROM o JOIN waterfall w
     ON o.id = w.owner_id;

waterfall_name  owner_name
--------------- ---------------
Little Miners   Pictured Rocks
Miners Falls    Pictured Rocks
Munising Falls  Pictured Rocks
Wagner Falls    MI DNR

Why Use a Subquery in the FROM Clause?

The main advantage of using subqueries is that you can turn a large problem into smaller ones. Here are two examples:

Example 1: Multiple steps to get to results

Let’s say you want to find the average number of stops on a tour. First, you’d have to find the number of stops on each tour, and then average the results.

The following query finds the number of stops on each tour:

SELECT name, MAX(stop) as num_stops
FROM tour
GROUP BY name;

name      num_stops
--------- ----------
M-28              11
Munising           6
US-2              14

You could then turn the query into a subquery and write another query around it to find the average:

SELECT AVG(num_stops) FROM
(SELECT name, MAX(stop) as num_stops
FROM tour
GROUP BY name) tour_stops;

AVG(num_stops)
-----------------
10.3333333333333
Example 2: Table in FROM clause is too large

The original goal was to list all publicly owned waterfalls. This can actually be done without a subquery and with a JOIN instead:

SELECT w.name AS waterfall_name,
       o.name AS owner_name
FROM   owner o
       JOIN waterfall w ON o.id = w.owner_id
WHERE  o.type = 'public';

waterfall_name  owner_name
--------------- ---------------
Little Miners   Pictured Rocks
Miners Falls    Pictured Rocks
Munising Falls  Pictured Rocks
Wagner Falls    MI DNR

Let’s say that the query takes a really long time to run. This can happen when you join massive tables together (think tens of millions of rows). There are multiple ways you could rewrite the query to speed it up, and one of them is to use a subquery.

Since we are only interested in public owners, we can first write a subquery that filters out all of the private owners. The smaller owner table would then be joined with the waterfall table, which would take less time and produce the same results.

SELECT w.name AS waterfall_name,
       o.name AS owner_name
FROM   (SELECT * FROM owner
       WHERE type = 'public') o
       JOIN waterfall w ON o.id = w.owner_id;

waterfall_name  owner_name
--------------- ---------------
Little Miners   Pictured Rocks
Miners Falls    Pictured Rocks
Munising Falls  Pictured Rocks
Wagner Falls    MI DNR

These are just two of the many examples of how subqueries can be used to break down a larger query into smaller steps.

The WHERE Clause

The WHERE clause is used to restrict query results to only rows of interest, or simply put, it is the place to filter data. Rarely will you want to display all rows from a table, but rather rows that match specific criteria.

Tip

When exploring a table with millions of rows, you never want to do a SELECT * FROM my_table; because it will take an unnecessarily long time to run.

Instead, it’s a good idea to filter down the data. Two common ways to do this are:

Filter by a column within the WHERE clause

Better yet, filter by a column that is already indexed to make the retrieval even faster.

SELECT *
FROM my_table
WHERE year_id = 2021;
Show the top few rows of data with the LIMIT clause

(or WHERE ROWNUM <= 10 in Oracle or SELECT TOP 10 * in SQL Server)

SELECT *
FROM my_table
LIMIT 10;

The following query finds all waterfalls that do not contain Falls in the name. More on the LIKE keyword can be found in Chapter 7.

SELECT id, name
FROM waterfall
WHERE name NOT LIKE '%Falls%';

id    name
----- ----------------
    7 Little Miners
   14 Rapid River Fls

The bolded section is often referred to as a conditional statement or a predicate. The predicate makes a logical comparison for each row of data that results in TRUE/FALSE/UNKNOWN.

The waterfall table has 16 rows. For each row, it checks if the waterfall name contains Falls or not. If it doesn’t contain Falls, then the name NOT LIKE '%Falls%' predicate is TRUE, and the row is returned in the results, which was the case for the two preceding rows.

Multiple Predicates

It is also possible to combine multiple predicates with operators like AND or OR. The following example shows waterfalls without Falls in its name and that also don’t have an owner:

SELECT id, name
FROM waterfall
WHERE name NOT LIKE '%Falls%'
      AND owner_id IS NULL;

id    name
----- ----------------
   14 Rapid River Fls

More details on operators can be found in Operators in Chapter 7.

Filtering on Subqueries

A subquery is a query nested inside another query, and the WHERE clause is a common place to find one. The following example retrieves publicly accessible waterfalls located in Alger County:

SELECT w.name
FROM   waterfall w
WHERE  w.open_to_public = 'y'
       AND w.county_id IN (
           SELECT c.id FROM county c
           WHERE c.name = 'Alger');

name
---------------
Munising Falls
Tannery Falls
Alger Falls
...
Note

Unlike subqueries within the SELECT clause or the FROM clause, subqueries in the WHERE clause do not require an alias. In fact, you will get an error if you include an alias.

Why use a subquery in the WHERE clause?

The original goal was to retrieve publicly accessible waterfalls located in Alger County. If you were to write this query from scratch, you would likely start with the following:

SELECT w.name
FROM   waterfall w
WHERE  w.open_to_public = 'y';

At this point, you have all waterfalls that are publicly accessible. The final touch is to find ones that are specifically in Alger County. You know that the waterfall table doesn’t have a county name column, but the county table does.

You have two options to pull the county name into the results. You can either (1) write a subquery within the WHERE clause that specifically pulls the Alger County information or (2) join together the waterfall and county tables:

-- Subquery in WHERE clause
SELECT w.name
FROM   waterfall w
WHERE  w.open_to_public = 'y'
       AND w.county_id IN (
           SELECT c.id FROM county c
           WHERE c.name = 'Alger');

or:

-- JOIN clause
SELECT w.name
FROM   waterfall w INNER JOIN county c
       ON w.county_id = c.id
WHERE  w.open_to_public = 'y'
       AND c.name = 'Alger';

name
---------------
Munising Falls
Tannery Falls
Alger Falls
...

The two queries produce the same results. The advantage of the first approach is that subqueries are often easier to understand than joins. The advantage of the second approach is that joins typically execute faster than subqueries.

Other Ways to Filter Data

The WHERE clause is not the only place within a SELECT statement to filter rows of data.

  • FROM clause: When joining together tables, the ON clause specifies how they should be linked together. This is where you can include conditions to restrict rows of data returned by the query. See Joining Tables in Chapter 9 for more details.

  • HAVING clause: If there are aggregations within the SELECT statement, the HAVING clause is where you specify how the aggregations should be filtered. See “The HAVING Clause” for more details.

  • LIMIT clause: To display a specific number of rows, you can use the LIMIT clause. In Oracle, this is done with WHERE ROWNUM and in SQL Server, this is done with SELECT TOP. See “The LIMIT Clause” in this chapter for more details.

The GROUP BY Clause

The purpose of the GROUP BY clause is to collect rows into groups and summarize the rows within the groups in some way, ultimately returning just one row per group. This is sometimes referred to as “slicing” the rows into groups and “rolling up” the rows in each group.

The following query counts the number of waterfalls along each of the tours:

SELECT   t.name AS tour_name,
         COUNT(*) AS num_waterfalls
FROM     waterfall w INNER JOIN tour t
         ON w.id = t.stop
GROUP BY t.name;

tour_name  num_waterfalls
---------- ---------------
M-28                     6
Munising                 6
US-2                     4

There are two parts to focus on here:

  • The collecting of rows, which is specified within the GROUP BY clause

  • The summarizing of rows within groups, which is specified within the SELECT clause

Step 1: The collecting of rows

In the GROUP BY clause:

GROUP BY t.name

we state that we would like to look at all of the rows of data and put the M-28 tour waterfalls into a group, all of the Munising tour waterfalls into a group, and so on. Behind the scenes, the data is being grouped like this:

tour_name  waterfall_name
---------- ----------------
M-28       Munising Falls
M-28       Alger Falls
M-28       Scott Falls
M-28       Canyon Falls
M-28       Agate Falls
M-28       Bond Falls

Munising   Munising Falls
Munising   Tannery Falls
Munising   Alger Falls
Munising   Wagner Falls
Munising   Horseshoe Falls
Munising   Miners Falls



US-2       Bond Falls
US-2       Fumee Falls
US-2       Kakabika Falls
US-2       Rapid River Fls

Step 2: The summarizing of rows

In the SELECT clause,

SELECT t.name AS tour_name,
       COUNT(*) AS num_waterfalls

we state that for each group, or each tour, we want to count the number of rows of data in the group. Because each row represents a waterfall, this would result in the total number of waterfalls along each tour.

The COUNT() function here is more formally known as an aggregate function, or a function that summarizes many rows of data into a single value. More aggregate functions can be found in “Aggregate Functions” in Chapter 7.

Warning

In this example, COUNT(*) returns the number of waterfalls along each tour. However, this is only because each row of data in the waterfall and tour tables represent a single waterfall.

If a single waterfall was listed on multiple rows, COUNT(*) would result in a larger value than expected. In this case, you could potentially use COUNT(DISTINCT waterfall_name) instead to find the unique waterfalls. More details can be found in COUNT and DISTINCT.

The key takeaway is that it is important to manually double-check the results of the aggregate function to make sure it is summarizing the data in the way that you intended.

Now that the groups have been created with the GROUP BY clause, the aggregate function will be applied once to each group:

tour_name  COUNT(*)
---------- ---------
M-28               6
M-28
M-28
M-28
M-28
M-28

Munising           6
Munising
Munising
Munising
Munising
Munising

US-2               4
US-2
US-2
US-2

Any columns to which an aggregate function has not been applied, which in this case is the tour_name column, are now collapsed into one value:

tour_name  COUNT(*)
---------- ---------
M-28               6
Munising           6
US-2               4
Note

This collapsing of many detail rows into one aggregate row means that when using a GROUP BY clause, the SELECT clause should only contain:

  • All columns listed in the GROUP BY clause: t.name

  • Aggregations: COUNT(*)

SELECT t.name AS tour_name,
       COUNT(*) AS num_waterfalls
...
GROUP BY t.name;

Not doing so could either result in an error message or return inaccurate values.

For more complex grouping situations including ROLLUP, CUBE, and GROUPING SETS, go to “Grouping and Summarizing” in Chapter 8.

The HAVING Clause

The HAVING clause places restrictions on the rows returned from a GROUP BY query. In other words, it allows you to filter on the results after a GROUP BY has been applied.

Note

A HAVING clause always immediately follows a GROUP BY clause. Without a GROUP BY clause, there can be no HAVING clause.

This is a query that lists the number of waterfalls on each tour using a GROUP BY clause:

SELECT   t.name AS tour_name,
         COUNT(*) AS num_waterfalls
FROM     waterfall w INNER JOIN tour t
         ON w.id = t.stop
GROUP BY t.name;

tour_name num_waterfalls ---------- --------------- M-28 6 Munising 6 US-2 4

Let’s say we only want to list the tours that have exactly six stops. To do so, you would add a HAVING clause after the GROUP BY clause:

SELECT   t.name AS tour_name,
         COUNT(*) AS num_waterfalls
FROM     waterfall w INNER JOIN tour t
         ON w.id = t.stop
GROUP BY t.name
HAVING   COUNT(*) = 6;

tour_name  num_waterfalls
---------- ---------------
M-28                     6
Munising                 6

You’ll notice that the HAVING clause refers to the aggregation COUNT(*),

SELECT COUNT(*) AS num_waterfalls
...
HAVING COUNT(*) = 6;

and not the alias,

# code will not run
SELECT COUNT(*) AS num_waterfalls
...
HAVING num_waterfalls = 6;

The reason for this is because of the order of execution of the clauses. The SELECT clause is written before the HAVING clause. However, the SELECT clause is actually executed after the HAVING clause.

That means that the alias num_waterfalls in the SELECT clause does not exist at the time the HAVING clause is being executed. The HAVING clause must refer to the raw aggregation COUNT(*) instead.

Note

MySQL and SQLite are exceptions, and allow aliases (num_waterfalls) in the HAVING clause.

The ORDER BY Clause

The ORDER BY clause is used to specify how you want the results of a query to be sorted.

The following query returns a list of owners and waterfalls, without any sorting:

SELECT COALESCE(o.name, 'Unknown') AS owner,
       w.name AS waterfall_name
FROM   waterfall w
       LEFT JOIN owner o ON w.owner_id = o.id;

owner            waterfall_name
---------------- ---------------
Pictured Rocks   Munising Falls
Michigan Nature  Tannery Falls
AF LLC           Alger Falls
MI DNR           Wagner Falls
Unknown          Horseshoe Falls
...

The following query returns the same list, but first sorted alphabetically by owner, and then by waterfall:

SELECT   COALESCE(o.name, 'Unknown') AS owner,
         w.name AS waterfall_name
FROM     waterfall w
         LEFT JOIN owner o ON w.owner_id = o.id
ORDER BY owner, waterfall_name;

owner            waterfall_name
---------------- ---------------
AF LLC           Alger Falls
MI DNR           Wagner Falls
Michigan Nature  Tannery Falls
Michigan Nature  Twin Falls #1
Michigan Nature  Twin Falls #2
...

The default sort is in ascending order, meaning text will go from A to Z and numbers will go from lowest to highest. You can use the keywords ASCENDING and DESCENDING (which can be abbreviated as ASC and DESC) to control the sort on each column.

The following is a modification of the previous sort, but this time, it sorts owner names in reverse order:

SELECT COALESCE(o.name, 'Unknown') AS owner,
       w.name AS waterfall_name
...
ORDER BY owner DESC, waterfall_name ASC;

owner            waterfall_name
---------------- ---------------
Unknown          Agate Falls
Unknown          Bond Falls
Unknown          Canyon Falls
...

You can sort by columns and expressions that are not in your SELECT list:

SELECT   COALESCE(o.name, 'Unknown') AS owner,
         w.name AS waterfall_name
FROM     waterfall w
         LEFT JOIN owner o ON w.owner_id = o.id
ORDER BY o.id DESC, w.id;

owner            waterfall_name
---------------- ---------------
MI DNR           Wagner Falls
AF LLC           Alger Falls
Michigan Nature  Tannery Falls
...

You can also sort by numeric column position:

SELECT COALESCE(o.name, 'Unknown') AS owner,
       w.name AS waterfall_name
...
ORDER BY 1 DESC, 2 ASC;

owner            waterfall_name
---------------- ---------------
Unknown          Agate Falls
Unknown          Bond Falls
Unknown          Canyon Falls
...

Because the rows of a SQL table are unordered, if you don’t include an ORDER BY clause in a query, each time you execute the query, the results could be displayed in a different order.

The LIMIT Clause

When quickly viewing a table, it is best practice to return a limited number of rows instead of the entire table.

MySQL, PostgreSQL, and SQLite support the LIMIT clause. Oracle and SQL Server use different syntax with the same functionality:

-- MySQL, PostgreSQL, and SQLite
SELECT *
FROM owner
LIMIT 3;

-- Oracle
SELECT *
FROM owner
WHERE ROWNUM <= 3;

-- SQL Server
SELECT TOP 3 *
FROM owner;

id  name             phone         type
--- ---------------- ------------- --------
  1 Pictured Rocks   906.387.2607  public
  2 Michigan Nature  517.655.5655  private
  3 AF LLC                         private

Another way to limit the number of rows returned is to filter on a column within the WHERE clause. The filtering will execute even faster if the column is indexed.

Get SQL Pocket Guide, 4th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.