Chapter 4. Querying Basics
A query is a nickname for a SELECT
statement, which consists of six main clauses. Each section of this chapter covers a clause in detail:
The last section of this chapter covers the LIMIT clause, which is supported by MySQL, PostgreSQL, and SQLite.
The code examples in this chapter reference four tables:
waterfall
-
waterfalls in Michigan’s Upper Peninsula
owner
-
owners of the waterfalls
county
-
counties where the waterfalls are located
tour
-
tours that consist of multiple waterfall stops
Here is a sample query that uses the six main clauses. It is followed by the query results, which are also known as the result set.
-- Tours with 2 or more public waterfalls SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls FROM tour t LEFT JOIN waterfall w ON t.stop = w.id WHERE w.open_to_public = 'y' GROUP BY t.name HAVING COUNT(*) >= 2 ORDER BY tour_name; tour_name num_waterfalls ---------- --------------- M-28 6 Munising 6 US-2 4
To query a database means to retrieve data from a database, typically from a table or multiple tables.
Note
It is also possible to query a view instead of a table. Views look like tables and are derived from tables, but they themselves do not hold any data. More on views can be found in “Views” in Chapter 5.
The SELECT Clause
The SELECT
clause specifies the columns that you want a statement to return.
In the SELECT
clause, the SELECT
keyword is followed by a list of column names and/or expressions that are separated by commas. Each column name and/or expression then becomes a column in the results.
Selecting Columns
The simplest SELECT
clause lists one or more column names from the tables in the FROM
clause:
SELECT id, name FROM owner; id name ----- ---------------- 1 Pictured Rocks 2 Michigan Nature 3 AF LLC 4 MI DNR 5 Horseshoe Falls
Selecting All Columns
To return all columns from a table, you can use a single asterisk rather than write out each column name:
SELECT * FROM owner; id name phone type ----- ---------------- ------------- -------- 1 Pictured Rocks 906.387.2607 public 2 Michigan Nature 517.655.5655 private 3 AF LLC private 4 MI DNR 906.228.6561 public 5 Horseshoe Falls 906.387.2635 private
Warning
The asterisk is a helpful shortcut when testing out queries because it can save you quite a bit of typing. However, it is risky to use the asterisk in production code because the columns in a table may change over time, causing your code to fail when there are fewer or more columns than expected.
Selecting Expressions
In addition to simply listing columns, you can also list more complex expressions within the SELECT
clause to return as columns in the results.
The following statement includes an expression to calculate a 10% drop in population, rounded to zero decimal places:
SELECT name, ROUND(population * 0.9, 0) FROM county; name ROUND(population * 0.9, 0) ---------- --------------------------- Alger 8876 Baraga 7871 Ontonagon 7036 ...
Selecting Functions
Expressions in the SELECT
list typically refer to columns in the tables that you are pulling from, but there are exceptions. For example, a common function that doesn’t refer to any tables is the one to return the current date:
SELECT CURRENT_DATE; CURRENT_DATE ------------- 2021-12-01
The preceding code works in MySQL, PostgreSQL, and SQLite. Equivalent code that works in other RDBMSs can be found in “Datetime Functions” in Chapter 7.
Note
The majority of queries include both a SELECT
and a FROM
clause, but only the SELECT
clause is required when using particular database functions, such as CURRENT_DATE
.
It is also possible to include expressions within the SELECT
clause that are subqueries (a query nested inside another query). More details can be found in “Selecting Subqueries”.
Aliasing Columns
The purpose of a column alias is to give a temporary name to any column or expression listed in the SELECT
clause. That temporary name, or column alias, is then displayed as a column name in the results.
Note that this is not a permanent name change because the column names in the original tables remain unchanged. The alias only exists within the query.
This code displays three columns.
SELECT id, name, ROUND(population * 0.9, 0) FROM county; id name ROUND(population * 0.9, 0) ----- ---------- --------------------------- 2 Alger 8876 6 Baraga 7871 7 Ontonagon 7036 ...
Let’s say we want to rename the column names in the results. id
is too ambigious and we’d like to give it a more descriptive name. ROUND(population * 0.9, 0)
is too long and we’d like to give it a simpler name.
To create a column alias, you follow a column name or expression with either (1) an alias name or (2) the AS
keyword and an alias name.
-- alias_name SELECT id county_id, name, ROUND(population * 0.9, 0) estimated_pop FROM county;
or:
-- AS alias_name SELECT id AS county_id, name, ROUND(population * 0.90, 0) AS estimated_pop FROM county; county_id name estimated_pop ---------- ---------- -------------- 2 Alger 8876 6 Baraga 7871 7 Ontonagon 7036 ...
Both options are used in practice when creating aliases. Within the SELECT
clause, the second option is more popular because the AS
keyword makes it visually easier to differentiate column names and aliases among a long list of column names.
Note
Older versions of PostgreSQL require the use of AS
when creating a column alias.
Although column aliases are not required, they are highly recommended when working with expressions to give sensible names to the columns in the results.
Aliases with case sensitivity and punctuation
As can be seen with the column aliases county_id
and
estimated_pop
, the convention is to use lowercase letters with underscores in place of spaces when naming column aliases.
You can also create aliases containing uppercase letters, spaces, and punctuation using the double quote syntax, as shown in this example:
SELECT id AS "Waterfall #", name AS "Waterfall Name" FROM waterfall; Waterfall # Waterfall Name ------------ --------------- 1 Munising Falls 2 Tannery Falls 3 Alger Falls ...
Qualifying Columns
Let’s say you write a query that pulls data from two tables and they both contain a column called name
. If you were to just include name
in the SELECT
clause, the code wouldn’t know which table you were referring to.
To solve this problem, you can qualify a column name by its table name. In other words, you can give a column a prefix to specify which table it belongs to using dot notation, as in table_name.column_name
.
The following example queries a single table, so while it isn’t necessary to qualify the columns here, this is shown for demonstration’s sake. This is how you would qualify a column by its table name:
SELECT owner.id, owner.name FROM owner;
Tip
If you get an error in SQL referencing an ambiguous column name, it means that multiple tables in your query have a column of the same name and you haven’t specified which table/column combination you are referring to. You can resolve the error by qualifying the column name.
Qualifying tables
If you qualify a column name by its table name, you can also qualify that table name by its database or schema name. The following query retrieves data specifically from the owner
table within the sqlbook
schema:
SELECT sqlbook.owner.id, sqlbook.owner.name FROM sqlbook.owner;
The preceding code is lengthy since sqlbook.owner
is repeated multiple times. To save on typing, you can provide a table alias. The following example gives the alias o
to the table owner
:
SELECT o.id, o.name FROM sqlbook.owner o;
or:
SELECT o.id, o.name FROM owner o;
Selecting Subqueries
A subquery is a query that is nested inside another query. Subqueries can be located within various clauses, including the SELECT
clause.
In the following example, in addition to the id
, name
, and population
, let’s say we also want to see the average population of all the counties. By including a subquery, we are creating a new column in the results for the average population.
SELECT id, name, population, (SELECT AVG(population) FROM county) AS average_pop FROM county; id name population average_pop ----- ---------- ----------- ------------ 2 Alger 9862 18298 6 Baraga 8746 18298 7 Ontonagon 7818 18298 ...
A few things to note here:
-
When writing a subquery within the
SELECT
clause, it is highly recommended that you specify a column alias, which in this case wasaverage_pop
. That way, the column has a simple name in the results. -
There is only one value in the
average_pop
column that is repeated across all rows. When including a subquery within theSELECT
clause, the result of the subquery must return a single column and either zero or one row, as shown in the following subquery to calculate the average population.
SELECT AVG(population) FROM county; AVG(population) ---------------- 18298
-
If the subquery returned zero rows, then the new column would be filled with
NULL
values.
DISTINCT
When a column is listed in the SELECT
clause, by default, all of the rows are returned. To be more explicit, you can include the ALL
keyword, but it is purely optional. The following queries return each type
/open_to_public
combination.
SELECT o.type, w.open_to_public FROM owner o JOIN waterfall w ON o.id = w.owner_id;
or:
SELECT ALL o.type, w.open_to_public FROM owner o JOIN waterfall w ON o.id = w.owner_id; type open_to_public -------- --------------- public y public y public y private y private y private y private y public y
If you want to remove duplicate rows from the results, you can use the DISTINCT
keyword. The following query returns a list of unique type
/open_to_public
combinations.
SELECT DISTINCT o.type, w.open_to_public FROM owner o JOIN waterfall w ON o.id = w.owner_id; type open_to_public -------- --------------- public y private y
COUNT and DISTINCT
To count the number of unique values within a single column, you can combine the COUNT
and DISTINCT
keywords within the SELECT
clause. The following query returns the number of unique type
values.
SELECT COUNT(DISTINCT type) AS unique FROM owner; unique ------- 2
To count the number of unique combinations of multiple columns, you can wrap a DISTINCT
query up as a subquery, and then do a COUNT
on the subquery. The following query returns the number of unique type
/open_to_public
combinations.
SELECT COUNT(*) AS num_unique FROM (SELECT DISTINCT o.type, w.open_to_public FROM owner o JOIN waterfall w ON o.id = w.owner_id) my_subquery; num_unique ----------- 2
MySQL and PostgreSQL support the use of the COUNT(DISTINCT)
syntax on multiple columns. The following two queries are equivalent to the preceding query, without needing a subquery:
-- MySQL equivalent SELECT COUNT(DISTINCT o.type, w.open_to_public) AS num_unique FROM owner o JOIN waterfall w ON o.id = w.owner_id; -- PostgreSQL equivalent SELECT COUNT(DISTINCT (o.type, w.open_to_public)) AS num_unique FROM owner o JOIN waterfall w ON o.id = w.owner_id; num_unique ----------- 2
The FROM Clause
The FROM
clause is used to specify the source of the data you want to retrieve. The simplest case is to name a single table or view in the FROM
clause of query.
SELECT name FROM waterfall;
You can qualify a table or view with either a database or schema name using the dot notation. The following query retrieves data specifically from the waterfall
table within the sqlbook
schema:
SELECT name FROM sqlbook.waterfall;
From Multiple Tables
Instead of retrieving data from one table, you’ll often want to pull together data from multiple tables. The most common way to do this is using a JOIN
clause within the FROM
clause. The following query retrieves data from both the waterfall
and tour
tables and displays a single results table.
SELECT * FROM waterfall w JOIN tour t ON w.id = t.stop; id name ... name stop ... ----- --------------- --------- ----- 1 Munising Falls M-28 1 1 Munising Falls Munising 1 2 Tannery Falls Munising 2 3 Alger Falls M-28 3 3 Alger Falls Munising 3 ...
Let’s break down each part of the code block.
Table aliases
waterfall w JOIN tour t
The waterfall
and tour
tables are given table aliases w
and t
, which are temporary names for the tables within the query. Table aliases are not required in a JOIN
clause, but they are very helpful for shortening table names that need to be referenced within the ON
and SELECT
clauses.
JOIN … ON …
waterfall w JOIN tour t ON w.id = t.stop
These two tables are pulled together with the JOIN
keyword. A JOIN
clause is always followed by an ON
clause, which specifies how the tables should be linked together. In this case, the id
of the waterfall in the waterfall
table must match the stop
with the waterfall in the tour
table.
Note
You may see the FROM
, JOIN
, and ON
clauses on different lines or indented. This is not required, but helpful for readability’s sake, especially when you are joining many tables together.
Results table
A query always results in a single table. The waterfall
table has 12 columns and the tour
table has 3 columns. After joining these tables together, the results table has 15 columns.
id name ... name stop ... ----- --------------- --------- ----- 1 Munising Falls M-28 1 1 Munising Falls Munising 1 2 Tannery Falls Munising 2 3 Alger Falls M-28 3 3 Alger Falls Munising 3 ...
You’ll notice that there are two columns called name
in the results table. The first is from the waterfall
table, and the
second is from the tour
table. To refer to them in the SELECT
clause, you would need to qualify the column names.
SELECT w.name, t.name FROM waterfall w JOIN tour t ON w.id = t.stop; name name --------------- --------- Munising Falls M-28 Munising Falls Munising Tannery Falls Munising ...
To differentiate the two columns, you would also want to alias the column names.
SELECT w.name AS waterfall_name, t.name AS tour_name FROM waterfall w JOIN tour t ON w.id = t.stop; waterfall_name tour_name --------------- ---------- Munising Falls M-28 Munising Falls Munising Tannery Falls Munising Alger Falls M-28 Alger Falls Munising ...
JOIN variations
In the preceding example, if a waterfall isn’t listed in any tour, then it would not appear in the results table. If you wanted to see all waterfalls in the results, you would need to use a different type of join.
There are a variety of join types used in SQL, which are covered in more detail in “Joining Tables” in Chapter 9.
From Subqueries
A subquery is a query that is nested inside another query. Subqueries within the FROM
clause should be standalone SELECT
statements, meaning that they do not reference the outer query at all and can be run on their own.
Note
A subquery within the FROM
clause is also known as a derived table because the subquery ends up essentially acting like a table for the duration of the query.
The following query lists all publicly owned waterfalls, with the subquery portion bolded.
SELECT w.name AS waterfall_name, o.name AS owner_name FROM (SELECT * FROM owner WHERE type = 'public') o JOIN waterfall w ON o.id = w.owner_id; waterfall_name owner_name --------------- --------------- Little Miners Pictured Rocks Miners Falls Pictured Rocks Munising Falls Pictured Rocks Wagner Falls MI DNR
It is important to understand the order in which the query is executed.
Step 1: Execute the subquery
The contents of the subquery are first executed. You can see that this results in a table of only public owners:
SELECT * FROM owner WHERE type = 'public'; id name phone type ----- --------------- ------------- ------- 1 Pictured Rocks 906.387.2607 public 4 MI DNR 906.228.6561 public
Going back to the original query, you’ll notice that the subquery is immediately followed by the letter o
. This is the temporary name, or alias, that we are assigning to the results of the subquery.
Note
Aliases are required for subqueries within the FROM
clause in MySQL, PostgreSQL, and SQL Server, but not in Oracle and SQLite.
Step 2: Execute the entire query
Next, you can think of the letter o
taking the place of the subquery. The query is now executed as usual.
SELECT w.name AS waterfall_name, o.name AS owner_name FROM o JOIN waterfall w ON o.id = w.owner_id; waterfall_name owner_name --------------- --------------- Little Miners Pictured Rocks Miners Falls Pictured Rocks Munising Falls Pictured Rocks Wagner Falls MI DNR
Why Use a Subquery in the FROM Clause?
The main advantage of using subqueries is that you can turn a large problem into smaller ones. Here are two examples:
- Example 1: Multiple steps to get to results
-
Let’s say you want to find the average number of stops on a tour. First, you’d have to find the number of stops on each tour, and then average the results.
The following query finds the number of stops on each tour:
SELECT name, MAX(stop) as num_stops FROM tour GROUP BY name; name num_stops --------- ---------- M-28 11 Munising 6 US-2 14
You could then turn the query into a subquery and write another query around it to find the average:
SELECT AVG(num_stops) FROM (SELECT name, MAX(stop) as num_stops FROM tour GROUP BY name) tour_stops; AVG(num_stops) ----------------- 10.3333333333333
- Example 2: Table in FROM clause is too large
-
The original goal was to list all publicly owned waterfalls. This can actually be done without a subquery and with a
JOIN
instead:SELECT w.name AS waterfall_name, o.name AS owner_name FROM owner o JOIN waterfall w ON o.id = w.owner_id WHERE o.type = 'public'; waterfall_name owner_name --------------- --------------- Little Miners Pictured Rocks Miners Falls Pictured Rocks Munising Falls Pictured Rocks Wagner Falls MI DNR
Let’s say that the query takes a really long time to run. This can happen when you join massive tables together (think tens of millions of rows). There are multiple ways you could rewrite the query to speed it up, and one of them is to use a subquery.
Since we are only interested in public owners, we can first write a subquery that filters out all of the private owners. The smaller
owner
table would then be joined with thewaterfall
table, which would take less time and produce the same results.SELECT w.name AS waterfall_name, o.name AS owner_name FROM (SELECT * FROM owner WHERE type = 'public') o JOIN waterfall w ON o.id = w.owner_id; waterfall_name owner_name --------------- --------------- Little Miners Pictured Rocks Miners Falls Pictured Rocks Munising Falls Pictured Rocks Wagner Falls MI DNR
These are just two of the many examples of how subqueries can be used to break down a larger query into smaller steps.
The WHERE Clause
The WHERE
clause is used to restrict query results to only rows of interest, or simply put, it is the place to filter data. Rarely will you want to display all rows from a table, but rather rows that match specific criteria.
Tip
When exploring a table with millions of rows, you never want to do a SELECT * FROM my_table;
because it will take an unnecessarily long time to run.
Instead, it’s a good idea to filter down the data. Two common ways to do this are:
- Filter by a column within the
WHERE
clause -
Better yet, filter by a column that is already indexed to make the retrieval even faster.
SELECT * FROM my_table WHERE year_id = 2021;
- Show the top few rows of data with the
LIMIT
clause -
(or
WHERE ROWNUM <= 10
in Oracle orSELECT TOP 10 *
in SQL Server)SELECT * FROM my_table LIMIT 10;
The following query finds all waterfalls that do not contain Falls in the name. More on the LIKE
keyword can be found in Chapter 7.
SELECT id, name FROM waterfall WHERE name NOT LIKE '%Falls%'; id name ----- ---------------- 7 Little Miners 14 Rapid River Fls
The bolded section is often referred to as a conditional statement or a predicate. The predicate makes a logical comparison for each row of data that results in TRUE
/FALSE
/UNKNOWN
.
The waterfall
table has 16 rows. For each row, it checks if
the waterfall name contains Falls or not. If it doesn’t contain Falls, then the name NOT LIKE '%Falls%'
predicate is TRUE
, and the row is returned in the results, which was the case for the two
preceding rows.
Multiple Predicates
It is also possible to combine multiple predicates with operators like AND
or OR
. The following example shows waterfalls without Falls in its name and that also don’t have an owner:
SELECT id, name FROM waterfall WHERE name NOT LIKE '%Falls%' AND owner_id IS NULL; id name ----- ---------------- 14 Rapid River Fls
More details on operators can be found in Operators in Chapter 7.
Filtering on Subqueries
A subquery is a query nested inside another query, and the WHERE
clause is a common place to find one. The following example retrieves publicly accessible waterfalls located in Alger County:
SELECT w.name FROM waterfall w WHERE w.open_to_public = 'y' AND w.county_id IN ( SELECT c.id FROM county c WHERE c.name = 'Alger'); name --------------- Munising Falls Tannery Falls Alger Falls ...
Note
Unlike subqueries within the SELECT
clause or the FROM
clause, subqueries in the WHERE
clause do not require an alias. In fact, you will get an error if you include an alias.
Why use a subquery in the WHERE clause?
The original goal was to retrieve publicly accessible waterfalls located in Alger County. If you were to write this query from scratch, you would likely start with the following:
SELECT w.name FROM waterfall w WHERE w.open_to_public = 'y';
At this point, you have all waterfalls that are publicly accessible. The final touch is to find ones that are specifically in Alger County. You know that the waterfall
table doesn’t have a county name column, but the county
table does.
You have two options to pull the county name into the results. You can either (1) write a subquery within the WHERE
clause that specifically pulls the Alger County information or (2) join together the waterfall
and county
tables:
-- Subquery in WHERE clause SELECT w.name FROM waterfall w WHERE w.open_to_public = 'y' AND w.county_id IN ( SELECT c.id FROM county c WHERE c.name = 'Alger');
or:
-- JOIN clause SELECT w.name FROM waterfall w INNER JOIN county c ON w.county_id = c.id WHERE w.open_to_public = 'y' AND c.name = 'Alger'; name --------------- Munising Falls Tannery Falls Alger Falls ...
The two queries produce the same results. The advantage of the first approach is that subqueries are often easier to understand than joins. The advantage of the second approach is that joins typically execute faster than subqueries.
Other Ways to Filter Data
The WHERE
clause is not the only place within a SELECT
statement to filter rows of data.
-
FROM
clause: When joining together tables, theON
clause specifies how they should be linked together. This is where you can include conditions to restrict rows of data returned by the query. See Joining Tables in Chapter 9 for more details. -
HAVING
clause: If there are aggregations within theSELECT
statement, theHAVING
clause is where you specify how the aggregations should be filtered. See “The HAVING Clause” for more details. -
LIMIT
clause: To display a specific number of rows, you can use the LIMIT clause. In Oracle, this is done withWHERE ROWNUM
and in SQL Server, this is done withSELECT TOP
. See “The LIMIT Clause” in this chapter for more details.
The GROUP BY Clause
The purpose of the GROUP BY
clause is to collect rows into groups and summarize the rows within the groups in some way, ultimately returning just one row per group. This is sometimes referred to as “slicing” the rows into groups and “rolling up” the rows in each group.
The following query counts the number of waterfalls along each of the tours:
SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls FROM waterfall w INNER JOIN tour t ON w.id = t.stop GROUP BY t.name; tour_name num_waterfalls ---------- --------------- M-28 6 Munising 6 US-2 4
There are two parts to focus on here:
-
The collecting of rows, which is specified within the
GROUP BY
clause -
The summarizing of rows within groups, which is specified within the
SELECT
clause
Step 1: The collecting of rows
In the GROUP BY
clause:
GROUP BY t.name
we state that we would like to look at all of the rows of data and put the M-28 tour waterfalls into a group, all of the Munising tour waterfalls into a group, and so on. Behind the scenes, the data is being grouped like this:
tour_name waterfall_name ---------- ---------------- M-28 Munising Falls M-28 Alger Falls M-28 Scott Falls M-28 Canyon Falls M-28 Agate Falls M-28 Bond Falls Munising Munising Falls Munising Tannery Falls Munising Alger Falls Munising Wagner Falls Munising Horseshoe Falls Munising Miners Falls US-2 Bond Falls US-2 Fumee Falls US-2 Kakabika Falls US-2 Rapid River Fls
Step 2: The summarizing of rows
In the SELECT
clause,
SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls
we state that for each group, or each tour, we want to count the number of rows of data in the group. Because each row represents a waterfall, this would result in the total number of waterfalls along each tour.
The COUNT()
function here is more formally known as an aggregate function, or a function that summarizes many rows of data into a single value. More aggregate functions can be found in “Aggregate Functions” in Chapter 7.
Warning
In this example, COUNT(*)
returns the number of waterfalls along each tour. However, this is only because each row of data in the waterfall
and tour
tables represent a single waterfall.
If a single waterfall was listed on multiple rows, COUNT(*)
would result in a larger value than expected. In this case, you could potentially use COUNT(DISTINCT waterfall_name)
instead to find the unique waterfalls. More details can be found in COUNT and DISTINCT
.
The key takeaway is that it is important to manually double-check the results of the aggregate function to make sure it is summarizing the data in the way that you intended.
Now that the groups have been created with the GROUP BY
clause, the aggregate function will be applied once to each group:
tour_name COUNT(*) ---------- --------- M-28 6 M-28 M-28 M-28 M-28 M-28 Munising 6 Munising Munising Munising Munising Munising US-2 4 US-2 US-2 US-2
Any columns to which an aggregate function has not been applied, which in this case is the tour_name
column, are now collapsed into one value:
tour_name COUNT(*) ---------- --------- M-28 6 Munising 6 US-2 4
Note
This collapsing of many detail rows into one aggregate row means that when using a GROUP BY
clause, the SELECT
clause should only contain:
-
All columns listed in the
GROUP BY
clause:t.name
-
Aggregations:
COUNT(*)
SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls ... GROUP BY t.name;
Not doing so could either result in an error message or return inaccurate values.
For more complex grouping situations including ROLLUP
, CUBE
, and GROUPING SETS
, go to “Grouping and Summarizing” in Chapter 8.
The HAVING Clause
The HAVING
clause places restrictions on the rows returned from a GROUP BY
query. In other words, it allows you to filter on the results after a GROUP BY
has been applied.
Note
A HAVING
clause always immediately follows a GROUP BY
clause. Without a GROUP BY
clause, there can be no HAVING
clause.
This is a query that lists the number of waterfalls on each tour using a GROUP BY
clause:
SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls FROM waterfall w INNER JOIN tour t ON w.id = t.stop GROUP BY t.name;
tour_name num_waterfalls ---------- --------------- M-28 6 Munising 6 US-2 4
Let’s say we only want to list the tours that have exactly six stops. To do so, you would add a HAVING
clause after the GROUP BY
clause:
SELECT t.name AS tour_name, COUNT(*) AS num_waterfalls FROM waterfall w INNER JOIN tour t ON w.id = t.stop GROUP BY t.name HAVING COUNT(*) = 6; tour_name num_waterfalls ---------- --------------- M-28 6 Munising 6
You’ll notice that the HAVING
clause refers to the aggregation COUNT(*)
,
SELECT COUNT(*) AS num_waterfalls ... HAVING COUNT(*) = 6;
and not the alias,
# code will not run SELECT COUNT(*) AS num_waterfalls ... HAVING num_waterfalls = 6;
The reason for this is because of the order of execution of the clauses. The SELECT
clause is written before the HAVING
clause. However, the SELECT
clause is actually executed after the HAVING
clause.
That means that the alias num_waterfalls
in the SELECT
clause does not exist at the time the HAVING
clause is being executed. The HAVING
clause must refer to the raw aggregation COUNT(*)
instead.
The ORDER BY Clause
The ORDER BY
clause is used to specify how you want the results of a query to be sorted.
The following query returns a list of owners and waterfalls, without any sorting:
SELECT COALESCE(o.name, 'Unknown') AS owner, w.name AS waterfall_name FROM waterfall w LEFT JOIN owner o ON w.owner_id = o.id; owner waterfall_name ---------------- --------------- Pictured Rocks Munising Falls Michigan Nature Tannery Falls AF LLC Alger Falls MI DNR Wagner Falls Unknown Horseshoe Falls ...
The following query returns the same list, but first sorted alphabetically by owner, and then by waterfall:
SELECT COALESCE(o.name, 'Unknown') AS owner, w.name AS waterfall_name FROM waterfall w LEFT JOIN owner o ON w.owner_id = o.id ORDER BY owner, waterfall_name; owner waterfall_name ---------------- --------------- AF LLC Alger Falls MI DNR Wagner Falls Michigan Nature Tannery Falls Michigan Nature Twin Falls #1 Michigan Nature Twin Falls #2 ...
The default sort is in ascending order, meaning text will go from A to Z and numbers will go from lowest to highest. You can use the keywords ASCENDING
and DESCENDING
(which can be abbreviated as ASC
and DESC
) to control the sort on each
column.
The following is a modification of the previous sort, but this time, it sorts owner names in reverse order:
SELECT COALESCE(o.name, 'Unknown') AS owner, w.name AS waterfall_name ... ORDER BY owner DESC, waterfall_name ASC; owner waterfall_name ---------------- --------------- Unknown Agate Falls Unknown Bond Falls Unknown Canyon Falls ...
You can sort by columns and expressions that are not in your SELECT
list:
SELECT COALESCE(o.name, 'Unknown') AS owner, w.name AS waterfall_name FROM waterfall w LEFT JOIN owner o ON w.owner_id = o.id ORDER BY o.id DESC, w.id; owner waterfall_name ---------------- --------------- MI DNR Wagner Falls AF LLC Alger Falls Michigan Nature Tannery Falls ...
You can also sort by numeric column position:
SELECT COALESCE(o.name, 'Unknown') AS owner, w.name AS waterfall_name ... ORDER BY 1 DESC, 2 ASC; owner waterfall_name ---------------- --------------- Unknown Agate Falls Unknown Bond Falls Unknown Canyon Falls ...
Because the rows of a SQL table are unordered, if you don’t include an ORDER BY
clause in a query, each time you execute the query, the results could be displayed in a different order.
The LIMIT Clause
When quickly viewing a table, it is best practice to return a limited number of rows instead of the entire table.
MySQL, PostgreSQL, and SQLite support the LIMIT
clause. Oracle and SQL Server use different syntax with the same
functionality:
-- MySQL, PostgreSQL, and SQLite SELECT * FROM owner LIMIT 3; -- Oracle SELECT * FROM owner WHERE ROWNUM <= 3; -- SQL Server SELECT TOP 3 * FROM owner; id name phone type --- ---------------- ------------- -------- 1 Pictured Rocks 906.387.2607 public 2 Michigan Nature 517.655.5655 private 3 AF LLC private
Another way to limit the number of rows returned is to filter on a column within the WHERE
clause. The filtering will execute even faster if the column is indexed.
Get SQL Pocket Guide, 4th Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.