Chapter 1. Query Primer
Let’s roll up our sleeves and look at some queries using the select
statement, which is the SQL statement used to execute queries.
Query Basics
If you followed the steps listed in the Preface, then you have created a Snowflake account and have created your sample database. To enter queries using Snowsight, create a new worksheet by selecting the Worksheets button on the left side of the screen, and then click the “+” button at the top right. Your new worksheet will appear in the left pane with a name using the current date and time, but you can rename it to Learning_Snowflake_SQL
to use for the examples in this book. Before running any queries in your new worksheet, set the schema to the Public schema in the Learning_SQL database: use schema learning_sql.public
;
.
After entering this command into your worksheet, click on the right arrow at the top right of the screen to execute the command, as shown in Figure 1-1.
Every time you log in to Snowsight, you will want to execute this use schema
statement before executing queries against the sample database.
Snowflake Web Interface Versus SnowSQL CLI
For most examples shown in this book, I will be using Snowflake’s CLI, known as SnowSQL, rather than the Snowsight web interface. This allows the SQL statements in the examples to be copied from the book (assuming you are using an online copy) and also eliminates unnecessary clutter. I recommend that you use Snowsight, however, since it is a powerful tool that can be run from any device, but if you would like to use SnowSQL, you can follow these installation and usage instructions.
When using SnowSQL, always add a semicolon to the end of your SQL statement, then hit Enterto execute the statement. Also, along with setting the current schema with the use schema
command, I also set the prompt in SnowSQL using the following statement:
!set prompt_format=[schema]>;
This is why all of my examples will show the prompt PUBLIC>
, but you can use the default prompt format if you prefer.
Next, let’s begin with a simple query to return the current date:
PUBLIC>select current_date; +--------------+ | CURRENT_DATE | |--------------| | 2023-03-19 | +--------------+
This query calls the built-in function current_date()
, which by default returns the date in YYYY-MM-DD format (4-digit year, 2-digit month, 2-digit day). Select statements return a result set, which is a set of one or more rows having one or more columns. For this first example, the result set consists of a single row having a single column.
A database table also consists of one or more rows having one or more columns, so a result set returned by a select
statement could be the entire contents of a table. Here’s what that would look like using the smallest table in the sample database, Region
, which contains the following data:
PUBLIC>select * from region; +-------------+-------------+------------------------------------+ | R_REGIONKEY | R_NAME | R_COMMENT | |-------------+-------------|------------------------------------+ | 0 | AFRICA | lar deposits. blithely final pac...| | 1 | AMERICA | hs use ironic, even requests. s | | 2 | ASIA | ges. thinly even pinto beans ca | | 3 | EUROPE | ly final courts cajole furiously...| | 4 | MIDDLE EAST | uickly special accounts cajole c...| +-------------+-------------+------------------------------------+
Here are a couple of things to consider concerning this query:
select *
is shorthand for “return every column.”- This query includes both a
select
clause and afrom
clause. Afrom
clause is used to specify the table(s) from which to retrieve data.
And here are a couple of things to consider about the result set:
- The
Region
table has 5 rows and 3 columns. - The comments in the third column of the result set are randomly generated. This is true for all descriptive columns in the sample database.
- SnowSQL automatically formats the result set to include the column names, and uses the
+
,-
, and|
characters to make borders around each column and row.
If you are using Snowsight rather than SnowSQL, the result set will look something like what is shown in Figure 1-2.
If you’d like to see what columns are available in a table, you can use Snowflake’s describe
statement. Here’s what it returns for the Region
table:
PUBLIC>describe table region; +-------------+--------------+--------+-------+---------+... | name | type | kind | null? | default |... |-------------+--------------+--------+-------+---------+... | R_REGIONKEY | NUMBER(38,0) | COLUMN | N | NULL |... | R_NAME | VARCHAR(25) | COLUMN | N | NULL |... | R_COMMENT | VARCHAR(152) | COLUMN | Y | NULL |... +-------------+--------------+--------+-------+---------+...
The Region
table has a numeric column named r_regionkey
to hold a unique numeric value for each row, and two character columns (varchar
stands for variable character) to hold the region’s name and comments about the region.
Command Parameters
Many of Snowflake’s built-in commands, such as the show tables
command shown in the previous example, include multiple optional parameters affecting how the command should be executed. In this case, I used the terse
parameter to specify that only 5 of the 18 possible columns be included in the result set. If you want to see the full set of parameters available for any of Snowflake’s commands, you can peruse Snowflake’s SQL Command Reference.
To see information about all of the tables available in a database, use the show tables
command (using the terse
option to limit the number of columns):
PUBLIC>show terse tables in PUBLIC; +-------------------------------+----------+-------+... | created_on | name | kind |... |-------------------------------+----------+-------+... | 2023-02-28 06:55:21.382 -0800 | CUSTOMER | TABLE |... | 2023-02-28 06:51:41.226 -0800 | LINEITEM | TABLE |... | 2023-02-28 06:43:46.739 -0800 | NATION | TABLE |... | 2023-02-28 06:53:19.090 -0800 | ORDERS | TABLE |... | 2023-02-28 06:44:35.450 -0800 | PART | TABLE |... | 2023-02-28 06:45:20.267 -0800 | PARTSUPP | TABLE |... | 2023-02-28 06:42:32.322 -0800 | REGION | TABLE |... | 2023-02-28 06:49:39.242 -0800 | SUPPLIER | TABLE |... +-------------------------------+----------+-------+...
One thing to consider is that the output of show tables
will only include tables that you’ve been given the privilege to see, so keep this in mind when you begin working with other databases. The eight tables returned by the show tables
command will be used for most of the examples in this book. Appendix A shows a diagram of these tables, including all columns and the relationships between tables.
Query Clauses
Queries are comprised of multiple components, or clauses. Two of the clauses, select
and from
, were introduced in the previous section, but there are eight different clauses that can be added to a Snowflake query, as shown in Table 1-1.
Clause name | Purpose |
---|---|
select | Specifies the columns to be included in the result set |
from | Identifies the tables from which to retrieve data and how the tables should be joined |
where | Removes unwanted rows in the result set |
group by | Groups rows together by common values |
having | Removes unwanted rows in the result set based on groupings |
qualify | Removes unwanted rows in the result set based on results of windowing functions (see Chapter 14) |
order by | Sorts the result set by one or more columns |
limit | Restricts the number of rows in the result set |
Of the eight clauses listed in Table 1-1, only select
is required, and some are used in tandem (for example, you wouldn’t use the having
clause without first specifying a group by
clause). All of these clauses are covered in this book, with some of the more advanced ones left for later chapters. The following sections outline the uses of each of these clauses.
The select Clause
As demonstrated earlier in the chapter, a query can consist of just a select
clause. Here’s an example that returns a result set consisting of a single column:
PUBLIC>select 'Welcome to Snowflake SQL!'; +-----------------------------+ | 'WELCOME TO SNOWFLAKE SQL!' | |-----------------------------| | Welcome to Snowflake SQL! | +-----------------------------+
The select
clause is the only required clause in a query, but it isn’t particularly useful by itself. To make things more interesting, let’s write a query to retrieve some of the columns from the Nation
table, which looks as follows:
PUBLIC>describe table nation; +-------------+--------------+--------+-------+---------+... | name | type | kind | null? | default |... |-------------+--------------+--------+-------+---------+... | N_NATIONKEY | NUMBER(38,0) | COLUMN | N | NULL |... | N_NAME | VARCHAR(25) | COLUMN | N | NULL |... | N_REGIONKEY | NUMBER(38,0) | COLUMN | N | NULL |... | N_COMMENT | VARCHAR(152) | COLUMN | Y | NULL |... +-------------+--------------+--------+-------+---------+...
The next query retrieves the first three columns from Nation
:
PUBLIC>select n_nationkey, n_name, n_regionkey from nation; +-------------+----------------+-------------+ | N_NATIONKEY | N_NAME | N_REGIONKEY | |-------------+----------------+-------------| | 0 | ALGERIA | 0 | | 1 | ARGENTINA | 1 | | 2 | BRAZIL | 1 | | 3 | CANADA | 1 | | 4 | EGYPT | 4 | | 5 | ETHIOPIA | 0 | | 6 | FRANCE | 3 | | 7 | GERMANY | 3 | | 8 | INDIA | 2 | | 9 | INDONESIA | 2 | | 10 | IRAN | 4 | | 11 | IRAQ | 4 | | 12 | JAPAN | 2 | | 13 | JORDAN | 4 | | 14 | KENYA | 0 | | 15 | MOROCCO | 0 | | 16 | MOZAMBIQUE | 0 | | 17 | PERU | 1 | | 18 | CHINA | 2 | | 19 | ROMANIA | 3 | | 20 | SAUDI ARABIA | 4 | | 21 | VIETNAM | 2 | | 22 | RUSSIA | 3 | | 23 | UNITED KINGDOM | 3 | | 24 | UNITED STATES | 1 | +-------------+----------------+-------------+
There are four columns in the Nation
table, but the query retrieves just three of them. The purpose of the select
clause in this query, therefore, is to specify which of all possible columns should be included in the result set. A select
clause isn’t just limited to columns in a table, however, but can contain any of the following:
- Literals, such as the number
99
or the string'Welcome to Snowflake SQL!'
- Expressions, such as
n_nationkey * 100
- Built-in function calls, such as
concat
(n_nationkey
, ' : ',n_name
) - User-defined function calls built with Java, Python, JavaScript, or the Snowflake Scripting language
Built-in and user-defined functions will be covered in detail in later chapters.
Column aliases
While the database server will assign column names for you, you may want to specify your own names by using column aliases. This is especially helpful for any literals, expressions, or function calls included in your select
clause. Here’s an example with the column aliases shown in bold:
PUBLIC>select 'Welcome to Snowflake SQL!' as welcome_message, 5 * 3.1415927 as circle_circumference, dayname(current_date) as day_of_week; +---------------------------+----------------------+-------------+ | WELCOME_MESSAGE | CIRCLE_CIRCUMFERENCE | DAY_OF_WEEK | |---------------------------+----------------------+-------------| | Welcome to Snowflake SQL! | 15.7079635 | Fri | +---------------------------+----------------------+-------------+
Column aliases may be preceded by the as
keyword, which improves readability.
Removing duplicates
In some cases, a query may return duplicate values. For example, the following query returns the values in the n_regionkey
column of the Nation
table:
PUBLIC>select n_regionkey from nation; +-------------+ | N_REGIONKEY | |-------------| | 0 | | 1 | | 1 | | 1 | | 4 | | 0 | | 3 | | 3 | | 2 | | 2 | | 4 | | 4 | | 2 | | 4 | | 0 | | 0 | | 0 | | 1 | | 2 | | 3 | | 4 | | 2 | | 3 | | 3 | | 1 | +-------------+
As you can see, there are only five different values for n_regionkey
across the 25 rows of the Nation
table. If you wanted to retrieve only the unique set of values in the n_regionkey
column, you can use the distinct
keyword:
PUBLIC>select distinct n_regionkey from nation; +-------------+ | N_REGIONKEY | |-------------| | 0 | | 1 | | 3 | | 2 | | 4 | +-------------+
Adding distinct
to your select
clause will instruct the server to sort the values and remove any duplicates.
The from Clause
Earlier in this chapter, examples retrieve data from a single table, but the from
clause can reference multiple tables. When there are two or more tables in the from
clause, its role expands to include not just a list of tables, but the means by which the tables should be linked. To illustrate, let’s say you want to retrieve data from the Nation
table, but rather than retrieving the n_regionkey
column, you want to use the n_regionkey
value to retrieve the region’s name from the Region
table. Here’s what that query would look like:
PUBLIC>select n_nationkey, n_name as nation_name, r_name as region_name from nation join region on nation.n_regionkey = region.r_regionkey; +-------------+----------------+-------------+ | N_NATIONKEY | NATION_NAME | REGION_NAME | |-------------+----------------+-------------| | 0 | ALGERIA | AFRICA | | 1 | ARGENTINA | AMERICA | | 2 | BRAZIL | AMERICA | | 3 | CANADA | AMERICA | | 4 | EGYPT | MIDDLE EAST | | 5 | ETHIOPIA | AFRICA | | 6 | FRANCE | EUROPE | | 7 | GERMANY | EUROPE | | 8 | INDIA | ASIA | | 9 | INDONESIA | ASIA | | 10 | IRAN | MIDDLE EAST | | 11 | IRAQ | MIDDLE EAST | | 12 | JAPAN | ASIA | | 13 | JORDAN | MIDDLE EAST | | 14 | KENYA | AFRICA | | 15 | MOROCCO | AFRICA | | 16 | MOZAMBIQUE | AFRICA | | 17 | PERU | AMERICA | | 18 | CHINA | ASIA | | 19 | ROMANIA | EUROPE | | 20 | SAUDI ARABIA | MIDDLE EAST | | 21 | VIETNAM | ASIA | | 22 | RUSSIA | EUROPE | | 23 | UNITED KINGDOM | EUROPE | | 24 | UNITED STATES | AMERICA | +-------------+----------------+-------------+
The from
clause contains two tables, Nation
and Region
, and also includes the on
subclause to specify that the regionkey
columns (n_regionkey
in the Nation
table and r_regionkey
in the Region
table) should be used to join the tables. In simpler terms, the query specifies that for every row in the Nation
table, use the n_regionkey
value to look up the region’s name in the Region
table. Multitable joins are covered extensively in Chapter 3.
The where Clause
While there are cases when you want to retrieve all rows from a table, it is common to retrieve only a subset of the rows, which is the job of the where
clause. For example, you may only want to retrieve rows from the Nation
table that have a name starting with the letter U:
PUBLIC>select n_name from nation where n_name like 'U%'; +----------------+ | N_NAME | |----------------| | UNITED KINGDOM | | UNITED STATES | +----------------+
The job of the where
clause is filtering, or the removal of unwanted rows. Each element in the where
clause is called a condition, which for the previous example is:
n_name like 'U%'
There can be multiple conditions in a where
clause, as shown by the next example, which retrieves the names of nations starting with either U or A:
PUBLIC>select n_name from nation where n_name like 'U%' or n_name like 'A%'; +----------------+ | N_NAME | |----------------| | ALGERIA | | ARGENTINA | | UNITED KINGDOM | | UNITED STATES | +----------------+
The previous query includes two conditions, separated by or
, which specifies that any row satisfying either condition should be included in the result set. Chapter 2 covers filtering in detail.
The group by Clause
As suggested by the name, the group by
clause is used for grouping rows. Grouping rows of data is a very common practice, especially for reporting and data analysis. To illustrate, the next example counts the number of countries in the Nation
table for each row in the Region
table:
PUBLIC>select r_name as region_name, count(*) as number_of_countries from nation join region on nation.n_regionkey = region.r_regionkey group by r_name; +-------------+---------------------+ | REGION_NAME | NUMBER_OF_COUNTRIES | |-------------+---------------------| | AFRICA | 5 | | AMERICA | 5 | | MIDDLE EAST | 5 | | EUROPE | 5 | | ASIA | 5 | +-------------+---------------------+
This query groups the rows in Nation
by their regions, and then uses the built-in count()
function to determine the number of nations in each region. Chapter 7 covers grouping.
The having Clause
I introduced the concept of filtering when discussing the role of the where
clause (see “The where Clause”), but there are actually three different query clauses that play this role. The second one is the having
clause, but its role is to filter rows based on grouped data. Let’s use another table, Supplier
, to demonstrate how this works. Here’s the definition of Supplier
:
PUBLIC>describe table supplier; +-------------+--------------+--------+-------+---------+... | name | type | kind | null? | default |... |-------------+--------------+--------+-------+---------+... | S_SUPPKEY | NUMBER(38,0) | COLUMN | N | NULL |... | S_NAME | VARCHAR(25) | COLUMN | N | NULL |... | S_ADDRESS | VARCHAR(40) | COLUMN | N | NULL |... | S_NATIONKEY | NUMBER(38,0) | COLUMN | N | NULL |... | S_PHONE | VARCHAR(15) | COLUMN | N | NULL |... | S_ACCTBAL | NUMBER(12,2) | COLUMN | N | NULL |... | S_COMMENT | VARCHAR(101) | COLUMN | Y | NULL |... +-------------+--------------+--------+-------+---------+...
The Supplier
table includes the s_nationkey
column, which is a link to the primary key of the Nation
table (n_nationkey
). The next query counts up the number of suppliers in each nation:
PUBLIC>select n_name as nation_name, count(*) as number_of_suppliers from supplier join nation on supplier.s_nationkey = nation.n_nationkey group by n_name; +----------------+---------------------+ | NATION_NAME | NUMBER_OF_SUPPLIERS | |----------------+---------------------| | PERU | 421 | | ETHIOPIA | 380 | | ARGENTINA | 413 | | MOROCCO | 373 | | IRAQ | 438 | | UNITED KINGDOM | 390 | | UNITED STATES | 393 | | CANADA | 412 | | RUSSIA | 401 | | ROMANIA | 398 | | BRAZIL | 397 | | EGYPT | 415 | | INDONESIA | 405 | | ALGERIA | 420 | | VIETNAM | 399 | | JORDAN | 362 | | JAPAN | 377 | | SAUDI ARABIA | 411 | | KENYA | 376 | | CHINA | 407 | | GERMANY | 396 | | FRANCE | 402 | | IRAN | 393 | | INDIA | 415 | | MOZAMBIQUE | 406 | +----------------+---------------------+
There are suppliers in each of the 25 nations, with Jordan having the fewest suppliers (362), and Iraq having the most (438). If you want to retrieve only those nations having more than 400 suppliers, you could add a having
clause containing a filter condition on the results of the count()
function. Here’s what that would look like:
PUBLIC>select n_name as nation_name, count(*) as number_of_suppliers from supplier join nation on supplier.s_nationkey = nation.n_nationkey group by n_name having count(*) > 400; +--------------+---------------------+ | NATION_NAME | NUMBER_OF_SUPPLIERS | |--------------+---------------------| | PERU | 421 | | ARGENTINA | 413 | | IRAQ | 438 | | CANADA | 412 | | RUSSIA | 401 | | EGYPT | 415 | | INDONESIA | 405 | | ALGERIA | 420 | | SAUDI ARABIA | 411 | | CHINA | 407 | | FRANCE | 402 | | INDIA | 415 | | MOZAMBIQUE | 406 | +--------------+---------------------+
You can have both where
and having
clauses in the same query, but any conditions in the where
clause are evaluated prior to grouping rows, whereas the conditions in the having
clause are evaluated after the rows have been grouped. Here’s an example with multiple filters:
PUBLIC>select n_name as nation_name, count(*) as number_of_suppliers from supplier join nation on supplier.s_nationkey = nation.n_nationkey where n_name like '%A' group by n_name having count(*) > 400; +--------------+---------------------+ | NATION_NAME | NUMBER_OF_SUPPLIERS | |--------------+---------------------| | ARGENTINA | 413 | | CHINA | 407 | | INDIA | 415 | | CANADA | 412 | | RUSSIA | 401 | | INDONESIA | 405 | | SAUDI ARABIA | 411 | | ALGERIA | 420 | +--------------+---------------------+
This query first finds suppliers in nations whose name ends in A, sums the number of suppliers for each nation, and then discards any rows for which the number of suppliers is less than or equal to 400. The having
clause is covered along with the group by
clause in Chapter 7.
The qualify Clause
So far, you’ve seen two different clauses used for filtering; where
and having
. The third clause used for filtering is qualify
, but it is a special-purpose clause used only for filtering rows based on the results of windowing functions, which are used for multiple purposes including assigning rankings. To illustrate, the next example assigns a rank to every row in the Nation
table based on the number of characters in the name, using the built-in length()
function:
PUBLIC>select n_name, rank() over (order by length(n_name) desc) as length_rank from nation; +----------------+-------------+ | N_NAME | LENGTH_RANK | |----------------+-------------| | UNITED KINGDOM | 1 | | UNITED STATES | 2 | | SAUDI ARABIA | 3 | | MOZAMBIQUE | 4 | | ARGENTINA | 5 | | INDONESIA | 5 | | ETHIOPIA | 7 | | MOROCCO | 8 | | ALGERIA | 8 | | VIETNAM | 8 | | ROMANIA | 8 | | GERMANY | 8 | | JORDAN | 13 | | FRANCE | 13 | | CANADA | 13 | | RUSSIA | 13 | | BRAZIL | 13 | | JAPAN | 18 | | KENYA | 18 | | EGYPT | 18 | | CHINA | 18 | | INDIA | 18 | | IRAQ | 23 | | PERU | 23 | | IRAN | 23 | +----------------+-------------+
The rank()
function is used to generate a ranking for each row, yielding the top ranking to United Kingdom with 14 characters, and the lowest rankings for Iraq, Peru, and Iran, each having only 4 characters. Without going into much detail on the rank()
function (which is covered in Chapter 14), let’s modify the previous query to return only those rows with a ranking of 5 or less by adding a qualify
clause:
PUBLIC>select n_name, rank() over (order by length(n_name) desc) as length_rank from nation qualify length_rank <= 5; +----------------+-------------+ | N_NAME | LENGTH_RANK | |----------------+-------------| | UNITED KINGDOM | 1 | | UNITED STATES | 2 | | SAUDI ARABIA | 3 | | MOZAMBIQUE | 4 | | ARGENTINA | 5 | | INDONESIA | 5 | +----------------+-------------+
This version of the query returns 6 rows since both Argentina and Indonesia tie for 5th place with 9 characters each. The qualify
clause is covered in Chapter 14.
The order by Clause
In general, result sets retuned by queries are not in any particular order. If you want your results to be sorted, such as alphabetically, numerically, or chronologically, you can add an order by
clause at the end of your query. The order by
clause can include one or more of the elements in your select
clause and can reference each either by name or position.
The Supplier
table includes the s_acctbal
column, which presumably contains an amount owed to each supplier. The next example sorts the results by s_acctbal
in descending order:
PUBLIC>select s_name, s_acctbal from supplier order by s_acctbal desc; +--------------------+-----------+ | S_NAME | S_ACCTBAL | |--------------------+-----------| | Supplier#000006343 | 9998.20 | | Supplier#000002522 | 9997.04 | | Supplier#000000892 | 9993.46 | | Supplier#000002543 | 9992.70 | | Supplier#000001833 | 9992.26 | | Supplier#000009966 | 9991.00 | | Supplier#000002892 | 9989.02 | | Supplier#000008875 | 9984.69 | | Supplier#000002331 | 9984.20 | | Supplier#000007895 | 9977.32 | ... <7,380 rows omitted> | Supplier#000003627 | -986.14 | | Supplier#000001907 | -987.45 | | Supplier#000001654 | -988.37 | | Supplier#000001870 | -989.05 | | Supplier#000008224 | -989.86 | | Supplier#000001764 | -990.13 | | Supplier#000005298 | -990.16 | | Supplier#000008927 | -995.53 | | Supplier#000007259 | -997.61 | | Supplier#000009795 | -998.22 | +--------------------+-----------+
The Supplier
table has 7,400 rows, so I omitted most of them, but you can see how the s_acctbal
value decreases over the rows in the result set. The column used for sorting could also be specified by its position in the select
clause, which in this case would be 2:
PUBLIC>select s_name, s_acctbal from supplier order by 2 desc; +--------------------+-----------+ | S_NAME | S_ACCTBAL | |--------------------+-----------| | Supplier#000006343 | 9998.20 | | Supplier#000002522 | 9997.04 | | Supplier#000000892 | 9993.46 | | Supplier#000002543 | 9992.70 | | Supplier#000001833 | 9992.26 | | Supplier#000009966 | 9991.00 | | Supplier#000002892 | 9989.02 | | Supplier#000008875 | 9984.69 | | Supplier#000002331 | 9984.20 | | Supplier#000007895 | 9977.32 | ... <7,390 rows omitted> +--------------------+-----------+
You can specify the sort order as either descending (desc
) or ascending (asc
). While there is no chapter specifically for the order by
clause, you will see it used in various examples in this book.
The limit Clause
The result set from the previous example contained 7,400 rows; wouldn’t it be great if you could specify that only a subset be returned? This is where the limit
clause comes in, and it allows you to specify how many rows to return, starting either at the first row or at a specified offset. Here’s the query from the previous section, but with a limit
clause to specify that only the first 10 rows be returned:
PUBLIC>select s_name, s_acctbal from supplier order by s_acctbal desc limit 10; +--------------------+-----------+ | S_NAME | S_ACCTBAL | |--------------------+-----------| | Supplier#000006343 | 9998.20 | | Supplier#000002522 | 9997.04 | | Supplier#000000892 | 9993.46 | | Supplier#000002543 | 9992.70 | | Supplier#000001833 | 9992.26 | | Supplier#000009966 | 9991.00 | | Supplier#000002892 | 9989.02 | | Supplier#000008875 | 9984.69 | | Supplier#000002331 | 9984.20 | | Supplier#000007895 | 9977.32 | +--------------------+-----------+
In this example, the server sorts all 7,400 rows as specified by the order by
clause, and then returns only the first 10. You can also use the optional offset
subclause to tell the server to start at a particular row. Since I know that there are 7,400 rows in the Supplier
table, I can specify an offset of 7,390 to see the last 10 rows in the result set:
PUBLIC>select s_name, s_acctbal from supplier order by s_acctbal desc limit 10 offset 7390; +--------------------+-----------+ | S_NAME | S_ACCTBAL | |--------------------+-----------| | Supplier#000003627 | -986.14 | | Supplier#000001907 | -987.45 | | Supplier#000001654 | -988.37 | | Supplier#000001870 | -989.05 | | Supplier#000008224 | -989.86 | | Supplier#000001764 | -990.13 | | Supplier#000005298 | -990.16 | | Supplier#000008927 | -995.53 | | Supplier#000007259 | -997.61 | | Supplier#000009795 | -998.22 | +--------------------+-----------+
While it would be great if you could specify limit -10
to see the last 10 rows in the result set without knowing the total number of rows, that functionality isn’t available at this time. However, you can always sort in the opposite direction and choose the first 10 rows:
PUBLIC>select s_name, s_acctbal from supplier order by s_acctbal asc limit 10; +--------------------+-----------+ | S_NAME | S_ACCTBAL | |--------------------+-----------| | Supplier#000009795 | -998.22 | | Supplier#000007259 | -997.61 | | Supplier#000008927 | -995.53 | | Supplier#000005298 | -990.16 | | Supplier#000001764 | -990.13 | | Supplier#000008224 | -989.86 | | Supplier#000001870 | -989.05 | | Supplier#000001654 | -988.37 | | Supplier#000001907 | -987.45 | | Supplier#000003627 | -986.14 | +--------------------+-----------+
Wrap-Up
The select
statement certainly seems simple at first, but as you combine more and more of the eight clauses demonstrated in this chapter, queries can become quite sophisticated. Things become even more interesting when you start including subqueries (Chapter 8), conditional logic (Chapter 10), and window functions (Chapter 14). Chapters 5 and 6 introduce some of the other SQL data statements, such as insert
and update
, some of which will utilize some of the same clauses used to construct select
statements.
Test Your Knowledge
The following exercises are designed to strengthen your understanding of the select
statement. Please see “Chapter 1” in Appendix B for solutions.
Exercise 1-1
Write a query to retrieve the n_nationkey
and n_name
columns from the Nation
table. Sort the rows by n_name
values.
Exercise 1-2
Write a query to retrieve the n_nationkey
and n_name
columns from the Nation
table, but only for those rows with a value of 3
for n_regionkey
.
Exercise 1-3
Write a query to retrieve the n_nationkey
and n_name
columns from the Nation
table and join to the Region
table (using the r_regionkey
column) to retrieve only those nations belonging to the Africa region (r_name
= 'AFRICA'
).
Exercise 1-4
Retrieve the s_name
and s_acctbal
columns from the Supplier
table. Sort by s_acctbal
in descending order and retrieve only the first 10 rows (which will be the 10 suppliers with the highest s_acctbal
values).
Get Learning Snowflake SQL and Scripting now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.