Chapter 1. Query Primer

Let’s roll up our sleeves and look at some queries using the select statement, which is the SQL statement used to execute queries.

Query Basics

If you followed the steps listed in the Preface, then you have created a Snowflake account and have created your sample database. To enter queries using Snowsight, create a new worksheet by selecting the Worksheets button on the left side of the screen, and then click the “+” button at the top right. Your new worksheet will appear in the left pane with a name using the current date and time, but you can rename it to Learning_Snowflake_SQL to use for the examples in this book. Before running any queries in your new worksheet, set the schema to the Public schema in the Learning_SQL database: use schema learning_sql.public;.

After entering this command into your worksheet, click on the right arrow at the top right of the screen to execute the command, as shown in Figure 1-1.

Every time you log in to Snowsight, you will want to execute this use schema statement before executing queries against the sample database.

Snowflake Web Interface Versus SnowSQL CLI

For most examples shown in this book, I will be using Snowflake’s CLI, known as SnowSQL, rather than the Snowsight web interface. This allows the SQL statements in the examples to be copied from the book (assuming you are using an online copy) and also eliminates unnecessary clutter. I recommend that you use Snowsight, however, since it is a powerful tool that can be run from any device, but if you would like to use SnowSQL, you can follow these installation and usage instructions.

When using SnowSQL, always add a semicolon to the end of your SQL statement, then hit Enterto execute the statement. Also, along with setting the current schema with the use schema command, I also set the prompt in SnowSQL using the following statement:

!set prompt_format=[schema]>;

This is why all of my examples will show the prompt PUBLIC>, but you can use the default prompt format if you prefer.

Next, let’s begin with a simple query to return the current date:

PUBLIC>select current_date;
+--------------+
| CURRENT_DATE |
|--------------|
| 2023-03-19   |
+--------------+

This query calls the built-in function current_date(), which by default returns the date in YYYY-MM-DD format (4-digit year, 2-digit month, 2-digit day). Select statements return a result set, which is a set of one or more rows having one or more columns. For this first example, the result set consists of a single row having a single column.

A database table also consists of one or more rows having one or more columns, so a result set returned by a select statement could be the entire contents of a table. Here’s what that would look like using the smallest table in the sample database, Region, which contains the following data:

PUBLIC>select *
       from region;
+-------------+-------------+------------------------------------+
| R_REGIONKEY | R_NAME      | R_COMMENT                          |
|-------------+-------------|------------------------------------+
|           0 | AFRICA      | lar deposits. blithely final pac...|
|           1 | AMERICA     | hs use ironic, even requests. s    |
|           2 | ASIA        | ges. thinly even pinto beans ca    |
|           3 | EUROPE      | ly final courts cajole furiously...|
|           4 | MIDDLE EAST | uickly special accounts cajole c...|
+-------------+-------------+------------------------------------+

Here are a couple of things to consider concerning this query:

select * is shorthand for “return every column.”
This query includes both a select clause and a from clause. A from clause is used to specify the table(s) from which to retrieve data.

And here are a couple of things to consider about the result set:

The Region table has 5 rows and 3 columns.
The comments in the third column of the result set are randomly generated. This is true for all descriptive columns in the sample database.
SnowSQL automatically formats the result set to include the column names, and uses the +, -, and | characters to make borders around each column and row.

If you are using Snowsight rather than SnowSQL, the result set will look something like what is shown in Figure 1-2.

If you’d like to see what columns are available in a table, you can use Snowflake’s describe statement. Here’s what it returns for the Region table:

PUBLIC>describe table region;
+-------------+--------------+--------+-------+---------+...
| name        | type         | kind   | null? | default |...
|-------------+--------------+--------+-------+---------+...
| R_REGIONKEY | NUMBER(38,0) | COLUMN | N     | NULL    |...
| R_NAME      | VARCHAR(25)  | COLUMN | N     | NULL    |...
| R_COMMENT   | VARCHAR(152) | COLUMN | Y     | NULL    |...
+-------------+--------------+--------+-------+---------+...

The Region table has a numeric column named r_regionkey to hold a unique numeric value for each row, and two character columns (varchar stands for variable character) to hold the region’s name and comments about the region.

Command Parameters

Many of Snowflake’s built-in commands, such as the show tables command shown in the previous example, include multiple optional parameters affecting how the command should be executed. In this case, I used the terse parameter to specify that only 5 of the 18 possible columns be included in the result set. If you want to see the full set of parameters available for any of Snowflake’s commands, you can peruse Snowflake’s SQL Command Reference.

To see information about all of the tables available in a database, use the show tables command (using the terse option to limit the number of columns):

PUBLIC>show terse tables in PUBLIC;
+-------------------------------+----------+-------+...
| created_on                    | name     | kind  |...
|-------------------------------+----------+-------+...
| 2023-02-28 06:55:21.382 -0800 | CUSTOMER | TABLE |...
| 2023-02-28 06:51:41.226 -0800 | LINEITEM | TABLE |...
| 2023-02-28 06:43:46.739 -0800 | NATION   | TABLE |...
| 2023-02-28 06:53:19.090 -0800 | ORDERS   | TABLE |...
| 2023-02-28 06:44:35.450 -0800 | PART     | TABLE |...
| 2023-02-28 06:45:20.267 -0800 | PARTSUPP | TABLE |...
| 2023-02-28 06:42:32.322 -0800 | REGION   | TABLE |...
| 2023-02-28 06:49:39.242 -0800 | SUPPLIER | TABLE |...
+-------------------------------+----------+-------+...

One thing to consider is that the output of show tables will only include tables that you’ve been given the privilege to see, so keep this in mind when you begin working with other databases. The eight tables returned by the show tables command will be used for most of the examples in this book. Appendix A shows a diagram of these tables, including all columns and the relationships between tables.

Query Clauses

Queries are comprised of multiple components, or clauses. Two of the clauses, select and from, were introduced in the previous section, but there are eight different clauses that can be added to a Snowflake query, as shown in Table 1-1.

Table 1-1. Query clauses
Clause name	Purpose
select	Specifies the columns to be included in the result set
from	Identifies the tables from which to retrieve data and how the tables should be joined
where	Removes unwanted rows in the result set
group by	Groups rows together by common values
having	Removes unwanted rows in the result set based on groupings
qualify	Removes unwanted rows in the result set based on results of windowing functions (see Chapter 14)
order by	Sorts the result set by one or more columns
limit	Restricts the number of rows in the result set

Of the eight clauses listed in Table 1-1, only select is required, and some are used in tandem (for example, you wouldn’t use the having clause without first specifying a group by clause). All of these clauses are covered in this book, with some of the more advanced ones left for later chapters. The following sections outline the uses of each of these clauses.

The select Clause

As demonstrated earlier in the chapter, a query can consist of just a select clause. Here’s an example that returns a result set consisting of a single column:

PUBLIC>select 'Welcome to Snowflake SQL!';
+-----------------------------+
| 'WELCOME TO SNOWFLAKE SQL!' |
|-----------------------------|
| Welcome to Snowflake SQL!   |
+-----------------------------+

The select clause is the only required clause in a query, but it isn’t particularly useful by itself. To make things more interesting, let’s write a query to retrieve some of the columns from the Nation table, which looks as follows:

PUBLIC>describe table nation;
+-------------+--------------+--------+-------+---------+...
| name        | type         | kind   | null? | default |...
|-------------+--------------+--------+-------+---------+...
| N_NATIONKEY | NUMBER(38,0) | COLUMN | N     | NULL    |...
| N_NAME      | VARCHAR(25)  | COLUMN | N     | NULL    |...
| N_REGIONKEY | NUMBER(38,0) | COLUMN | N     | NULL    |...
| N_COMMENT   | VARCHAR(152) | COLUMN | Y     | NULL    |...
+-------------+--------------+--------+-------+---------+...

The next query retrieves the first three columns from Nation:

PUBLIC>select n_nationkey, n_name, n_regionkey
       from nation;
+-------------+----------------+-------------+
| N_NATIONKEY | N_NAME         | N_REGIONKEY |
|-------------+----------------+-------------|
|           0 | ALGERIA        |           0 |
|           1 | ARGENTINA      |           1 |
|           2 | BRAZIL         |           1 |
|           3 | CANADA         |           1 |
|           4 | EGYPT          |           4 |
|           5 | ETHIOPIA       |           0 |
|           6 | FRANCE         |           3 |
|           7 | GERMANY        |           3 |
|           8 | INDIA          |           2 |
|           9 | INDONESIA      |           2 |
|          10 | IRAN           |           4 |
|          11 | IRAQ           |           4 |
|          12 | JAPAN          |           2 |
|          13 | JORDAN         |           4 |
|          14 | KENYA          |           0 |
|          15 | MOROCCO        |           0 |
|          16 | MOZAMBIQUE     |           0 |
|          17 | PERU           |           1 |
|          18 | CHINA          |           2 |
|          19 | ROMANIA        |           3 |
|          20 | SAUDI ARABIA   |           4 |
|          21 | VIETNAM        |           2 |
|          22 | RUSSIA         |           3 |
|          23 | UNITED KINGDOM |           3 |
|          24 | UNITED STATES  |           1 |
+-------------+----------------+-------------+

There are four columns in the Nation table, but the query retrieves just three of them. The purpose of the select clause in this query, therefore, is to specify which of all possible columns should be included in the result set. A select clause isn’t just limited to columns in a table, however, but can contain any of the following:

Literals, such as the number 99 or the string 'Welcome to Snowflake SQL!'
Expressions, such as n_nationkey * 100
Built-in function calls, such as concat(n_nationkey, ' : ', n_name)
User-defined function calls built with Java, Python, JavaScript, or the Snowflake Scripting language

Built-in and user-defined functions will be covered in detail in later chapters.

Column aliases

While the database server will assign column names for you, you may want to specify your own names by using column aliases. This is especially helpful for any literals, expressions, or function calls included in your select clause. Here’s an example with the column aliases shown in bold:

PUBLIC>select 'Welcome to Snowflake SQL!' as welcome_message,
         5 * 3.1415927 as circle_circumference,
         dayname(current_date) as day_of_week;
+---------------------------+----------------------+-------------+
| WELCOME_MESSAGE           | CIRCLE_CIRCUMFERENCE | DAY_OF_WEEK |
|---------------------------+----------------------+-------------|
| Welcome to Snowflake SQL! |           15.7079635 | Fri         |
+---------------------------+----------------------+-------------+

Column aliases may be preceded by the as keyword, which improves readability.

Removing duplicates

In some cases, a query may return duplicate values. For example, the following query returns the values in the n_regionkey column of the Nation table:

PUBLIC>select n_regionkey from nation;
+-------------+
| N_REGIONKEY |
|-------------|
|           0 |
|           1 |
|           1 |
|           1 |
|           4 |
|           0 |
|           3 |
|           3 |
|           2 |
|           2 |
|           4 |
|           4 |
|           2 |
|           4 |
|           0 |
|           0 |
|           0 |
|           1 |
|           2 |
|           3 |
|           4 |
|           2 |
|           3 |
|           3 |
|           1 |
+-------------+

As you can see, there are only five different values for n_regionkey across the 25 rows of the Nation table. If you wanted to retrieve only the unique set of values in the n_regionkey column, you can use the distinct keyword:

PUBLIC>select distinct n_regionkey from nation;
+-------------+
| N_REGIONKEY |
|-------------|
|           0 |
|           1 |
|           3 |
|           2 |
|           4 |
+-------------+

Adding distinct to your select clause will instruct the server to sort the values and remove any duplicates.

The from Clause

Earlier in this chapter, examples retrieve data from a single table, but the from clause can reference multiple tables. When there are two or more tables in the from clause, its role expands to include not just a list of tables, but the means by which the tables should be linked. To illustrate, let’s say you want to retrieve data from the Nation table, but rather than retrieving the n_regionkey column, you want to use the n_regionkey value to retrieve the region’s name from the Region table. Here’s what that query would look like:

PUBLIC>select n_nationkey, n_name as nation_name, 
         r_name as region_name
       from nation join region
       on nation.n_regionkey = region.r_regionkey;
+-------------+----------------+-------------+
| N_NATIONKEY | NATION_NAME    | REGION_NAME |
|-------------+----------------+-------------|
|           0 | ALGERIA        | AFRICA      |
|           1 | ARGENTINA      | AMERICA     |
|           2 | BRAZIL         | AMERICA     |
|           3 | CANADA         | AMERICA     |
|           4 | EGYPT          | MIDDLE EAST |
|           5 | ETHIOPIA       | AFRICA      |
|           6 | FRANCE         | EUROPE      |
|           7 | GERMANY        | EUROPE      |
|           8 | INDIA          | ASIA        |
|           9 | INDONESIA      | ASIA        |
|          10 | IRAN           | MIDDLE EAST |
|          11 | IRAQ           | MIDDLE EAST |
|          12 | JAPAN          | ASIA        |
|          13 | JORDAN         | MIDDLE EAST |
|          14 | KENYA          | AFRICA      |
|          15 | MOROCCO        | AFRICA      |
|          16 | MOZAMBIQUE     | AFRICA      |
|          17 | PERU           | AMERICA     |
|          18 | CHINA          | ASIA        |
|          19 | ROMANIA        | EUROPE      |
|          20 | SAUDI ARABIA   | MIDDLE EAST |
|          21 | VIETNAM        | ASIA        |
|          22 | RUSSIA         | EUROPE      |
|          23 | UNITED KINGDOM | EUROPE      |
|          24 | UNITED STATES  | AMERICA     |
+-------------+----------------+-------------+

The from clause contains two tables, Nation and Region, and also includes the on subclause to specify that the regionkey columns (n_regionkey in the Nation table and r_regionkey in the Region table) should be used to join the tables. In simpler terms, the query specifies that for every row in the Nation table, use the n_regionkey value to look up the region’s name in the Region table. Multitable joins are covered extensively in Chapter 3.

Generating Data Sets Using Values

Along with retrieving data from tables, Snowflake allows you to generate data on the fly using the values subclause of the from clause. This can be very handy for creating small data sets that aren’t found in any of your tables. For example, if you need a list of month names and numbers, but there are no tables in your database containing calendar information, you could use the following query:

PUBLIC> select * 
        from (values 
         ('JAN',1),
         ('FEB',2),
         ('MAR',3),
         ('APR',4),
         ('MAY',5),
         ('JUN',6),
         ('JUL',7),
         ('AUG',8),
         ('SEP',9),
         ('OCT',10),
         ('NOV',11),
         ('DEC',12)) 
        as months (month_name, month_num);
+------------+-----------+
| MONTH_NAME | MONTH_NUM |
|------------+-----------|
| JAN        |         1 |
| FEB        |         2 |
| MAR        |         3 |
| APR        |         4 |
| MAY        |         5 |
| JUN        |         6 |
| JUL        |         7 |
| AUG        |         8 |
| SEP        |         9 |
| OCT        |        10 |
| NOV        |        11 |
| DEC        |        12 |
+------------+-----------+

This query generates a twelve-row data set named Months with columns month_name and month_num. I will be using values in a number of examples in the book.

The where Clause

While there are cases when you want to retrieve all rows from a table, it is common to retrieve only a subset of the rows, which is the job of the where clause. For example, you may only want to retrieve rows from the Nation table that have a name starting with the letter U:

PUBLIC>select n_name
       from nation
       where n_name like 'U%';
+----------------+
| N_NAME         |
|----------------|
| UNITED KINGDOM |
| UNITED STATES  |
+----------------+

The job of the where clause is filtering, or the removal of unwanted rows. Each element in the where clause is called a condition, which for the previous example is:

n_name like 'U%'

There can be multiple conditions in a where clause, as shown by the next example, which retrieves the names of nations starting with either U or A:

PUBLIC>select n_name
       from nation
       where n_name like 'U%'
         or n_name like 'A%';
+----------------+
| N_NAME         |
|----------------|
| ALGERIA        |
| ARGENTINA      |
| UNITED KINGDOM |
| UNITED STATES  |
+----------------+

The previous query includes two conditions, separated by or, which specifies that any row satisfying either condition should be included in the result set. Chapter 2 covers filtering in detail.

The group by Clause

As suggested by the name, the group by clause is used for grouping rows. Grouping rows of data is a very common practice, especially for reporting and data analysis. To illustrate, the next example counts the number of countries in the Nation table for each row in the Region table:

PUBLIC>select r_name as region_name,
         count(*) as number_of_countries
       from nation join region
       on nation.n_regionkey = region.r_regionkey
       group by r_name;
+-------------+---------------------+
| REGION_NAME | NUMBER_OF_COUNTRIES |
|-------------+---------------------|
| AFRICA      |                   5 |
| AMERICA     |                   5 |
| MIDDLE EAST |                   5 |
| EUROPE      |                   5 |
| ASIA        |                   5 |
+-------------+---------------------+

This query groups the rows in Nation by their regions, and then uses the built-in count() function to determine the number of nations in each region. Chapter 7 covers grouping.

The having Clause

I introduced the concept of filtering when discussing the role of the where clause (see “The where Clause”), but there are actually three different query clauses that play this role. The second one is the having clause, but its role is to filter rows based on grouped data. Let’s use another table, Supplier, to demonstrate how this works. Here’s the definition of Supplier:

PUBLIC>describe table supplier;
+-------------+--------------+--------+-------+---------+...
| name        | type         | kind   | null? | default |...
|-------------+--------------+--------+-------+---------+...
| S_SUPPKEY   | NUMBER(38,0) | COLUMN | N     | NULL    |...
| S_NAME      | VARCHAR(25)  | COLUMN | N     | NULL    |...
| S_ADDRESS   | VARCHAR(40)  | COLUMN | N     | NULL    |...
| S_NATIONKEY | NUMBER(38,0) | COLUMN | N     | NULL    |...
| S_PHONE     | VARCHAR(15)  | COLUMN | N     | NULL    |...
| S_ACCTBAL   | NUMBER(12,2) | COLUMN | N     | NULL    |...
| S_COMMENT   | VARCHAR(101) | COLUMN | Y     | NULL    |...
+-------------+--------------+--------+-------+---------+...

The Supplier table includes the s_nationkey column, which is a link to the primary key of the Nation table (n_nationkey). The next query counts up the number of suppliers in each nation:

PUBLIC>select n_name as nation_name,
         count(*) as number_of_suppliers
       from supplier join nation
       on supplier.s_nationkey = nation.n_nationkey
       group by n_name;
+----------------+---------------------+
| NATION_NAME    | NUMBER_OF_SUPPLIERS |
|----------------+---------------------|
| PERU           |                 421 |
| ETHIOPIA       |                 380 |
| ARGENTINA      |                 413 |
| MOROCCO        |                 373 |
| IRAQ           |                 438 |
| UNITED KINGDOM |                 390 |
| UNITED STATES  |                 393 |
| CANADA         |                 412 |
| RUSSIA         |                 401 |
| ROMANIA        |                 398 |
| BRAZIL         |                 397 |
| EGYPT          |                 415 |
| INDONESIA      |                 405 |
| ALGERIA        |                 420 |
| VIETNAM        |                 399 |
| JORDAN         |                 362 |
| JAPAN          |                 377 |
| SAUDI ARABIA   |                 411 |
| KENYA          |                 376 |
| CHINA          |                 407 |
| GERMANY        |                 396 |
| FRANCE         |                 402 |
| IRAN           |                 393 |
| INDIA          |                 415 |
| MOZAMBIQUE     |                 406 |
+----------------+---------------------+

There are suppliers in each of the 25 nations, with Jordan having the fewest suppliers (362), and Iraq having the most (438). If you want to retrieve only those nations having more than 400 suppliers, you could add a having clause containing a filter condition on the results of the count() function. Here’s what that would look like:

PUBLIC>select n_name as nation_name,
         count(*) as number_of_suppliers
       from supplier join nation
       on supplier.s_nationkey = nation.n_nationkey
       group by n_name
       having count(*) > 400;
+--------------+---------------------+
| NATION_NAME  | NUMBER_OF_SUPPLIERS |
|--------------+---------------------|
| PERU         |                 421 |
| ARGENTINA    |                 413 |
| IRAQ         |                 438 |
| CANADA       |                 412 |
| RUSSIA       |                 401 |
| EGYPT        |                 415 |
| INDONESIA    |                 405 |
| ALGERIA      |                 420 |
| SAUDI ARABIA |                 411 |
| CHINA        |                 407 |
| FRANCE       |                 402 |
| INDIA        |                 415 |
| MOZAMBIQUE   |                 406 |
+--------------+---------------------+

You can have both where and having clauses in the same query, but any conditions in the where clause are evaluated prior to grouping rows, whereas the conditions in the having clause are evaluated after the rows have been grouped. Here’s an example with multiple filters:

PUBLIC>select n_name as nation_name,
         count(*) as number_of_suppliers
       from supplier join nation
       on supplier.s_nationkey = nation.n_nationkey
       where n_name like '%A'
       group by n_name
       having count(*) > 400;
+--------------+---------------------+
| NATION_NAME  | NUMBER_OF_SUPPLIERS |
|--------------+---------------------|
| ARGENTINA    |                 413 |
| CHINA        |                 407 |
| INDIA        |                 415 |
| CANADA       |                 412 |
| RUSSIA       |                 401 |
| INDONESIA    |                 405 |
| SAUDI ARABIA |                 411 |
| ALGERIA      |                 420 |
+--------------+---------------------+

This query first finds suppliers in nations whose name ends in A, sums the number of suppliers for each nation, and then discards any rows for which the number of suppliers is less than or equal to 400. The having clause is covered along with the group by clause in Chapter 7.

The qualify Clause

So far, you’ve seen two different clauses used for filtering; where and having. The third clause used for filtering is qualify, but it is a special-purpose clause used only for filtering rows based on the results of windowing functions, which are used for multiple purposes including assigning rankings. To illustrate, the next example assigns a rank to every row in the Nation table based on the number of characters in the name, using the built-in length() function:

PUBLIC>select n_name,
         rank() over (order by length(n_name) desc) as length_rank
       from nation;
+----------------+-------------+
| N_NAME         | LENGTH_RANK |
|----------------+-------------|
| UNITED KINGDOM |           1 |
| UNITED STATES  |           2 |
| SAUDI ARABIA   |           3 |
| MOZAMBIQUE     |           4 |
| ARGENTINA      |           5 |
| INDONESIA      |           5 |
| ETHIOPIA       |           7 |
| MOROCCO        |           8 |
| ALGERIA        |           8 |
| VIETNAM        |           8 |
| ROMANIA        |           8 |
| GERMANY        |           8 |
| JORDAN         |          13 |
| FRANCE         |          13 |
| CANADA         |          13 |
| RUSSIA         |          13 |
| BRAZIL         |          13 |
| JAPAN          |          18 |
| KENYA          |          18 |
| EGYPT          |          18 |
| CHINA          |          18 |
| INDIA          |          18 |
| IRAQ           |          23 |
| PERU           |          23 |
| IRAN           |          23 |
+----------------+-------------+

The rank() function is used to generate a ranking for each row, yielding the top ranking to United Kingdom with 14 characters, and the lowest rankings for Iraq, Peru, and Iran, each having only 4 characters. Without going into much detail on the rank() function (which is covered in Chapter 14), let’s modify the previous query to return only those rows with a ranking of 5 or less by adding a qualify clause:

PUBLIC>select n_name,
         rank() over (order by length(n_name) desc) as length_rank
       from nation
       qualify length_rank <= 5;
+----------------+-------------+
| N_NAME         | LENGTH_RANK |
|----------------+-------------|
| UNITED KINGDOM |           1 |
| UNITED STATES  |           2 |
| SAUDI ARABIA   |           3 |
| MOZAMBIQUE     |           4 |
| ARGENTINA      |           5 |
| INDONESIA      |           5 |
+----------------+-------------+

This version of the query returns 6 rows since both Argentina and Indonesia tie for 5th place with 9 characters each. The qualify clause is covered in Chapter 14.

The order by Clause

In general, result sets retuned by queries are not in any particular order. If you want your results to be sorted, such as alphabetically, numerically, or chronologically, you can add an order by clause at the end of your query. The order by clause can include one or more of the elements in your select clause and can reference each either by name or position.

The Supplier table includes the s_acctbal column, which presumably contains an amount owed to each supplier. The next example sorts the results by s_acctbal in descending order:

PUBLIC>select s_name, s_acctbal
       from supplier
       order by s_acctbal desc;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000006343 |   9998.20 |
| Supplier#000002522 |   9997.04 |
| Supplier#000000892 |   9993.46 |
| Supplier#000002543 |   9992.70 |
| Supplier#000001833 |   9992.26 |
| Supplier#000009966 |   9991.00 |
| Supplier#000002892 |   9989.02 |
| Supplier#000008875 |   9984.69 |
| Supplier#000002331 |   9984.20 |
| Supplier#000007895 |   9977.32 |
... <7,380 rows omitted>
| Supplier#000003627 |   -986.14 |
| Supplier#000001907 |   -987.45 |
| Supplier#000001654 |   -988.37 |
| Supplier#000001870 |   -989.05 |
| Supplier#000008224 |   -989.86 |
| Supplier#000001764 |   -990.13 |
| Supplier#000005298 |   -990.16 |
| Supplier#000008927 |   -995.53 |
| Supplier#000007259 |   -997.61 |
| Supplier#000009795 |   -998.22 |
+--------------------+-----------+

The Supplier table has 7,400 rows, so I omitted most of them, but you can see how the s_acctbal value decreases over the rows in the result set. The column used for sorting could also be specified by its position in the select clause, which in this case would be 2:

PUBLIC>select s_name, s_acctbal
       from supplier
       order by 2 desc;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000006343 |   9998.20 |
| Supplier#000002522 |   9997.04 |
| Supplier#000000892 |   9993.46 |
| Supplier#000002543 |   9992.70 |
| Supplier#000001833 |   9992.26 |
| Supplier#000009966 |   9991.00 |
| Supplier#000002892 |   9989.02 |
| Supplier#000008875 |   9984.69 |
| Supplier#000002331 |   9984.20 |
| Supplier#000007895 |   9977.32 |
... <7,390 rows omitted>
+--------------------+-----------+

You can specify the sort order as either descending (desc) or ascending (asc). While there is no chapter specifically for the order by clause, you will see it used in various examples in this book.

The limit Clause

The result set from the previous example contained 7,400 rows; wouldn’t it be great if you could specify that only a subset be returned? This is where the limit clause comes in, and it allows you to specify how many rows to return, starting either at the first row or at a specified offset. Here’s the query from the previous section, but with a limit clause to specify that only the first 10 rows be returned:

PUBLIC>select s_name, s_acctbal
         from supplier
         order by s_acctbal desc
         limit 10;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000006343 |   9998.20 |
| Supplier#000002522 |   9997.04 |
| Supplier#000000892 |   9993.46 |
| Supplier#000002543 |   9992.70 |
| Supplier#000001833 |   9992.26 |
| Supplier#000009966 |   9991.00 |
| Supplier#000002892 |   9989.02 |
| Supplier#000008875 |   9984.69 |
| Supplier#000002331 |   9984.20 |
| Supplier#000007895 |   9977.32 |
+--------------------+-----------+

In this example, the server sorts all 7,400 rows as specified by the order by clause, and then returns only the first 10. You can also use the optional offset subclause to tell the server to start at a particular row. Since I know that there are 7,400 rows in the Supplier table, I can specify an offset of 7,390 to see the last 10 rows in the result set:

PUBLIC>select s_name, s_acctbal
       from supplier
       order by s_acctbal desc
       limit 10 offset 7390;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000003627 |   -986.14 |
| Supplier#000001907 |   -987.45 |
| Supplier#000001654 |   -988.37 |
| Supplier#000001870 |   -989.05 |
| Supplier#000008224 |   -989.86 |
| Supplier#000001764 |   -990.13 |
| Supplier#000005298 |   -990.16 |
| Supplier#000008927 |   -995.53 |
| Supplier#000007259 |   -997.61 |
| Supplier#000009795 |   -998.22 |
+--------------------+-----------+

While it would be great if you could specify limit -10 to see the last 10 rows in the result set without knowing the total number of rows, that functionality isn’t available at this time. However, you can always sort in the opposite direction and choose the first 10 rows:

PUBLIC>select s_name, s_acctbal
              from supplier
              order by s_acctbal asc
              limit 10;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000009795 |   -998.22 |
| Supplier#000007259 |   -997.61 |
| Supplier#000008927 |   -995.53 |
| Supplier#000005298 |   -990.16 |
| Supplier#000001764 |   -990.13 |
| Supplier#000008224 |   -989.86 |
| Supplier#000001870 |   -989.05 |
| Supplier#000001654 |   -988.37 |
| Supplier#000001907 |   -987.45 |
| Supplier#000003627 |   -986.14 |
+--------------------+-----------+

Limit Versus Top

Along with the limit clause, Snowflake provides the top keyword, which you can specify in the select clause to limit the number of rows a query returns. To see the top 10 suppliers ranked by account balance, you could use the following query:

PUBLIC>select top 10 s_name, s_acctbal
       from supplier
       order by s_acctbal desc;
+--------------------+-----------+
| S_NAME             | S_ACCTBAL |
|--------------------+-----------|
| Supplier#000006343 |   9998.20 |
| Supplier#000002522 |   9997.04 |
| Supplier#000000892 |   9993.46 |
| Supplier#000002543 |   9992.70 |
| Supplier#000001833 |   9992.26 |
| Supplier#000009966 |   9991.00 |
| Supplier#000002892 |   9989.02 |
| Supplier#000008875 |   9984.69 |
| Supplier#000002331 |   9984.20 |
| Supplier#000007895 |   9977.32 |
+--------------------+-----------+

Using top 10 is equivalent to adding a limit 10 clause after order by, but the top functionality doesn’t allow for an offset, so it is not as flexible.

Wrap-Up

The select statement certainly seems simple at first, but as you combine more and more of the eight clauses demonstrated in this chapter, queries can become quite sophisticated. Things become even more interesting when you start including subqueries (Chapter 8), conditional logic (Chapter 10), and window functions (Chapter 14). Chapters 5 and 6 introduce some of the other SQL data statements, such as insert and update, some of which will utilize some of the same clauses used to construct select statements.

Test Your Knowledge

The following exercises are designed to strengthen your understanding of the select statement. Please see “Chapter 1” in Appendix B for solutions.

Exercise 1-1

Write a query to retrieve the n_nationkey and n_name columns from the Nation table. Sort the rows by n_name values.

Exercise 1-2

Write a query to retrieve the n_nationkey and n_name columns from the Nation table, but only for those rows with a value of 3 for n_regionkey.

Exercise 1-3

Write a query to retrieve the n_nationkey and n_name columns from the Nation table and join to the Region table (using the r_regionkey column) to retrieve only those nations belonging to the Africa region (r_name = 'AFRICA').

Exercise 1-4

Retrieve the s_name and s_acctbal columns from the Supplier table. Sort by s_acctbal in descending order and retrieve only the first 10 rows (which will be the 10 suppliers with the highest s_acctbal values).

Get Learning Snowflake SQL and Scripting now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Learning Snowflake SQL and Scripting by Alan Beaulieu

Chapter 1. Query Primer

Query Basics

Figure 1-1. Setting the schema using Snowflake’s web interface (Snowsight)

Snowflake Web Interface Versus SnowSQL CLI

Figure 1-2. Querying `Region` table using Snowsight

Command Parameters

Query Clauses

The select Clause

Column aliases

Removing duplicates

The from Clause

The where Clause

The group by Clause

The having Clause

The qualify Clause

The order by Clause

The limit Clause

Wrap-Up

Test Your Knowledge

Exercise 1-1

Exercise 1-2

Exercise 1-3

Exercise 1-4

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly

Chapter 1. Query Primer

Query Basics

Figure 1-1. Setting the schema using Snowflake’s web interface (Snowsight)

Snowflake Web Interface Versus SnowSQL CLI

Figure 1-2. Querying Region table using Snowsight

Command Parameters

Query Clauses

The select Clause

Column aliases

Removing duplicates

The from Clause

The where Clause

The group by Clause

The having Clause

The qualify Clause

The order by Clause

The limit Clause

Wrap-Up

Test Your Knowledge

Exercise 1-1

Exercise 1-2

Exercise 1-3

Exercise 1-4

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly

Figure 1-2. Querying `Region` table using Snowsight