Chapter 4. Group Operations

Group operations are quite common in the day-to-day life of a SQL programmer. When you use SQL to access a database, it is quite common to expect questions such as:

  • What is the maximum salary in this department?

  • How many managers are there in each department?

  • What is the number of customers for each product?

  • Can you print the monthly aggregate sales for each region?

You need group operations to answer these questions. Oracle provides a rich set of features to handle group operations. These features include aggregate functions, the GROUP BY clause, the HAVING clause, and the extensions to the GROUP BY clause—ROLLUP, CUBE, and GROUPING SETS.

Tip

This chapter deals with simple group operations involving the aggregate functions, the GROUP BY and HAVING clauses. Advanced group operations such as ROLLUP, CUBE, and GROUPING SETS are discussed in Chapter 13.

Aggregate Functions

An aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows:

            aggregate_function([DISTINCT | ALL] expression)

The syntax elements are:

aggregate_function

Gives the name of the function—e.g., SUM, COUNT, AVG, MAX, MIN

DISTINCT

Specifies that the aggregate function should consider only distinct values of the argument expression.

ALL

Specifies that the aggregate function should consider all values, including all duplicate values, of the argument expression. The default is ALL.

expression

Specifies a column, or any other expression, on which you want to perform the aggregation.

Let’s look at a simple example. The following SQL uses the MAX function to find the maximum salary of all employees:

            SELECT MAX(salary) FROM employee;

MAX(SALARY)
-----------
       5000

In subsequent sections, we use a series of slightly more involved examples that illustrate various aspects of aggregate function behavior. For those examples, we use the following cust_order table:

            DESC cust_order

 Name                             Null?    Type
 -------------------------------- -------- --------------
 ORDER_NBR                        NOT NULL NUMBER(7)
 CUST_NBR                         NOT NULL NUMBER(5)
 SALES_EMP_ID                     NOT NULL NUMBER(5)
 SALE_PRICE                                NUMBER(9,2)
 ORDER_DT                         NOT NULL DATE
 EXPECTED_SHIP_DT                 NOT NULL DATE
 CANCELLED_DT                              DATE
 SHIP_DT                                   DATE
 STATUS                                    VARCHAR2(20)

SELECT order_nbr, cust_nbr, sales_emp_id, sale_price,
            order_dt, expected_ship_dt
            FROM cust_order;

ORDER_NBR CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT  EXPECTED_ 
--------- -------- ------------ ---------- --------- --------- 
     1001        1         7354         99 22-JUL-01 23-JUL-01
     1000        1         7354            19-JUL-01 24-JUL-01
     1002        5         7368            12-JUL-01 25-JUL-01
     1003        4         7654         56 16-JUL-01 26-JUL-01
     1004        4         7654         34 18-JUL-01 27-JUL-01
     1005        8         7654         99 22-JUL-01 24-JUL-01
     1006        1         7354            22-JUL-01 28-JUL-01
     1007        5         7368         25 20-JUL-01 22-JUL-01
     1008        5         7368         25 21-JUL-01 23-JUL-01
     1009        1         7354         56 18-JUL-01 22-JUL-01
     1012        1         7354         99 22-JUL-01 23-JUL-01
     1011        1         7354            19-JUL-01 24-JUL-01
     1015        5         7368            12-JUL-01 25-JUL-01
     1017        4         7654         56 16-JUL-01 26-JUL-01
     1019        4         7654         34 18-JUL-01 27-JUL-01
     1021        8         7654         99 22-JUL-01 24-JUL-01
     1023        1         7354            22-JUL-01 28-JUL-01
     1025        5         7368         25 20-JUL-01 22-JUL-01
     1027        5         7368         25 21-JUL-01 23-JUL-01
     1029        1         7354         56 18-JUL-01 22-JUL-01

20 rows selected.

NULLs and Aggregate Functions

Notice that the column sale_price in the cust_order table is nullable, and that it contains NULL values for some rows. To examine the effect of NULLs in an aggregate function, execute the following SQL:

               SELECT COUNT(*), COUNT(sale_price) FROM cust_order;

COUNT(*) COUNT(SALE_PRICE)
-------- -----------------
      20                14

Notice the difference in the output of COUNT(*) and COUNT(sale_price). This is because COUNT(sale_price) ignores NULLs, whereas COUNT(*) doesn’t. The reason COUNT(*) doesn’t ignore NULLs is because it counts rows, not column values. The concept of NULL doesn’t apply to a row as a whole. Other than COUNT(*), there is only one other aggregate function that doesn’t ignore NULLs, and that is GROUPING. All other aggregate functions ignore NULLs. We will discuss GROUPING in Chapter 13. For now, let’s examine the effect of NULLs when they are ignored.

SUM, MAX, MIN, AVG, etc., all ignore NULLs. Therefore, if you are trying to find a value such as the average sale price in the cust_order table, the average will be of the 14 rows that have a value for that column. The following example shows the count of all rows, the total of all sale prices, and the average of all sale prices:

               SELECT COUNT(*), SUM(sale_price), AVG(sale_price)
               FROM cust_order;

       COUNT(*) SUM(SALE_PRICE) AVG(SALE_PRICE)
--------------- --------------- ---------------
             20             788      56.2857143

Note that AVG(sale_price) is not equal to SUM(sale_price) / COUNT(*). If it were, the result of AVG(sale_price) would have been 788 / 20 = 39.4. But, since the AVG function ignores NULLS, it divides the total sale price by 14, and not by 20. AVG(sale_price) is equal to SUM(sale_price) / COUNT(sale_price) (788 / 14 = 56.2857143).

There may be situations where you want an average to be taken over all the rows in a table, not just the rows with non-NULL values for the column in question. In those situations you have to use the NVL function within the AVG function call to assign 0 (or some other useful value) to the column in place of any NULL values. (DECODE, CASE, or the COALESCE function can be used in place of NVL. See Chapter 9 for details.) Here’s an example:

               SELECT AVG(NVL(sale_price,0)) FROM cust_order;

AVG(NVL(SALE_PRICE,0))
----------------------
                  39.4

Notice that the use of NVL causes all 20 rows to be considered for average computation, and the rows with NULL values for sale_price are assumed to have a 0 value for that column.

Use of DISTINCT and ALL

Most aggregate functions allow the use of DISTINCT or ALL along with the expression argument. DISTINCT allows you to disregard duplicate expression values, while ALL causes duplicate expression values to be included in the result. Notice that the column cust_nbr has duplicate values. Observe the result of the following SQL:

               SELECT COUNT(cust_nbr), COUNT(DISTINCT cust_nbr), COUNT(ALL cust_nbr)
               FROM cust_order;

COUNT(CUST_NBR) COUNT(DISTINCTCUST_NBR) COUNT(ALLCUST_NBR)
--------------- ----------------------- ------------------
             20                       4                 20

There are four distinct values in the cust_nbr column. Therefore, COUNT(DISTINCT cust_nbr) returns 4, whereas COUNT(cust_nbr) and COUNT(ALL cust_nbr) both return 20. ALL is the default, which means that if you don’t specify either DISTINCT or ALL before the expression argument in an aggregate function, the function will consider all the rows that have a non-NULL value for the expression.

An important thing to note here is that ALL doesn’t cause an aggregate function to consider NULL values. For example, COUNT(ALL SALE_PRICE) in the following example still returns 14, and not 20:

               SELECT COUNT(ALL sale_price) FROM cust_order;

COUNT(ALLSALE_PRICE)
--------------------
                  14

Since ALL is the default, you can explicitly use ALL with every aggregate function. However, the aggregate functions that take more than one argument as input don’t allow the use of DISTINCT. These include CORR, COVAR_POP, COVAR_SAMP, and all the linear regression functions.

In addition, some functions that take only one argument as input don’t allow the use of DISTINCT. This category includes STTDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and GROUPING.

If you try to use DISTINCT with an aggregate function that doesn’t allow it, you will get an error. For example:

               SELECT STDDEV_POP(DISTINCT sale_price)
               FROM cust_order;

SELECT STDDEV_POP(DISTINCT sale_price)
       *
ERROR at line 1:
ORA-30482: DISTINCT option not allowed for this function

However, using ALL with such a function doesn’t cause any error. For example:

               SELECT STDDEV_POP(ALL sale_price)
               FROM cust_order;

STDDEV_POP(ALLSALE_PRICE)
-------------------------
               29.5282639

The GROUP BY Clause

The GROUP BY clause, along with the aggregate functions, groups a result set into multiple groups, and then produces a single row of summary information for each group. For example, if you want to find the total number of orders for each customer, execute the following query:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order 
            GROUP BY cust_nbr;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
        1                8
        4                4
        5                6
        8                2

This query produces one summary line of output for each customer. This is the essence of a GROUP BY query. You asked Oracle to GROUP the results BY cust_nbr; therefore, it produced one output row for each distinct value of cust_nbr. Each data value for a given customer represents a summary based on all rows for that customer.

Correspondence Between SELECT and GROUP BY

When you write a query with a GROUP BY clause, there are a number of rules you need to be aware of that govern the correspondence between the columns in the SELECT and GROUP BY clauses. Generally speaking, any nonaggregate expression in your SELECT clause must also be reflected in your GROUP BY clause.

Aggregate expressions generally require a GROUP BY clause

The nonaggregate expression cust_nbr in the SELECT list of the query in our most recent example also appears in the GROUP BY clause. If you have a mix of aggregate and nonaggregate expressions in the SELECT list, SQL expects that you are trying to perform a GROUP BY operation, and you must also include a GROUP BY clause in your query. Oracle returns an error if you fail to do so. For example, if you omit the GROUP BY clause, the following error is returned:

                  SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
                  FROM cust_order;

SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
       *
ERROR at line 1:
ORA-00937: not a single-group group function

There is one case in which you can write aggregate expressions in a SELECT list without also writing a GROUP BY clause and that is when you wish those aggregate expressions to apply to the entire result set. In such a case, your SELECT list must consist only of aggregate expressions. The queries earlier in this chapter, introducing the aggregate functions, are good examples of this case.

GROUP BY clause must include all nonaggregate expressions

If you forget to include all nonaggregate expressions from the SELECT list in the GROUP BY clause, SQL returns the following error:

                  SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
                  FROM cust_order
                  GROUP BY cust_nbr;

                  SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
                 *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

Aggregate functions not allowed in GROUP BY clause

You can’t use a group function (aggregate function) in the GROUP BY clause. You will get an error if you attempt to do so, as in the following example:

                  SELECT cust_nbr, COUNT(order_nbr)
                  FROM cust_order
                  GROUP BY cust_nbr, COUNT(order_nbr);

                  GROUP BY cust_nbr, COUNT(order_nbr)
                   *
ERROR at line 3:
ORA-00934: group function is not allowed here

Constants can be omitted from the GROUP BY clause

If you have a constant in your SELECT list, you don’t need to include it in the GROUP BY clause. However, including the constant in the GROUP BY clause doesn’t alter the result. Therefore, both the following statements will produce the same output:

                  SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr) 
                  FROM cust_order 
                  GROUP BY cust_nbr;

                  SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr) 
                  FROM cust_order
                  GROUP BY 'CUSTOMER', cust_nbr;

'CUSTOMER'   CUST_NBR COUNT(ORDER_NBR)
---------- ---------- ----------------
CUSTOMER            1                8
CUSTOMER            4                4
CUSTOMER            5                6
CUSTOMER            8                2

Scalar functions may be grouped by their underlying column

If a scalar function has been applied to a column in the SELECT list, the syntax doesn’t force you to include the scalar function in the GROUP BY clause. For example:

                  SELECT SUBSTR(lname,1,1), COUNT(*)
                  FROM employee
                  GROUP BY lname;

S   COUNT(*)
- ----------
A          1
A          1
B          1
C          1
F          1
J          1
J          1
K          1
M          1
M          1
S          1
S          1
T          1
W          1

14 rows selected.

In this example, the SELECT list has SUBSTR(lname,1,1); however, the GROUP BY clause contains just lname, without the SUBSTR function on it. Though this query is syntactically correct, if you look at the result set, you will notice that there are multiple rows with the same value for SUBSTR(lname,1,1). This means that the GROUP BY operation takes place for the entire lname, but only the substring is displayed. If you really want the result set to be grouped by the substring expression, you should include that expression in the GROUP BY clause, as shown in the following example:

                  SELECT SUBSTR(lname,1,1), COUNT(*)
                  FROM employee
                  GROUP BY SUBSTR(lname,1,1);

S   COUNT(*)
- ----------
A          2
B          1
C          1
F          1
J          2
K          1
M          2
S          2
T          1
W          1

10 rows selected.

Notice the difference. This time, there is only one row in the result set for each value returned by SUBSTR(lname,1,1). The rows have been grouped on the exact same expression as is displayed.

Concatenated columns may be grouped in either of two ways

If an expression in a SELECT list concatenates two columns, you can specify the GROUP BY clause in one of the following two ways—both giving the same result:

                  SELECT manager_emp_id || job_id, COUNT(*)
                  FROM employee
                  GROUP BY manager_emp_id || job_id;

                  SELECT manager_emp_id || job_id, COUNT(*)
                  FROM employee
                  GROUP BY manager_emp_id, job_id;

MANAGER_EMP_ID||JOB_ID      COUNT(*)
------------------------- ----------
672                                1
7566669                            2
7698                               1
7698667                            1
7698670                            3
7782667                            1
7788                               1
7839671                            3
7902667                            1

You can sometimes exclude a nonaggregate expression from the GROUP BY clause

There are certain situations in which you want an expression in the SELECT list, but don’t want to group by the same. For example, you might want to display a line number along with summary information for each customer. Attempt to do so using the following query, and you will get an error:

                  SELECT ROWNUM, cust_nbr, COUNT(order_nbr)
                  FROM cust_order
                  GROUP BY cust_nbr;

SELECT ROWNUM, cust_nbr, COUNT(order_nbr)
       *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

If you include ROWNUM in the GROUP BY clause, you’ll get the following, unexpected result:

                  SELECT ROWNUM, cust_nbr, COUNT(order_nbr) 
                  FROM cust_order
                  GROUP BY ROWNUM, cust_nbr;

    ROWNUM   CUST_NBR COUNT(ORDER_NBR)
---------- ---------- ----------------
         1          1                1
         2          1                1
         3          5                1
         4          4                1
         5          4                1
         6          8                1
         7          1                1
         8          5                1
         9          5                1
        10          1                1
        11          1                1
        12          1                1
        13          5                1
        14          4                1
        15          4                1
        16          8                1
        17          1                1
        18          5                1
        19          5                1
        20          1                1

20 rows selected.

You certainly didn’t want this result, did you? You wanted to receive one summary row for each customer, and then to display ROWNUM for those lines. But when you include ROWNUM in the GROUP BY clause, it produces one summary row for each row selected from the table cust_order. To get the expected result, you should use the following SQL:

                  SELECT ROWNUM, v.*
                  FROM (SELECT cust_nbr, COUNT(order_nbr) 
                        FROM cust_order GROUP BY cust_nbr) v;

    ROWNUM   CUST_NBR COUNT(ORDER_NBR)
---------- ---------- ----------------
         1          1                8
         2          4                4
         3          5                6
         4          8                2

The construct in the FROM clause is called an inline view. Read more about inline views in Chapter 5.

You are not required to show your GROUP BY columns

Syntactically, it is not mandatory to include all the expressions of the GROUP BY clause in the SELECT list. However, those expressions not in the SELECT list will not be represented in the output; therefore, the output may not make much sense. For example:

                  SELECT COUNT(order_nbr) 
                  FROM cust_order 
                  GROUP BY cust_nbr;

COUNT(ORDER_NBR)
----------------
               8
               4
               6
               2

This query produces a count of orders for each customer (by grouping based on cust_nbr), but without the cust_nbr in the output you can’t associate the counts with the customers. Extending the previous example, you can see that without a consistent SELECT list and GROUP BY clause, the output may be a bit confusing. The following example produces output that at first glance seems useful:

                  SELECT cust_nbr, COUNT(order_nbr) 
                  FROM cust_order 
                  GROUP BY cust_nbr, order_dt;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                2
         1                2
         1                4
         4                2
         4                2
         5                2
         5                2
         5                2
         8                2

9 rows selected.

From the output, it appears that you are trying to obtain a count of orders for each customer. However, there are multiple rows in the output for some cust_nbr values. The fact that you have included order_dt in the GROUP BY clause, and therefore generated a summary result for each combination of cust_nbr and order_dt, is missing from the output. You can’t make sense of the output unless the output and the SQL statement are looked at together. You can’t expect all readers of SQL output to understand SQL syntax, can you? Therefore, we always recommend maintaining consistency between the nonaggregate expressions in the SELECT list and the expressions in the GROUP BY clause. A more meaningful version of the previous SQL statement would be as follows:

                  SELECT cust_nbr, order_dt, COUNT(order_nbr) 
                  FROM cust_order
                  GROUP BY cust_nbr, order_dt;

  CUST_NBR ORDER_DT  COUNT(ORDER_NBR)
---------- --------- ----------------
         1 18-JUL-01                2
         1 19-JUL-01                2
         1 22-JUL-01                4
         4 16-JUL-01                2
         4 18-JUL-01                2
         5 12-JUL-01                2
         5 20-JUL-01                2
         5 21-JUL-01                2
         8 22-JUL-01                2

9 rows selected.

This output is consistent with the GROUP BY clause in the query. Readers of the report are more likely to make the correct assumption about what this output represents.

GROUP BY Clause and NULL Values

When you GROUP BY a column that contains NULL values for some rows, all the rows with NULL values are placed into a single group and presented as one summary row in the output. For example:

               SELECT sale_price, COUNT(order_nbr) 
               FROM cust_order 
               GROUP BY sale_price;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
        25                4
        34                2
        56                4
        99                4
                          6

Notice that the last row in the output consists of a NULL value for the column sale_price. If you want the row containing the NULL value to be the first row in the output, you can perform an ORDER BY on sale_price in descending order:

               SELECT sale_price, COUNT(order_nbr) 
               FROM cust_order 
               GROUP BY sale_price
               ORDER BY sale_price DESC;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
                          6
        99                4
        56                4
        34                2
        25                4

Whether you are using a GROUP BY or not, the ORDER BY clause can have an optional NULLS FIRST or NULLS LAST option to put the NULLs either at the beginning or at the end of the result set, respectively. For example, to sort NULLs first:

               SELECT sale_price, COUNT(order_nbr) 
               FROM cust_order 
               GROUP BY sale_price
               ORDER BY sale_price NULLS FIRST;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
                          6
        25                4
        34                2
        56                4
        99                4

Or, to sort NULLs last:

               SELECT sale_price, COUNT(order_nbr) 
               FROM cust_order 
               GROUP BY sale_price
               ORDER BY sale_price NULLS LAST;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
        25                4
        34                2
        56                4
        99                4
                          6

GROUP BY Clause with WHERE Clause

While producing summary results using the GROUP BY clause, you can filter records from the table based on a WHERE clause, as in the following example, which produces a count of orders in which the sale price exceeds $25 for each customer:

               SELECT cust_nbr, COUNT(order_nbr)
               FROM cust_order
               WHERE sale_price > 25
               GROUP BY cust_nbr;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                4
         4                4
         8                2

While executing a SQL statement with a WHERE clause and a GROUP BY clause, Oracle first applies the WHERE clause and filters out the rows that don’t satisfy the WHERE condition. The rows that satisfy the WHERE clause are then grouped using the GROUP BY clause.

SQL syntax requires that the WHERE clause must come before the GROUP BY clause. Otherwise, the following error is returned:

               SELECT cust_nbr, COUNT(order_nbr)
               FROM cust_order
               GROUP BY cust_nbr
               WHERE sale_price > 25;

WHERE sale_price > 25
*
ERROR at line 4:
ORA-00933: SQL command not properly ended

The HAVING Clause

The HAVING clause is closely associated with the GROUP BY clause. The HAVING clause is used to put a filter on the groups created by the GROUP BY clause. If a query has a HAVING clause along with a GROUP BY clause, the result set will include only the groups that satisfy the condition specified in the HAVING clause. Let’s look at some examples that illustrate this. The following query returns the number of orders per customer:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order 
            GROUP BY cust_nbr
            HAVING cust_nbr < 6;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                8
         4                4
         5                6

Notice that the output only includes customers with numbers below 6. That’s because the HAVING clause specified cust_nbr < 6 as a condition. Orders for all customers were counted, but only those groups that matched the specified HAVING condition were returned as the result.

The previous example is a poor use of the HAVING clause, because that clause references only unsummarized data. It’s more efficient to use WHERE cust_nbr < 6 instead of HAVING cust_nbr < 6, because the WHERE clause eliminates rows prior to summarization, whereas HAVING eliminates groups post-summarization. A better version of the previous query would be:

SELECT cust_nbr, COUNT(order_nbr) 
FROM cust_order 
WHERE cust_nbr < 6
GROUP BY cust_nbr;

The next example shows a more appropriate use of the HAVING clause:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order 
            GROUP BY cust_nbr
            HAVING COUNT(order_nbr) > 2;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                8
         4                4
         5                6

See the use of the aggregate function COUNT in the HAVING clause? This is an appropriate use for HAVING, because the results of the aggregate function cannot be determined until after the grouping takes place.

The syntax for the HAVING clause is similar to that of the WHERE clause. However, there is one restriction on the conditions you can write in the HAVING clause. A HAVING condition can refer only to an expression in the SELECT list, or to an expression involving an aggregate function. If you specify an expression in the HAVING clause that isn’t in the SELECT list, or that isn’t an aggregate expression, you will get an error. For example:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order 
            GROUP BY cust_nbr
            HAVING order_dt < SYSDATE;

HAVING order_dt < SYSDATE
       *
ERROR at line 4:
ORA-00979: not a GROUP BY expression

However, you can use an aggregate expression in the HAVING clause, even if it doesn’t appear in the SELECT list, as illustrated in the following example:

            SELECT cust_nbr
            FROM cust_order
            GROUP BY cust_nbr
            HAVING COUNT(order_nbr) < 5;

  CUST_NBR
----------
         4
         8

In the preceding example, the HAVING clause refers to COUNT(order_nbr), which is not in the SELECT list. You are not required to show in your result set all the columns or expressions that determine which rows end up in that result set.

The order of the GROUP BY clause and the HAVING clause in a SELECT statement is not important. You can specify the GROUP BY clause before the HAVING clause, or vice versa. Therefore, the following two queries are the same and produce the same result:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order
            GROUP BY cust_nbr
            HAVING COUNT(order_nbr) > 2;

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order
            HAVING COUNT(order_nbr) > 2
            GROUP BY cust_nbr;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                8
         4                4
         5                6

Even though Oracle doesn’t care whether the HAVING clause comes before the GROUP BY clause or after, the HAVING clause is applied to the groups created by the GROUP BY clause, so it is a good programming practice to always put the HAVING clause after the GROUP BY clause. Another reason for placing HAVING after GROUP BY is that SQL Standard requires that particular order. Thus, putting HAVING after GROUP BY makes your code more portable.

You can use a WHERE clause and a HAVING clause together in a query. When you do, it is important to understand the impact of the two clauses. The WHERE clause is executed first, and the rows that don’t satisfy the WHERE condition are not passed to the GROUP BY clause. The GROUP BY clause summarizes the filtered data into groups, and then the HAVING clause is applied to the groups to eliminate the groups that don’t satisfy the HAVING condition. The following example illustrates this:

            SELECT cust_nbr, COUNT(order_nbr) 
            FROM cust_order
            WHERE sale_price > 25
            GROUP BY cust_nbr
            HAVING COUNT(order_nbr) > 1;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
         1                4
         4                4
         8                2

In this example, the WHERE clause first eliminates all the orders that don’t satisfy the condition sale_price > 25. The rest of the rows are grouped on cust_nbr. The HAVING clause eliminates the customers that don’t have more than one order.

Nested Group Operations

The examples discussed in this chapter so far all involved one group operation on a column of a table. SQL also allows you to nest group functions, which means that one group function can enclose an expression that is itself a group operation on another expression or column. Let’s consider the following example:

An economic slowdown has resulted in budget constraints for many employers, especially in the IT industry. Budget constraints have forced companies to take a second look at the money spent on employee compensation. Some companies have had to downsize their workforce, others have had to cut down employee bonuses, while still others have cut the employee base salaries. Your company is no exception, and is also under financial pressure. Your CEO must take a serious look at the compensation structure of the employees at various levels in various departments in the company.

Your CEO calls on you to query the Human Resources database and help him collect data. By this time, you’ve already heard the rumors of upcoming “reductions in force” (RIFs) and compensation cuts. This is your golden opportunity to impress your CEO with your skills, to make sure you are not affected by the RIFs. Mess up now, and you can be pretty sure that you have to start looking for another job in this increasingly competitive job market.

Here’s your CEO’s first question: What is the maximum amount of money spent by any department on employee salaries?

To answer this question, you know that you need to compute the sum of the salaries of all the employees in each department, and then find the maximum of those individual sums. Now that you know about the GROUP BY clause, finding the sum of salaries for all the employees in each department is easy:

            SELECT dept_id, SUM(salary) 
            FROM employee 
            GROUP BY dept_id;

   DEPT_ID SUM(SALARY)
---------- -----------
        10        8750
        20        9900
        30        9400

However, your task is half done. You next need to find the maximum of the SUM(salary) values returned by this query. One way to do that is to use the preceding query as an inline view. (Inline views are discussed in detail in Chapter 5.) The following SELECT takes the results from the earlier query, which is now a nested query, and applies the MAX function to retrieve the highest SUM(salary) value:

            SELECT MAX(sal) FROM
            (SELECT dept_id, SUM(salary) sal 
            FROM employee 
            GROUP BY dept_id);

  MAX(SAL)
----------
      9900

However, you don’t even need to write a subquery. Another, simpler way of writing the query you need is:

            SELECT MAX(SUM(salary)) 
            FROM employee 
            GROUP BY dept_id;

MAX(SUM(SALARY))
----------------
            9900

The MAX(SUM(salary)) in this query is a nested group operation. When the query executes, the rows are aggregated by department. The innermost group function, in this case the SUM function, is used to generate one salary value per department. This is no different than in previous examples, but this time you have a MAX function seemingly left over. That MAX function is applied to the entire collection of SUM(salary) values produced by the initial aggregation. The result is a single value, the maximum amount of money that any one department spends on employee salaries, which is just what your CEO wanted.

Tip

It never makes sense to use more than one level of nested group function. MAX(SUM(salary)) returns a single value. One group function enclosing another will always return a single value. No further aggregation is possible.

Knowing only the maximum of the total salaries paid by a department isn’t going to help much. So, your CEO’s next question is to ask about the minimum and average amounts of money spent by any department on employee salaries. That should be an easy one to answer now. Just apply the same pattern as used in the previous query:

            SELECT MIN(SUM(salary)), AVG(SUM(salary)) 
            FROM employee 
            GROUP BY dept_id;

MIN(SUM(SALARY)) AVG(SUM(SALARY))
---------------- ----------------
            8750             9350

Observing that the maximum (9900) is not too large compared to the minimum (8750) and the average (9350), your CEO realizes that all the departments spend pretty much uniformly on employee salaries. He next asks: What is the maximum, minimum, and average number of employees in any department? Use the following query to answer that question:

            SELECT MAX(COUNT(*)), MIN(COUNT(*)), AVG(COUNT(*)) 
            FROM employee 
            GROUP BY dept_id;

MAX(COUNT(*)) MIN(COUNT(*)) AVG(COUNT(*))
------------- ------------- -------------
            6             3    4.66666667

The information that some departments have double the number of employees than some others may give your CEO some ideas about how he wants to reorganize the company and reduce cost. Hopefully, you have impressed your CEO with your SQL skills to the point that you are sleeping better at nights now, secure that you, at least, will still be employed in the morning.

Get Mastering Oracle SQL, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.