Chapter 4. Group Operations
Group operations are quite common in the day-to-day life of a SQL programmer. When you use SQL to access a database, it is quite common to expect questions such as:
What is the maximum salary in this department?
How many managers are there in each department?
What is the number of customers for each product?
Can you print the monthly aggregate sales for each region?
You need group operations to answer these questions. Oracle provides a rich set of features to handle group operations. These features include aggregate functions, the GROUP BY clause, the HAVING clause, and the extensions to the GROUP BY clause—ROLLUP, CUBE, and GROUPING SETS.
Tip
This chapter deals with simple group operations involving the aggregate functions, the GROUP BY and HAVING clauses. Advanced group operations such as ROLLUP, CUBE, and GROUPING SETS are discussed in Chapter 13.
Aggregate Functions
An aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows:
aggregate_function
([DISTINCT | ALL]expression
)
The syntax elements are:
-
aggregate_function
Gives the name of the function—e.g., SUM, COUNT, AVG, MAX, MIN
- DISTINCT
Specifies that the aggregate function should consider only distinct values of the argument expression.
- ALL
Specifies that the aggregate function should consider all values, including all duplicate values, of the argument expression. The default is ALL.
-
expression
Specifies a column, or any other expression, on which you want to perform the aggregation.
Let’s look at a simple example. The following SQL uses the MAX function to find the maximum salary of all employees:
SELECT MAX(salary) FROM employee;
MAX(SALARY)
-----------
5000
In subsequent sections, we use a series of slightly more involved
examples that illustrate various aspects of aggregate function
behavior. For those examples, we use the following
cust_order
table:
DESC cust_order
Name Null? Type -------------------------------- -------- -------------- ORDER_NBR NOT NULL NUMBER(7) CUST_NBR NOT NULL NUMBER(5) SALES_EMP_ID NOT NULL NUMBER(5) SALE_PRICE NUMBER(9,2) ORDER_DT NOT NULL DATE EXPECTED_SHIP_DT NOT NULL DATE CANCELLED_DT DATE SHIP_DT DATE STATUS VARCHAR2(20)SELECT order_nbr, cust_nbr, sales_emp_id, sale_price,
order_dt, expected_ship_dt
FROM cust_order;
ORDER_NBR CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT EXPECTED_ --------- -------- ------------ ---------- --------- --------- 1001 1 7354 99 22-JUL-01 23-JUL-01 1000 1 7354 19-JUL-01 24-JUL-01 1002 5 7368 12-JUL-01 25-JUL-01 1003 4 7654 56 16-JUL-01 26-JUL-01 1004 4 7654 34 18-JUL-01 27-JUL-01 1005 8 7654 99 22-JUL-01 24-JUL-01 1006 1 7354 22-JUL-01 28-JUL-01 1007 5 7368 25 20-JUL-01 22-JUL-01 1008 5 7368 25 21-JUL-01 23-JUL-01 1009 1 7354 56 18-JUL-01 22-JUL-01 1012 1 7354 99 22-JUL-01 23-JUL-01 1011 1 7354 19-JUL-01 24-JUL-01 1015 5 7368 12-JUL-01 25-JUL-01 1017 4 7654 56 16-JUL-01 26-JUL-01 1019 4 7654 34 18-JUL-01 27-JUL-01 1021 8 7654 99 22-JUL-01 24-JUL-01 1023 1 7354 22-JUL-01 28-JUL-01 1025 5 7368 25 20-JUL-01 22-JUL-01 1027 5 7368 25 21-JUL-01 23-JUL-01 1029 1 7354 56 18-JUL-01 22-JUL-01 20 rows selected.
NULLs and Aggregate Functions
Notice that the
column sale_price
in the
cust_order
table is nullable, and that it
contains NULL values for some rows. To examine the effect of NULLs in
an aggregate function, execute the following SQL:
SELECT COUNT(*), COUNT(sale_price) FROM cust_order;
COUNT(*) COUNT(SALE_PRICE)
-------- -----------------
20 14
Notice the difference in the output of COUNT(*)
and COUNT(sale_price)
. This is because
COUNT(sale_price)
ignores NULLs, whereas
COUNT(*)
doesn’t. The reason
COUNT(*)
doesn’t ignore NULLs is
because it counts rows, not column values. The concept of NULL
doesn’t apply to a row as a whole. Other than
COUNT(*)
, there is only one other aggregate
function that doesn’t ignore NULLs, and that is
GROUPING. All other aggregate functions ignore NULLs. We will discuss
GROUPING in Chapter 13. For now,
let’s examine the effect of NULLs when they are
ignored.
SUM, MAX, MIN, AVG, etc., all ignore NULLs. Therefore, if you are
trying to find a value such as the average sale price in the
cust_order
table, the average will be of the 14
rows that have a value for that column. The following example shows
the count of all rows, the total of all sale prices, and the average
of all sale prices:
SELECT COUNT(*), SUM(sale_price), AVG(sale_price)
FROM cust_order;
COUNT(*) SUM(SALE_PRICE) AVG(SALE_PRICE) --------------- --------------- --------------- 20 788 56.2857143
Note that AVG(sale_price)
is not equal to
SUM(sale_price) / COUNT(*)
. If it were, the result
of AVG(sale_price)
would have been 788 / 20 =
39.4. But, since the AVG function ignores NULLS, it divides the total
sale price by 14, and not by 20. AVG(sale_price)
is equal to SUM(sale_price) / COUNT(sale_price) (788 / 14 =
56.2857143)
.
There may be situations where you want an average to be taken over all the rows in a table, not just the rows with non-NULL values for the column in question. In those situations you have to use the NVL function within the AVG function call to assign 0 (or some other useful value) to the column in place of any NULL values. (DECODE, CASE, or the COALESCE function can be used in place of NVL. See Chapter 9 for details.) Here’s an example:
SELECT AVG(NVL(sale_price,0)) FROM cust_order;
AVG(NVL(SALE_PRICE,0))
----------------------
39.4
Notice that the use of NVL causes all 20 rows to be considered for
average computation, and the rows with NULL values for
sale_price
are assumed to have a 0 value for
that column.
Use of DISTINCT and ALL
Most
aggregate functions allow
the
use of DISTINCT or
ALL along with the expression
argument. DISTINCT allows you to disregard duplicate expression
values, while ALL causes duplicate expression values to be included
in the result. Notice that the column cust_nbr
has
duplicate values. Observe the result of the following SQL:
SELECT COUNT(cust_nbr), COUNT(DISTINCT cust_nbr), COUNT(ALL cust_nbr)
FROM cust_order;
COUNT(CUST_NBR) COUNT(DISTINCTCUST_NBR) COUNT(ALLCUST_NBR) --------------- ----------------------- ------------------ 20 4 20
There are four distinct values in the cust_nbr
column. Therefore, COUNT(DISTINCT cust_nbr)
returns 4, whereas COUNT(cust_nbr)
and
COUNT(ALL cust_nbr)
both return 20. ALL is the
default, which means that if you don’t specify
either DISTINCT or ALL before the expression argument in an aggregate
function, the function will consider all the rows that have a
non-NULL value for the expression.
An important thing to note here is that ALL doesn’t
cause an aggregate function to consider NULL values. For example,
COUNT(ALL SALE_PRICE)
in the following example
still returns 14, and not 20:
SELECT COUNT(ALL sale_price) FROM cust_order;
COUNT(ALLSALE_PRICE)
--------------------
14
Since ALL is the default, you can explicitly use ALL with every aggregate function. However, the aggregate functions that take more than one argument as input don’t allow the use of DISTINCT. These include CORR, COVAR_POP, COVAR_SAMP, and all the linear regression functions.
In addition, some functions that take only one argument as input don’t allow the use of DISTINCT. This category includes STTDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and GROUPING.
If you try to use DISTINCT with an aggregate function that doesn’t allow it, you will get an error. For example:
SELECT STDDEV_POP(DISTINCT sale_price)
FROM cust_order;
SELECT STDDEV_POP(DISTINCT sale_price) * ERROR at line 1: ORA-30482: DISTINCT option not allowed for this function
However, using ALL with such a function doesn’t cause any error. For example:
SELECT STDDEV_POP(ALL sale_price)
FROM cust_order;
STDDEV_POP(ALLSALE_PRICE) ------------------------- 29.5282639
The GROUP BY Clause
The GROUP BY clause, along with the aggregate functions, groups a result set into multiple groups, and then produces a single row of summary information for each group. For example, if you want to find the total number of orders for each customer, execute the following query:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 8 4 4 5 6 8 2
This query produces one summary line of output for each customer.
This is the essence of a GROUP BY query. You asked Oracle to GROUP
the results BY cust_nbr
; therefore, it produced
one output row for each distinct value of
cust_nbr
. Each data value for a given customer
represents a summary based on all rows for that customer.
Correspondence Between SELECT and GROUP BY
When you write a query with a GROUP BY clause, there are a number of rules you need to be aware of that govern the correspondence between the columns in the SELECT and GROUP BY clauses. Generally speaking, any nonaggregate expression in your SELECT clause must also be reflected in your GROUP BY clause.
Aggregate expressions generally require a GROUP BY clause
The nonaggregate
expression
cust_nbr
in the SELECT list of the query in our
most recent example also appears in the GROUP BY clause. If you have
a mix of aggregate and nonaggregate expressions in the SELECT list,
SQL expects that you are trying to perform a GROUP BY operation, and
you must also include a GROUP BY clause in your query. Oracle returns
an error if you fail to do so. For example, if you omit the GROUP BY
clause, the following error is returned:
SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
FROM cust_order;
SELECT cust_nbr, sales_emp_id, COUNT(order_nbr) * ERROR at line 1: ORA-00937: not a single-group group function
There is one case in which you can write aggregate expressions in a SELECT list without also writing a GROUP BY clause and that is when you wish those aggregate expressions to apply to the entire result set. In such a case, your SELECT list must consist only of aggregate expressions. The queries earlier in this chapter, introducing the aggregate functions, are good examples of this case.
GROUP BY clause must include all nonaggregate expressions
If you forget to include all nonaggregate expressions from the SELECT list in the GROUP BY clause, SQL returns the following error:
SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr;
SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)
* ERROR at line 1: ORA-00979: not a GROUP BY expression
Aggregate functions not allowed in GROUP BY clause
You can’t use a group function (aggregate function) in the GROUP BY clause. You will get an error if you attempt to do so, as in the following example:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr, COUNT(order_nbr);
GROUP BY cust_nbr, COUNT(order_nbr)
* ERROR at line 3: ORA-00934: group function is not allowed here
Constants can be omitted from the GROUP BY clause
If you have a constant in your SELECT list, you don’t need to include it in the GROUP BY clause. However, including the constant in the GROUP BY clause doesn’t alter the result. Therefore, both the following statements will produce the same output:
SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr;
SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY 'CUSTOMER', cust_nbr;
'CUSTOMER' CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ---------------- CUSTOMER 1 8 CUSTOMER 4 4 CUSTOMER 5 6 CUSTOMER 8 2
Scalar functions may be grouped by their underlying column
If a scalar function has been applied to a column in the SELECT list, the syntax doesn’t force you to include the scalar function in the GROUP BY clause. For example:
SELECT SUBSTR(lname,1,1), COUNT(*)
FROM employee
GROUP BY lname;
S COUNT(*) - ---------- A 1 A 1 B 1 C 1 F 1 J 1 J 1 K 1 M 1 M 1 S 1 S 1 T 1 W 1 14 rows selected.
In this example, the SELECT list has
SUBSTR(lname,1,1)
; however, the GROUP BY clause
contains just lname
, without the SUBSTR function
on it. Though this query is syntactically correct, if you look at the
result set, you will notice that there are multiple rows with the
same value for SUBSTR(lname,1,1)
. This means that
the GROUP BY operation takes place for the entire
lname
, but only the substring is displayed. If you
really want the result set to be grouped by the substring expression,
you should include that expression in the GROUP BY clause, as shown
in the following example:
SELECT SUBSTR(lname,1,1), COUNT(*)
FROM employee
GROUP BY SUBSTR(lname,1,1);
S COUNT(*) - ---------- A 2 B 1 C 1 F 1 J 2 K 1 M 2 S 2 T 1 W 1 10 rows selected.
Notice the difference. This time, there is only one row in the result
set for each value returned by SUBSTR(lname,1,1)
.
The rows have been grouped on the exact same expression as is
displayed.
Concatenated columns may be grouped in either of two ways
If an expression in a SELECT list concatenates two columns, you can specify the GROUP BY clause in one of the following two ways—both giving the same result:
SELECT manager_emp_id || job_id, COUNT(*)
FROM employee
GROUP BY manager_emp_id || job_id;
SELECT manager_emp_id || job_id, COUNT(*)
FROM employee
GROUP BY manager_emp_id, job_id;
MANAGER_EMP_ID||JOB_ID COUNT(*) ------------------------- ---------- 672 1 7566669 2 7698 1 7698667 1 7698670 3 7782667 1 7788 1 7839671 3 7902667 1
You can sometimes exclude a nonaggregate expression from the GROUP BY clause
There are certain situations in which you want an expression in the SELECT list, but don’t want to group by the same. For example, you might want to display a line number along with summary information for each customer. Attempt to do so using the following query, and you will get an error:
SELECT ROWNUM, cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr;
SELECT ROWNUM, cust_nbr, COUNT(order_nbr) * ERROR at line 1: ORA-00979: not a GROUP BY expression
If you include ROWNUM in the GROUP BY clause, you’ll get the following, unexpected result:
SELECT ROWNUM, cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY ROWNUM, cust_nbr;
ROWNUM CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ---------------- 1 1 1 2 1 1 3 5 1 4 4 1 5 4 1 6 8 1 7 1 1 8 5 1 9 5 1 10 1 1 11 1 1 12 1 1 13 5 1 14 4 1 15 4 1 16 8 1 17 1 1 18 5 1 19 5 1 20 1 1 20 rows selected.
You certainly didn’t want this result, did you? You
wanted to receive one summary row for each customer, and then to
display ROWNUM for those lines. But when you include ROWNUM in the
GROUP BY clause, it produces one summary row for each row selected
from the table cust_order
. To get the expected
result, you should use the following SQL:
SELECT ROWNUM, v.*
FROM (SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order GROUP BY cust_nbr) v;
ROWNUM CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ---------------- 1 1 8 2 4 4 3 5 6 4 8 2
The construct in the FROM clause is called an inline view. Read more about inline views in Chapter 5.
You are not required to show your GROUP BY columns
Syntactically, it is not mandatory to include all the expressions of the GROUP BY clause in the SELECT list. However, those expressions not in the SELECT list will not be represented in the output; therefore, the output may not make much sense. For example:
SELECT COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr;
COUNT(ORDER_NBR) ---------------- 8 4 6 2
This query produces a count of orders for each customer (by grouping
based on cust_nbr
), but without the
cust_nbr
in the output you can’t
associate the counts with the customers. Extending the previous
example, you can see that without a consistent SELECT list and GROUP
BY clause, the output may be a bit confusing. The following example
produces output that at first glance seems useful:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr, order_dt;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 2 1 2 1 4 4 2 4 2 5 2 5 2 5 2 8 2 9 rows selected.
From the output, it appears that you are trying to obtain a count of
orders for each customer. However, there are multiple rows in the
output for some cust_nbr
values. The fact that you
have included order_dt
in the GROUP BY clause, and
therefore generated a summary result for each combination of
cust_nbr
and order_dt
, is
missing from the output. You can’t make sense of the
output unless the output and the SQL statement are looked at
together. You can’t expect all readers of SQL output
to understand SQL syntax, can you? Therefore, we always recommend
maintaining consistency between the nonaggregate expressions in the
SELECT list and the expressions in the GROUP BY clause. A more
meaningful version of the previous SQL statement would be as follows:
SELECT cust_nbr, order_dt, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr, order_dt;
CUST_NBR ORDER_DT COUNT(ORDER_NBR) ---------- --------- ---------------- 1 18-JUL-01 2 1 19-JUL-01 2 1 22-JUL-01 4 4 16-JUL-01 2 4 18-JUL-01 2 5 12-JUL-01 2 5 20-JUL-01 2 5 21-JUL-01 2 8 22-JUL-01 2 9 rows selected.
This output is consistent with the GROUP BY clause in the query. Readers of the report are more likely to make the correct assumption about what this output represents.
GROUP BY Clause and NULL Values
When you GROUP BY a column that contains NULL values for some rows, all the rows with NULL values are placed into a single group and presented as one summary row in the output. For example:
SELECT sale_price, COUNT(order_nbr)
FROM cust_order
GROUP BY sale_price;
SALE_PRICE COUNT(ORDER_NBR) ---------- ---------------- 25 4 34 2 56 4 99 4 6
Notice that the last row in the output consists of a NULL value for
the column sale_price
. If you want the row
containing the NULL value to be the first row in the output, you can
perform an ORDER BY on sale_price
in descending
order:
SELECT sale_price, COUNT(order_nbr)
FROM cust_order
GROUP BY sale_price
ORDER BY sale_price DESC;
SALE_PRICE COUNT(ORDER_NBR) ---------- ---------------- 6 99 4 56 4 34 2 25 4
Whether you are using a GROUP BY or not, the ORDER BY clause can have an optional NULLS FIRST or NULLS LAST option to put the NULLs either at the beginning or at the end of the result set, respectively. For example, to sort NULLs first:
SELECT sale_price, COUNT(order_nbr)
FROM cust_order
GROUP BY sale_price
ORDER BY sale_price NULLS FIRST;
SALE_PRICE COUNT(ORDER_NBR) ---------- ---------------- 6 25 4 34 2 56 4 99 4
Or, to sort NULLs last:
SELECT sale_price, COUNT(order_nbr)
FROM cust_order
GROUP BY sale_price
ORDER BY sale_price NULLS LAST;
SALE_PRICE COUNT(ORDER_NBR) ---------- ---------------- 25 4 34 2 56 4 99 4 6
GROUP BY Clause with WHERE Clause
While producing summary results using the GROUP BY clause, you can filter records from the table based on a WHERE clause, as in the following example, which produces a count of orders in which the sale price exceeds $25 for each customer:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
WHERE sale_price > 25
GROUP BY cust_nbr;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 4 4 4 8 2
While executing a SQL statement with a WHERE clause and a GROUP BY clause, Oracle first applies the WHERE clause and filters out the rows that don’t satisfy the WHERE condition. The rows that satisfy the WHERE clause are then grouped using the GROUP BY clause.
SQL syntax requires that the WHERE clause must come before the GROUP BY clause. Otherwise, the following error is returned:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr
WHERE sale_price > 25;
WHERE sale_price > 25 * ERROR at line 4: ORA-00933: SQL command not properly ended
The HAVING Clause
The HAVING clause is closely associated with the GROUP BY clause. The HAVING clause is used to put a filter on the groups created by the GROUP BY clause. If a query has a HAVING clause along with a GROUP BY clause, the result set will include only the groups that satisfy the condition specified in the HAVING clause. Let’s look at some examples that illustrate this. The following query returns the number of orders per customer:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr
HAVING cust_nbr < 6;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 8 4 4 5 6
Notice that the output only includes customers with numbers below 6.
That’s because the HAVING clause specified
cust_nbr < 6
as a condition. Orders for all
customers were counted, but only those groups that matched the
specified HAVING condition were returned as the result.
The previous example is a poor use of the HAVING clause, because that
clause references only unsummarized data. It’s more
efficient to use WHERE cust_nbr < 6
instead of
HAVING cust_nbr < 6
, because the WHERE clause
eliminates rows prior to summarization, whereas HAVING eliminates
groups post-summarization. A better version of the previous query
would be:
SELECT cust_nbr, COUNT(order_nbr) FROM cust_order WHERE cust_nbr < 6 GROUP BY cust_nbr;
The next example shows a more appropriate use of the HAVING clause:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr
HAVING COUNT(order_nbr) > 2;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 8 4 4 5 6
See the use of the aggregate function COUNT in the HAVING clause? This is an appropriate use for HAVING, because the results of the aggregate function cannot be determined until after the grouping takes place.
The syntax for the HAVING clause is similar to that of the WHERE clause. However, there is one restriction on the conditions you can write in the HAVING clause. A HAVING condition can refer only to an expression in the SELECT list, or to an expression involving an aggregate function. If you specify an expression in the HAVING clause that isn’t in the SELECT list, or that isn’t an aggregate expression, you will get an error. For example:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr
HAVING order_dt < SYSDATE;
HAVING order_dt < SYSDATE * ERROR at line 4: ORA-00979: not a GROUP BY expression
However, you can use an aggregate expression in the HAVING clause, even if it doesn’t appear in the SELECT list, as illustrated in the following example:
SELECT cust_nbr
FROM cust_order
GROUP BY cust_nbr
HAVING COUNT(order_nbr) < 5;
CUST_NBR ---------- 4 8
In the preceding example, the HAVING clause refers to
COUNT(order_nbr)
, which is not in the SELECT list.
You are not required to show in your result set all the columns or
expressions that determine which rows end up in that result set.
The order of the GROUP BY clause and the HAVING clause in a SELECT statement is not important. You can specify the GROUP BY clause before the HAVING clause, or vice versa. Therefore, the following two queries are the same and produce the same result:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
GROUP BY cust_nbr
HAVING COUNT(order_nbr) > 2;
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
HAVING COUNT(order_nbr) > 2
GROUP BY cust_nbr;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 8 4 4 5 6
Even though Oracle doesn’t care whether the HAVING clause comes before the GROUP BY clause or after, the HAVING clause is applied to the groups created by the GROUP BY clause, so it is a good programming practice to always put the HAVING clause after the GROUP BY clause. Another reason for placing HAVING after GROUP BY is that SQL Standard requires that particular order. Thus, putting HAVING after GROUP BY makes your code more portable.
You can use a WHERE clause and a HAVING clause together in a query. When you do, it is important to understand the impact of the two clauses. The WHERE clause is executed first, and the rows that don’t satisfy the WHERE condition are not passed to the GROUP BY clause. The GROUP BY clause summarizes the filtered data into groups, and then the HAVING clause is applied to the groups to eliminate the groups that don’t satisfy the HAVING condition. The following example illustrates this:
SELECT cust_nbr, COUNT(order_nbr)
FROM cust_order
WHERE sale_price > 25
GROUP BY cust_nbr
HAVING COUNT(order_nbr) > 1;
CUST_NBR COUNT(ORDER_NBR) ---------- ---------------- 1 4 4 4 8 2
In this example, the WHERE clause first eliminates all the orders
that don’t satisfy the condition sale_price
> 25
. The rest of the rows are grouped on
cust_nbr
. The HAVING clause eliminates the
customers that don’t have more than
one order.
Nested Group Operations
The examples discussed in this chapter so far all involved one group operation on a column of a table. SQL also allows you to nest group functions, which means that one group function can enclose an expression that is itself a group operation on another expression or column. Let’s consider the following example:
An economic slowdown has resulted in budget constraints for many employers, especially in the IT industry. Budget constraints have forced companies to take a second look at the money spent on employee compensation. Some companies have had to downsize their workforce, others have had to cut down employee bonuses, while still others have cut the employee base salaries. Your company is no exception, and is also under financial pressure. Your CEO must take a serious look at the compensation structure of the employees at various levels in various departments in the company.
Your CEO calls on you to query the Human Resources database and help him collect data. By this time, you’ve already heard the rumors of upcoming “reductions in force” (RIFs) and compensation cuts. This is your golden opportunity to impress your CEO with your skills, to make sure you are not affected by the RIFs. Mess up now, and you can be pretty sure that you have to start looking for another job in this increasingly competitive job market.
Here’s your CEO’s first question: What is the maximum amount of money spent by any department on employee salaries?
To answer this question, you know that you need to compute the sum of the salaries of all the employees in each department, and then find the maximum of those individual sums. Now that you know about the GROUP BY clause, finding the sum of salaries for all the employees in each department is easy:
SELECT dept_id, SUM(salary)
FROM employee
GROUP BY dept_id;
DEPT_ID SUM(SALARY) ---------- ----------- 10 8750 20 9900 30 9400
However, your task is half done. You next need to find the maximum of
the SUM(salary)
values returned by this query. One
way to do that is to use the preceding query as an inline view.
(Inline views are discussed in detail in Chapter 5.) The following SELECT takes the results
from the earlier query, which is now a nested query, and applies the
MAX function to retrieve the highest SUM(salary)
value:
SELECT MAX(sal) FROM
(SELECT dept_id, SUM(salary) sal
FROM employee
GROUP BY dept_id);
MAX(SAL) ---------- 9900
However, you don’t even need to write a subquery. Another, simpler way of writing the query you need is:
SELECT MAX(SUM(salary))
FROM employee
GROUP BY dept_id;
MAX(SUM(SALARY)) ---------------- 9900
The MAX(SUM(salary))
in this query is a nested
group operation. When the query executes, the rows are aggregated by
department. The innermost group function, in this case the SUM
function, is used to generate one salary
value per
department. This is no different than in previous examples, but this
time you have a MAX function seemingly left over. That MAX function
is applied to the entire collection of SUM(salary)
values produced by the initial aggregation. The result is a single
value, the maximum amount of money that any one department spends on
employee salaries, which is just what your CEO wanted.
Tip
It never makes sense to use more than one level of nested group
function. MAX(SUM(salary))
returns a single value.
One group function enclosing another will always return a single
value. No further aggregation is possible.
Knowing only the maximum of the total salaries paid by a department isn’t going to help much. So, your CEO’s next question is to ask about the minimum and average amounts of money spent by any department on employee salaries. That should be an easy one to answer now. Just apply the same pattern as used in the previous query:
SELECT MIN(SUM(salary)), AVG(SUM(salary))
FROM employee
GROUP BY dept_id;
MIN(SUM(SALARY)) AVG(SUM(SALARY)) ---------------- ---------------- 8750 9350
Observing that the maximum (9900) is not too large compared to the minimum (8750) and the average (9350), your CEO realizes that all the departments spend pretty much uniformly on employee salaries. He next asks: What is the maximum, minimum, and average number of employees in any department? Use the following query to answer that question:
SELECT MAX(COUNT(*)), MIN(COUNT(*)), AVG(COUNT(*))
FROM employee
GROUP BY dept_id;
MAX(COUNT(*)) MIN(COUNT(*)) AVG(COUNT(*)) ------------- ------------- ------------- 6 3 4.66666667
The information that some departments have double the number of employees than some others may give your CEO some ideas about how he wants to reorganize the company and reduce cost. Hopefully, you have impressed your CEO with your SQL skills to the point that you are sleeping better at nights now, secure that you, at least, will still be employed in the morning.
Get Mastering Oracle SQL, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.