# Subqueries

A subquery is a SELECT statement embedded within another SQL statement. You can use subqueries in place of column references, in place of table references, to generate values for comparison in a WHERE or HAVING condition, and to generate values for inserts and updates.

## Treating Rowsets as Tables

So far in this chapter, you've selected data from tables. A SELECT statement is executed, and a set of rows comes back as the result. Imagine if you could further treat that set of rows as a table against which you issue another SELECT statement or perhaps an UPDATE or a DELETE. Using a subquery, you can do just what I've described, and that can lead to some interesting and elegant solutions to SQL problems.

One use for a subquery in the FROM clause is to aggregate data that has already been aggregated. For example, you might be faced with the following business problem:

Find all employees who have worked on projects 1001 and 1002 during the year 2004. Sum the number of hours each of those employees has worked on each project during that year. Report the ranges. Show the lowest number of hours that any employee worked on each project during 2004, as well as the highest number of hours.

This is an interesting problem to solve because you must sum the hours for employee/project combinations using the SUM aggregate function, and you must apply MIN and MAX to your sums. Example 4-32 shows one way to approach this problem using a subquery.

ExampleÂ 4-32.Â Aggregating aggregated data

```                  `SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)`
`FROM (SELECT employee_id, project_id, SUM(hours_logged) hours`
`      FROM project_hours`
`      WHERE project_id IN (1001, 1002)`
```        AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004',
'dd-Mon-yyyy')```
`      GROUP BY employee_id, project_id) phs`
`GROUP BY project_id;`

PROJECT_ID MIN(PHS.HOURS) MAX(PHS.HOURS)
---------- -------------- --------------
1001              4             20
1002              8             24```

The subquery appears in the FROM clause enclosed in parentheses. The outer query treats the rows from the subquery in the same manner as it would treat rows from a table. Tables have names, and so should subqueries. Example 4-32 gives the name, or alias, `phs` to the subquery. The outer query uses that name to refer to the columns from the subquery. The alias `hours` is given to the column represented by `SUM(hours_logged)`, making it easy to refer to that column from the outer query.

### Tip

The call to `TRUNC(time_log_date,'Year')` converts each date to January 1st of its year. Any date in 2004 will be truncated to 1-Jan-2004. It's easy, then, to eliminate time log entries for other years.

Because subqueries in the FROM clause are treated the same as tables, it stands to reason they can take part in joins. Example 4-33 expands on Example 4-32 by adding another level of subquery and a join to the `project` table in order to include project names in the query output.

ExampleÂ 4-33.Â Joining a subquery to a table

```SELECT minmax_hours.project_id, p.project_name,
minmax_hours.min_hours, minmax_hours.max_hours
FROM (
SELECT phs.project_id, MIN(phs.hours) min_hours, MAX(phs.hours)
max_hours
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours
WHERE project_id IN (1001, 1002)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004',
'dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id) minmax_hours JOIN project p
ON minmax_hours.project_id = p.project_id;

PROJECT_ID PROJECT_NAME                              MIN_HOURS  MAX_HOURS
---------- ---------------------------------------- ---------- ----------
1001 Corporate Web Site                                4         20
1002 Enterprise Resource Planning System               8         24```

There are different ways you can approach the join to `project` shown in Example 4-33. The join could occur in the innermost subquery or in the middle subquery, and either of those alternatives would eliminate the need for a third subquery. Sometimes it's a judgment call as to which approach is best. Sometimes you need to make that call based on readability or on a desire not to mess with a working query. By adding an outer SELECT statement to an already working query, I avoided the need to tamper with a SELECT statement, the middle one in Example 4-33, which I knew worked.

There's a performance issue, too, that isn't obvious in the amount of sample data used for the examples in this book. Joining to `project` in the innermost query would force the join to take place before any aggregation at all. Given a large enough number of rows in `project_hours`, a join that early would significantly increase the expenditure of I/O and CPU resources by the query, as all those detail rows would need to be joined. Having the join occur where it does in Example 4-33 means that only two rows, the two returned by the middle query, need to be joined to `project`.

Subqueries in the FROM clause are sometimes referred to as inline views, and such a subquery can be considered as a dynamically created view, for the purpose of the one query. Any subquery in the FROM clause can be replaced by an equivalent view, but then you have the problem of creating that view, which is a permanent database object.

### Tip

You can see another interesting use of inline views, this time to generate rows of data that don't already exist in the database, in the article Creating Pivot Tables at http://gennick.com/pivot.html.

## Testing for Representation

You can use subqueries to see whether a row is representative of a set. The query in Example 4-32 contains the following WHERE condition:

`WHERE project_id IN (1001, 1002)`

This condition tests whether a row from the `project_hours` table is associated with project 1001 or 1002. Imagine a more complicated scenario. Imagine that you're interested in all projects having budgets of \$1,000,000 or more. You don't know which projects have such large budgets. Furthermore, budgets change from time to time, and you don't want to have to modify the project ID numbers in your query each time your budgets change. Instead of hard-coding the `project_id` list for the IN predicate, you can generate that list using a subquery in the IN predicate, as shown in Example 4-34.

ExampleÂ 4-34.Â A subquery generating values for an IN predicate

```SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours
WHERE project_id IN (SELECT project_id
FROM project
WHERE project_budget >= 1000000)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004','dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id;```

When you execute the SELECT statement shown in Example 4-34, the IN predicate's subquery is executed first to generate a list of project IDs encompassing all projects having budgets of \$1,000,000 or more. Another approach to this problem is to issue a subquery for each `project_hour` row to see whether the associated project has the required budget. This approach is shown in Example 4-35.

ExampleÂ 4-35.Â An EXISTS subquery checking for projects with large budgets

```SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours ph
WHERE EXISTS (SELECT *
FROM project pb
WHERE pb.project_budget >= 1000000
AND pb.project_id = ph.project_id)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004','dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id;```

The subquery in Example 4-35 is known as a correlated subquery, meaning that the subquery references a value from the parent statement. Aliases are important when writing such subqueries. The parent query's `project_hours` table is aliased as `ph`. The subquery references the project ID from each candidate row from the main query via the reference to `ph.project_id`. The subquery is executed once for each row that can potentially be returned by the main query.

### Tip

Subqueries prior to Example 4-35 have all been non-correlated, which means that you can execute them independently of their enclosing statement. Non-correlated subqueries are executed just once per execution of their containing query. Correlated subqueries are executed once per row. Either subquery may be written using IN or EXISTS, but typically you'll find that IN subqueries are non-correlated and EXISTS subqueries are correlated.

It's pretty much impossible to provide a general rule as to when to use IN and EXISTS predicates when either can be used to solve a particular problem. Given the low number of rows in the `project` table as compared to the `project_hours` table, you might think that the SELECT in Example 4-34 would be more efficient than the one in Example 4-35. After all, the IN predicate's subquery must execute only once, and return only two values. However, in my database, with the release of Oracle that I'm running and the sample data I'm using for this book, the SELECT in Example 4-35 requires far less input/output than Example 4-34. A good practice is to test both approaches, perhaps using SET AUTOTRACE ON as described in Chapter 12, and then use the one that performs best for your particular query.

## Generating Data for INSERTs and UPDATEs

Subqueries are useful in generating values for INSERT and UPDATE statements. Examples Example 4-36 and Example 4-37 show two ways of creating a reporting table of project billing data summarized by week. You might generate, and periodically refresh, such a table to make it easier for end users to check on project status using ad hoc query tools.

Example 4-36 uses a CREATE TABLE AS SELECT FROM statement, which lets you create and populate the reporting table in one step. Example 4-37 creates the reporting table separately, and then populates the table using an INSERT...SELECT FROM statement. In each case, a subquery generates the data to be inserted into the new table.

ExampleÂ 4-36.Â Using CREATE TABLE AS SELECT FROM

```DROP TABLE project_time;

CREATE TABLE project_time AS
SELECT EXTRACT (YEAR FROM time_log_date) year,
TO_NUMBER(TO_CHAR(time_log_date,'ww')) week_number,
p.project_id, p.project_name, SUM(ph.hours_logged) hours_logged
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id
GROUP BY EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name;```

ExampleÂ 4-37.Â Using INSERT...SELECT FROM

```DROP TABLE project_time;

CREATE TABLE project_time (
year NUMBER,
week_number NUMBER,
project_id NUMBER(4),
project_name VARCHAR2(40),
hours_logged NUMBER);

INSERT INTO project_time
SELECT EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name, SUM(ph.hours_logged)
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id
GROUP BY EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name;```

Having created the `project_time` table with its redundant `project_name` column, it's a given that someone, someday, will come along and change a project name on you, leaving you to sort out the resulting mess by somehow propagating the new project name to all the summary rows in `project_time`. That kind of update is easily done using a subquery in the SET clause of an UPDATE statement, as in Example 4-38.

ExampleÂ 4-38.Â A subquery generating a value for an UPDATE

```UPDATE project_time pt
SET pt.project_name = (SELECT p.project_name
FROM project p
WHERE p.project_id = pt.project_id)
WHERE pt.project_name <> (SELECT p.project_name
FROM project p
WHERE p.project_id = pt.project_id)```

The UPDATE in Example 4-38 reads each row in the `project_time` table and updates those `project_time` rows subject to name changes.

Get Oracle SQL*Plus: The Definitive Guide, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.