It's often the case that the data you want to return from a query
are spread across more than one table. Commonly this occurs as a result
of master-detail relationships such as the one between the project
and project_hours
tables. The project
table contains one row describing each
project, whereas, for each project, project_hours
contains many rows detailing
time charged to that project. To link such master and detail data
together in the results from a SELECT statement, you use a relational
operation called a join.
Imagine that you need to generate a detailed listing of time
charged to each of your projects. You want that listing to include
project name, the date on which work was performed, and the number of
hours involved. What makes this request interesting is that you need
to list two tables in your FROM clause. Project names come from the
project
table, while dates and
hours come from the project_hours
table. How do you SELECT from two tables at once? The answer is to use
a JOIN clause such as the one in Example 4-25.
Example 4-25. Selecting related rows from two tables
SELECT p.project_name, ph. employee_id, ph.time_log_date, ph.hours_logged FROM project p JOIN project_hours ph ON p.project_id = ph.project_id;
The JOIN clause, which is a subclause of the FROM clause, is new
in Oracle9i Database and specifies that data are
to be drawn from not one table, but from two, in this case from the
project
and project_hours
tables. The ON clause,
subsidiary to JOIN, defines the conditions that rows of combined data
must meet to be returned from the join operation. This will make more
sense as you scrutinize Figures Figure 4-2 through Figure 4-4.
Conceptually, all joins begin as a set of all possible
combinations of rows from the two tables involved. Such a set is known
as a Cartesian product. Figure 4-2 shows the beginning
of a join between a three-row project
table and a four-row project_hours
table.
Tip
Databases implement all kinds of optimizations to avoid generating Cartesian products. However, understanding the conceptual process defining the join operation is critical to writing accurate SQL queries.
Each row from project
has
been combined with each row from project_hours
. I've highlighted the JOIN
clause in the query to show how much of the query has been executed so
far.
Looking at Figure 4-2, you can see that not all combinations make sense. Detail from project 1004 has no business being associated with project 1002, nor does the reverse make sense. Project 1003 should have no detail at all. This is where the join conditions in the ON clause come into play. When you write a join, you should write join conditions to identify those rows in the Cartesian product that make sense, that you wish returned from the join operation. Rows not matching the join conditions are filtered out. Think of a sieve filtering out large gravel and passing only the sand. Figure 4-3 shows the results of evaluating the join condition in the example query.
Figure 4-3. Join conditions pass only those row combinations that make sense, eliminating all the others
As Figure 4-3
illustrates, the next step after forming the Cartesian product is to
apply the conditions in the ON clause, using those to eliminate row
combinations that make no business sense. Look at the project ID
numbers in Figure 4-3s
results. Each row output from the join operation corresponds to a row
from project_hours
. Each row
contains corresponding project information (name and ID) from project
. At this intermediate stage, you can
think of all columns from both tables as being present in each row.
Thus, each row has two copies of the project ID number, one from each
table, and those two values are used (see the ON clause) to identify
those row combinations that make business sense.
The join illustrated in Figure 4-3 is known as an inner join. In fact, you can write it using the keywords INNER JOIN instead of just JOIN. The key characteristic of an inner join is that each row of output from a join operation contains values from both tables. The choice of the term inner join is unfortunate because that term is not at all evocative of the results produced.
It's entirely possible to place WHERE conditions in a query containing a join and this is often done. Such WHERE conditions are evaluated after all the joins. Figure 4-4 shows the same query as Figure 4-3 but with the addition of a WHERE clause. You can see the effect of WHERE and when that clause is evaluated. You can see that the generation of columns in the SELECT list is the final operation in the query's execution.
The process I've described in this section is conceptual. It's a
good way to think about joins that will help you write better queries
and to write them more easily. However, this conceptual process is
seldom efficient when applied in real life. Instead, databases
implement many techniques to optimize joins and query executions. For
example, with respect to the WHERE clause in Figure 4-4, a database's query
optimizer may "decide" to evaluate that condition early, eliminating
rows from project_hours
for
February before the join, not after it.
Don't get caught up in thinking about query optimizations when writing join queries. Think conceptually until you've written a query that yields the correct results. No matter how a database optimizes a query, especially a join query, in the end the results must match the conceptual model I've just described. All joins begin with a Cartesian product of all possible combinations of rows from two tables. Join conditions then eliminate unwanted rows. It's that simple.
Go back to Figure
4-3 and look at the rows feeding into and out of the ON clause
evaluation. What happened to project 1003? Rows referencing that
project existed in the project
table, were part of the Cartesian product, but were completely
eliminated by the ON clause. Why? Because the project
rows for 1003 had no counterparts in
the project_hours
table. As a
result, no rows were in the Cartesian product for project 1003 in
which both project_id
values matched, and thus, project
1003 completely vanished from the query's result set. This behavior is
not always what you want.
To join project
to project_hours
using the data shown in Figure 4-3, yet preserving
projects having no corresponding detail in project_hours
, you need to use an
outer join. An outer join designates one table as
optional and the other as an anchor. Rows from the anchor table in an
outer join are preserved even when no corresponding detail rows
exist.
Figure 4-5 shows a
left outer join, designating the table on the
left-hand side of the JOIN keyword as the anchor table. The project
table is to the left, so project
rows are preserved.
Look carefully at what goes on in Figure 4-5. The condition in
the ON clause results in the elimination of all rows in the Cartesian
product associated with the row from the project
table for project 1003. This is when
the outer join logic kicks in. The four combination rows for project
1003 are, in fact, removed from the join, but a new row is slipped in
for project 1003 containing values only from the
project
table. Column values that
would otherwise come from project_hours
are set to null.
Tip
Remember that my explanation is conceptual, a useful way of visualizing the outer join process. How the database engine implements this conceptual process may be quite different from what I've described.
Another type of join is the right outer join. Right and left outer joins are the same fundamental operation. The only difference lies in which side of the JOIN keyword you place the anchor table. Example 4-26 shows semantically identical left and right outer joins.
Example 4-26. Left and right outer joins are fundamentally the same operation
SELECT p.project_name, ph.time_log_date, ph.hours_logged FROM project p LEFT OUTER JOIN project_hours ph ON p.project_id = ph.project_id; SELECT p.project_name, ph.time_log_date, ph.hours_logged FROM project_hours ph RIGHT OUTER JOIN project p ON p.project_id = ph.project_id;
Example 4-26s
first query is a left outer join: project
is to the left, so project
is the anchor table. The second
query is a right outer join, but I've moved project
to the right-hand side of the JOIN
keyword, so it is still the anchor table. Both queries generate the
same result.
The final type of outer join that I want to discuss is the full outer join. Think of a full outer join as the combination of all three types of joins so far: inner, left, and right:
A full outer join will return at least one of each row from both tables. You would typically then have WHERE clause conditions to further filter the results.
Tip
Oracle Database 10g introduced yet another type of join, the partition outer join. Read about that at http://gennick.com/partition.html.
Get Oracle SQL*Plus: The Definitive Guide, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.