BUY THIS BOOK

Safari Books Online

What is this?

Looking to Reprint this content?


Mastering Oracle SQL
Mastering Oracle SQL

By Sanjay Mishra, Alan Beaulieu

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Introduction to SQL
In this introductory chapter, we explore the origin and utility of the SQL language, demonstrate some of the more useful features of the language, and define a simple database design from which most examples in the book are derived.
SQL, which stands for Structured Query Language, is a special-purpose language used to define, access, and manipulate data. SQL is nonprocedural, meaning that it describes the necessary components (i.e., tables) and desired results without dictating exactly how results should be computed. Every SQL implementation sits atop a database engine, whose job it is to interpret SQL statements and determine how the various data structures in the database should be accessed in order to accurately and efficiently produce the desired outcome.
The SQL language includes two distinct sets of commands: Data Definition Language (DDL) is the subset of SQL used to define and modify various data structures, while Data Manipulation Language (DML) is the subset of SQL used to access and manipulate data contained within the data structures previously defined via DDL. DDL includes numerous commands for handling such tasks as creating tables, indexes, views, and constraints, while DML is comprised of just four statements:
INSERT
Adds data to a database.
UPDATE
Modifies data in a database.
DELETE
Removes data from a database.
SELECT
Retrieves data from a database.
Some people feel that DDL is the sole property of database administrators, while database developers are responsible for writing DML statements, but the two are not so easily separated. It is difficult to efficiently access and manipulate data without an understanding of what data structures are available and how they are related; likewise, it is difficult to design appropriate data structures without knowledge of how the data will be accessed. That being said, this book deals almost exclusively with DML, except where DDL is presented in order to set the stage for one or more DML examples. The reasons for focusing on just the DML portion of SQL include:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Is SQL?
SQL, which stands for Structured Query Language, is a special-purpose language used to define, access, and manipulate data. SQL is nonprocedural, meaning that it describes the necessary components (i.e., tables) and desired results without dictating exactly how results should be computed. Every SQL implementation sits atop a database engine, whose job it is to interpret SQL statements and determine how the various data structures in the database should be accessed in order to accurately and efficiently produce the desired outcome.
The SQL language includes two distinct sets of commands: Data Definition Language (DDL) is the subset of SQL used to define and modify various data structures, while Data Manipulation Language (DML) is the subset of SQL used to access and manipulate data contained within the data structures previously defined via DDL. DDL includes numerous commands for handling such tasks as creating tables, indexes, views, and constraints, while DML is comprised of just four statements:
INSERT
Adds data to a database.
UPDATE
Modifies data in a database.
DELETE
Removes data from a database.
SELECT
Retrieves data from a database.
Some people feel that DDL is the sole property of database administrators, while database developers are responsible for writing DML statements, but the two are not so easily separated. It is difficult to efficiently access and manipulate data without an understanding of what data structures are available and how they are related; likewise, it is difficult to design appropriate data structures without knowledge of how the data will be accessed. That being said, this book deals almost exclusively with DML, except where DDL is presented in order to set the stage for one or more DML examples. The reasons for focusing on just the DML portion of SQL include:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Brief History of SQL
In the early 1970s, an IBM research fellow named Dr. E. F. Codd endeavored to apply the rigors of mathematics to the then-untamed world of data storage and retrieval. Codd's work led to the definition of the relational data model and a language called DSL/Alpha for manipulating data in a relational database. IBM liked what they saw, so they commissioned a project called System/R to build a prototype based on Codd's work. Among other things, the System/R team developed a simplified version of DSL called SQUARE, which was later renamed SEQUEL, and finally renamed SQL.
The work done on System/R eventually led to the release of various IBM products based on the relational model. Other companies, such as Oracle, rallied around the relational flag as well. By the mid 1980's, SQL had gathered sufficient momentum in the marketplace to warrant oversight by the American National Standards Institute (ANSI). ANSI released its first SQL standard in 1986, followed by updates in 1989, 1992, and 1999.
Thirty years after the System/R team began prototyping a relational database, SQL is still going strong. While there have been numerous attempts to dethrone relational databases in the marketplace, well-designed relational databases coupled with well-written SQL statements continue to succeed in handling large, complex data sets where other methods fail.
Given that Oracle was an early adopter of the relational model and SQL, one might think that they would have put a great deal of effort into conforming with the various ANSI standards. For many years, however, the folks at Oracle seemed content that their implementation of SQL was functionally equivalent to the ANSI standards without being overly concerned with true compliance. Beginning with the release of Oracle8i, however, Oracle has stepped up its efforts to conform to ANSI standards and has tackled such features as the CASE statement and the left/right/full outer join syntax.
Ironically, the business community seems to be moving in the opposite direction. A few years ago, people were much more concerned with portability and would limit their developers to ANSI-compliant SQL so that they could implement their systems on various database engines. Today, companies tend to pick a database engine to use across the enterprise and allow their developers to use the full range of available options without concern for ANSI-compliance. One reason for this change in attitude is the advent of n-tier architectures, where all database access can be contained within a single tier instead of being scattered throughout an application. Another possible reason might be the emergence of clear leaders in the DBMS market over the last five years, such that managers perceive less risk in which database engine they choose.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Simple Database
Because this is a practical book, it contains numerous examples. Rather than fabricating different sets of tables and columns for every chapter or section in the book, we have decided to draw from a single, simple schema for most examples. The subject area that we chose to model is a parts distributor, such as an auto-parts wholesaler or medical device distributor, in which the business fills customer orders for one or more parts that are supplied by external suppliers. Figure 1-1 shows the entity-relationship model for this business.
Figure 1-1: The parts distributor model
If you are unfamiliar with entity-relationship models, here is a brief description of how they work. Each box in the model represents an entity, which correlates to a database table. The lines between the entities represents the relationships between tables, which correlate to foreign keys. For example, the CUST_ORDER table holds a foreign key to the employee table, which signifies the salesperson responsible for a particular order. Physically, this means that the CUST_ORDER table contains a column holding employee ID numbers, and that, for any given order, the employee ID number indicates the employee who sold that order. If you find this confusing, simply use the diagram as an illustration of the tables and columns found within our database. As you work your way through the SQL examples in this book, return occasionally to the diagram, and you should find that the relationships start making sense.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
DML Statements
In this section, we introduce the four statements that comprise the DML portion of SQL. The information presented in this section should be enough to allow you to start writing DML statements. As is discussed at the end of the section, however, DML can look deceptively simple, so keep in mind while reading the section that there are many more facets to DML than are discussed here.
The SELECT statement is used to retrieve data from a database. The set of data retrieved via a SELECT statement is referred to as a result set . Like a table, a result set is comprised of rows and columns, making it possible to populate a table using the result set of a SELECT statement. The SELECT statement can be summarized as follows:
SELECT <one or more things>
FROM <one or more places>
WHERE <zero, one, or more conditions apply>
While the SELECT and FROM clauses are required, the WHERE clause is optional (although you will seldom see it omitted). We therefore begin with a simple example that retrieves three columns from every row of the customer table:
               SELECT cust_nbr, name, region_id
               FROM customer;

  CUST_NBR NAME                            REGION_ID
---------- ------------------------------ ----------
         1 Cooper Industries                       5
         2 Emblazon Corp.                          5
         3 Ditech Corp.                            5
         4 Flowtech Inc.                           5
         5 Gentech Industries                      5
         6 Spartan Industries                      6
         7 Wallace Labs                            6
         8 Zantech Inc.                            6
         9 Cardinal Technologies                   6
        10 Flowrite Corp.                          6
        11 Glaven Technologies                     7
        12 Johnson Labs                            7
        13 Kimball Corp.                           7
        14 Madden Industries                       7
        15 Turntech Inc.                           7
        16 Paulson Labs                            8
        17 Evans Supply Corp.                      8
        18 Spalding Medical Inc.                   8
        19 Kendall-Taylor Corp.                    8
        20 Malden Labs                             8
        21 Crimson Medical Inc.                    9
        22 Nichols Industries                      9
        23 Owens-Baxter Corp.                      9
        24 Jackson Medical Inc.                    9
        25 Worcester Technologies                  9
        26 Alpha Technologies                     10
        27 Phillips Labs                          10
        28 Jaztech Corp.                          10
        29 Madden-Taylor Inc.                     10
        30 Wallace Industries                     10
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: The WHERE Clause
Whether we are querying, modifying, or deleting data, the WHERE clause is the mechanism for identifying the sets of data we want to work with. In this chapter, we explore the role of the WHERE clause in SQL statements, as well as the various options available when building a WHERE clause.
Before we delve into the WHERE clause, let's imagine life without it. Say that you are interested in doing some maintenance on the data in the part table. In order to inspect the data in the table, you issue the following query:
SELECT part_nbr, name, supplier_id, status, inventory_qty
FROM part;
If the part table contains 10,000 items, the result set returned by the query would consist of 10,000 rows, each with 5 columns. You would then load the 10,000 rows into memory and make your modifications.
Once you have made the required modifications to your data in memory, it is time to apply the changes to the part table. Without the ability to specify the rows to modify, you have no choice but to delete all rows in the table and re-insert all 10,000 rows:
DELETE FROM part;

INSERT INTO part (part_nbr, name, supplier_id, status, inventory_qty)
VALUES ('XY5-1002', 'Wonder Widget', 1, 'IN-STOCK', 1);

/* 9,999 more INSERTs on the wall, 9,999 more INSERTS... */
While this approach works in theory, it wreaks havoc on performance, concurrency (the ability for more than one user to modify data simultaneously), and scalability.
Now imagine that you want to modify data in the part table only for those parts supplied by Acme Industries. Since the supplier's name is stored in the supplier table, you must include both the part and supplier tables in the FROM clause:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,
  s.supplier_id, s.name
FROM part p, supplier s;
If 100 companies supply the 10,000 parts in the part table, this query will return 1,000,000 rows. Known as the Cartesian product, this number equates to every possible combination of all rows from the two tables. As you sift through the million rows, you would keep only those where the values of p.supplier_id and s.supplier_id are identical and where the s.name column matches 'Acme Industries'. If Acme Industries supplies only 50 of the 10,000 parts in your database, you will end up discarding 999,950 of the 1,000,000 rows returned by your query.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Life Without WHERE
Before we delve into the WHERE clause, let's imagine life without it. Say that you are interested in doing some maintenance on the data in the part table. In order to inspect the data in the table, you issue the following query:
SELECT part_nbr, name, supplier_id, status, inventory_qty
FROM part;
If the part table contains 10,000 items, the result set returned by the query would consist of 10,000 rows, each with 5 columns. You would then load the 10,000 rows into memory and make your modifications.
Once you have made the required modifications to your data in memory, it is time to apply the changes to the part table. Without the ability to specify the rows to modify, you have no choice but to delete all rows in the table and re-insert all 10,000 rows:
DELETE FROM part;

INSERT INTO part (part_nbr, name, supplier_id, status, inventory_qty)
VALUES ('XY5-1002', 'Wonder Widget', 1, 'IN-STOCK', 1);

/* 9,999 more INSERTs on the wall, 9,999 more INSERTS... */
While this approach works in theory, it wreaks havoc on performance, concurrency (the ability for more than one user to modify data simultaneously), and scalability.
Now imagine that you want to modify data in the part table only for those parts supplied by Acme Industries. Since the supplier's name is stored in the supplier table, you must include both the part and supplier tables in the FROM clause:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,
  s.supplier_id, s.name
FROM part p, supplier s;
If 100 companies supply the 10,000 parts in the part table, this query will return 1,000,000 rows. Known as the Cartesian product, this number equates to every possible combination of all rows from the two tables. As you sift through the million rows, you would keep only those where the values of p.supplier_id and s.supplier_id are identical and where the s.name column matches 'Acme Industries'. If Acme Industries supplies only 50 of the 10,000 parts in your database, you will end up discarding 999,950 of the 1,000,000 rows returned by your query.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
WHERE to the Rescue
Hopefully, these scenarios give you some insight into the utility of the WHERE clause, including the ability to:
  1. Filter out unwanted data from a query's result set.
  2. Isolate one or more rows of a table for modification.
  3. Conditionally join two or more data sets together.
To see how these things are accomplished, let's add a WHERE clause to the previous SELECT statement, which strives to locate all parts supplied by Acme Industries:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,
  s.supplier_id, s.name
FROM part p, supplier s
WHERE s.supplier_id = p.supplier_id
  AND s.name = 'Acme Industries';
The WHERE clause here is comprised of two parts, known as conditions, which are evaluated separately. Conditions always evaluate to either TRUE or FALSE; if there are multiple conditions in a WHERE clause, they all must evaluate to TRUE in order for a given row to be included in the result set. For this example, a row created by combining data from the part and supplier tables will only be included in the final result set if both tables share a common value for the supplier_id column, and if the value of the name column in the supplier tables matches 'Acme Industries'. Any other permutation of data from the two tables would evaluate to FALSE and be discarded.
With the addition of the WHERE clause to the previous example, therefore, Oracle will take on the work of discarding undesired rows from the result set, and only 50 rows will be returned by the query, rather than 1,000,000. Now that you have retrieved the 50 rows of interest from the database, you can begin the process of modifying the data. Keep in mind, however, that with the WHERE clause at your disposal you will no longer need to delete and re-insert your modified data; instead, you can use the UPDATE statement to modify specific rows based on the part_nbr column, which is the unique identifier for the table:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
WHERE Clause Evaluation
Now that we have seen the WHERE clause in action, let's take a look at how it is evaluated. As we mentioned, the WHERE clause consists of one or more conditions that evaluate independently to TRUE or FALSE. If your WHERE clause consists of multiple conditions, the conditions are separated by the logical operators AND and OR. Depending on the outcome of the individual conditions and the placement of these logical operators, Oracle will assign a final value of TRUE or FALSE to each candidate row, thereby determining whether a row will be included in the final result set.
Let's look at the 'Acme Industries' query again:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,
  s.supplier_id, s.name
FROM part p, supplier s
WHERE s.supplier_id = p.supplier_id
  AND s.name = 'Acme Industries';
The WHERE clause consists of two conditions separated by AND. Thus, a row will only be included if both conditions evaluate to TRUE. Table 2-1 shows the possible scenarios when conditions are replaced by their possible outcomes.
Table 2-1: Multiple-condition evaluation using AND
Intermediate result
Final result
WHERE TRUE AND TRUE
TRUE
WHERE FALSE AND FALSE
FALSE
WHERE FALSE AND TRUE
FALSE
WHERE TRUE AND FALSE
FALSE
Using basic logic rules, we can see that the only combination of outcomes that results in a final value of TRUE being assigned to a candidate row is where both conditions evaluate to TRUE. Table 2-2 demonstrates the possible outcomes if our conditions had been separated by OR rather then AND.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Conditions and Expressions
Now that we understand how conditions are grouped together and evaluated, let's look at the different elements that make up a condition. A condition is comprised of one or more expressions along with one or more operators. Examples of expressions include:
  • Numbers
  • Columns, such as s.supplier_id
  • Literals, such as 'Acme Industries'
  • Functions, such as UPPER('abcd')
  • Lists of simple expressions, such as (1, 2, 3)
  • Subqueries
Examples of operators include:
  • Arithmetic operators, such as +, -, *, and /
  • Comparison operators, such as =, <, >=, !=, LIKE, and IN
The following sections explore many of the common condition types that use different combinations of the above expression and operator types.
Most of the conditions that we use when constructing a WHERE clause will be equality conditions used to join data sets together or to isolate specific values. We have already encountered these types of conditions numerous times in previous examples, including:
s.supplier_id = p.supplier_id

s.name = 'Acme Industries'

supplier_id = (SELECT supplier_id 
  FROM supplier 
  WHERE name = 'Acme Industries')
In all three cases, we have a column expression followed by a comparison operator (=) followed by another expression. The conditions differ in the type of expression on the right side of the comparison operator. The first example compares one column to another, the second example compares a column to a literal, and the third example compares a column to the value returned by a subquery.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
WHERE to Go from Here
This chapter has introduced the role of the WHERE clause in different types of SQL statements as well as the various components used to build a WHERE clause. Because the WHERE clause plays such an important role in many SQL statements, however, the topic is far from exhausted. Additional coverage of WHERE clause topics may be found in:
  • Chapter 3, in which various flavors of join conditions are studied in detail
  • Chapter 5, which probes the different types of subqueries along with the appropriate operators for evaluating their results
  • Chapter 6, in which various methods of handling date/time data are explored
  • Chapter 14, which explores certain aspects of the WHERE clause from the standpoint of performance and efficiency
Additionally, here are a few tips to help you make the most of your WHERE clauses:
  1. Check your join conditions carefully. Make sure that each data set in the FROM clause is properly joined. Keep in mind that some joins require multiple conditions. See Chapter 3 for more information.
  2. Avoid unnecessary joins. Just because two data sets in your FROM clause contain the same column does not necessitate a join condition be added to your WHERE clause. In some designs, redundant data has been propagated to multiple tables through a process called denormalization. Take the time to understand the database design, and ask your DBA or database designer for a current data model.
  3. Use parentheses. Oracle maintains both operator precedence and condition precedence, meaning there are clearly defined rules for the order in which things will be evaluated, but the safest route for you and for those who will later maintain your code is to dictate evaluation order using parentheses. For operators, specifying
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Joins
Most of the things in life are not self-contained. There is not one shop where you will find all your requirements. This is valid for database tables as well. Quite often, you need information from more than one table. The SQL construct that combines data from two or more tables is called a join. This chapter takes you into the details of joins, their types, and their usage.
A join is a SQL query that extracts information from two or more tables or views. When you specify multiple tables or views in the FROM clause of a query, Oracle performs a join, linking rows from multiple tables together. There are several types of joins to be aware of:
Inner joins
Inner joins are the regular joins. An inner join returns the rows that satisfy the join condition. Each row returned by an inner join contains data from all tables involved in the join.
Outer joins
Outer joins are an extension to the inner joins. An outer join returns the rows that satisfy the join condition and also the rows from one table for which no corresponding rows (i.e., that satisfy the join condition) exist in the other table.
Self joins
A self join is a join of a table to itself.
The following sections discuss each of these joins with examples.
An inner join returns the rows that satisfy the join condition. Let's take an example to understand the concept of a join. Say you want to list the name and department name for each employee. To do this, you would use the following SQL statement:
            SELECT E.LNAME, D.NAME
            FROM EMPLOYEE E, DEPARTMENT D
            WHERE E.DEPT_ID = D.DEPT_ID;

LNAME      NAME
---------- --------------
SMITH      RESEARCH
ALLEN      SALES
WARD       SALES
JONES      RESEARCH
MARTIN     SALES
BLAKE      SALES
CLARK      ACCOUNTING
SCOTT      RESEARCH
KING       ACCOUNTING
TURNER     SALES
ADAMS      RESEARCH
JAMES      SALES
FORD       RESEARCH
MILLER     ACCOUNTING

14 rows selected.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Inner Joins
An inner join returns the rows that satisfy the join condition. Let's take an example to understand the concept of a join. Say you want to list the name and department name for each employee. To do this, you would use the following SQL statement:
            SELECT E.LNAME, D.NAME
            FROM EMPLOYEE E, DEPARTMENT D
            WHERE E.DEPT_ID = D.DEPT_ID;

LNAME      NAME
---------- --------------
SMITH      RESEARCH
ALLEN      SALES
WARD       SALES
JONES      RESEARCH
MARTIN     SALES
BLAKE      SALES
CLARK      ACCOUNTING
SCOTT      RESEARCH
KING       ACCOUNTING
TURNER     SALES
ADAMS      RESEARCH
JAMES      SALES
FORD       RESEARCH
MILLER     ACCOUNTING

14 rows selected.
This example queries two tables, because the employee name is stored in the EMPLOYEE table, whereas the department name is stored in the DEPARTMENT table. Notice that the FROM clause lists two tables EMPLOYEE and DEPARTMENT, separated by a comma ( , ). If you need to join three or more tables, you have to specify all the tables in the FROM clause separated by commas. The SELECT list may include columns from any of the tables specified in the FROM clause.
Note the use of table aliases in this query. It is common practice to use table aliases while selecting data from multiple tables. Whenever there is an ambiguity in the column names, you must use a table alias (or the table name) to qualify any ambiguous column names. For example, the column name DEPT_ID appears in both the tables. Therefore, the table aliases E and D are used in the WHERE clause to ask Oracle to equate DEPT_ID column from EMPLOYEE table with the DEPT_ID column from the DEPARTMENT table. Note that the table aliases have been used with the columns in the SELECT clause as well, even though the column names are unambiguous. It is good practice to use table aliases everywhere in a query if you are using them at all.
If you don't specify the join condition while joining two tables, Oracle combines each row from the first table with each row of the second table. This type of result set is called as a Cartesian product. The number of rows in a Cartesian product is the product of the number of rows in each table. Here's an example of a Cartesian product:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Outer Joins
Sometimes while performing a join between two tables, you need to return all the rows from one table even when there are no corresponding rows in the other table. Consider the following two tables, SUPPLIER and PART:
            SELECT * FROM SUPPLIER;

SUPPLIER_ID NAME
----------- ------------------------------
        101 Pacific Disks, Inc.
        102 Silicon Valley MicroChips
        103 Blue River Electronics

SELECT * FROM PART;

PART_NBR NAME               SUPPLIER_ID STATUS INVENTORY_QTY UNIT_COST RESUPPLY_DATE
-------- ------------------ ----------- ------ ------------- --------- -------------
HD211    20 GB Hard Disk            101 ACTIVE             5      2000 12-DEC-00
P3000    3000 MHz Processor         102 ACTIVE            12       600 03-NOV-00
If you want to list all the suppliers and all the parts supplied by them, it is natural to use the following query:
            SELECT S.SUPPLIER_ID, S.NAME SUPPLIER_NAME, P.PART_NBR, P.NAME PART_NAME
            FROM SUPPLIER S, PART P
            WHERE S.SUPPLIER_ID = P.SUPPLIER_ID;

SUPPLIER_ID SUPPLIER_NAME                  PART_NBR   PART_NAME
----------- ------------------------------ ---------- -------------------
        101 Pacific Disks, Inc.            HD211      20 GB Hard Disk
        102 Silicon Valley MicroChips      P3000      3000 MHz Processor
Note that even though we have three suppliers, this query lists only two of them, because the third supplier (Blue River Electronics) doesn't currently supply any part. When Oracle performs the join between SUPPLIER table and PART table, it matches the SUPPLIER_ID from these two tables (as specified by the join condition). Since SUPPLIER_ID 103 doesn't have any corresponding record in the PART table, that supplier is not included in the result set. This type of join is the most natural, and is known as an inner join.
The concept of the inner join is easier to understand in terms of the Cartesian product. While performing a join of SUPPLIER and PART tables, a Cartesian product is first formed (conceptually, Oracle doesn't physically materialize this Cartesian product), and then the conditions in the WHERE clause restrict the results to only those rows where the SUPPLIER_ID values match.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Self Joins
There are situations in which one row of a table is related to another row of the same table. The EMPLOYEE table is a good example. The manager of one employee is also an employee. The rows for both are in the same EMPLOYEE table. This relationship is indicated in the MANAGER_EMP_ID column:
CREATE TABLE EMPLOYEE (
EMP_ID          NUMBER (4) NOT NULL PRIMARY KEY,
FNAME           VARCHAR2 (15), 
LNAME           VARCHAR2 (15), 
DEPT_ID         NUMBER (2),
MANAGER_EMP_ID  NUMBER (4) REFERENCES EMPLOYEE(EMP_ID),
SALARY          NUMBER (7,2),
HIRE_DATE       DATE, 
JOB_ID          NUMBER (3));
To get information about an employee and his manager, you have to join the EMPLOYEE table with itself. This is achieved by specifying the EMPLOYEE table twice in the FROM clause and using two different table aliases, thereby treating EMPLOYEE as if it were two separate tables. The following example lists the name of each employee and his manager:
            SELECT E.LNAME EMPLOYEE, M.NAME MANAGER
            FROM EMPLOYEE E, EMPLOYEE M
            WHERE E.MANAGER_EMP_ID = M.EMP_ID;

EMPLOYEE   MANAGER
---------- ----------
SMITH      FORD
ALLEN      BLAKE
WARD       BLAKE
JONES      KING
MARTIN     BLAKE
BLAKE      KING
CLARK      KING
SCOTT      JONES
TURNER     BLAKE
ADAMS      SCOTT
JAMES      BLAKE
FORD       JONES
MILLER     CLARK

13 rows selected.
Notice the use of the EMPLOYEE table twice in the FROM clause with two different aliases. Also notice the join condition that reads as: "Where the employee's MANAGER_EMP_ID is the same as his manager's EMP_ID."
Even though the EMPLOYEE table has 14 rows, the previous query returned only 13 rows. This is because there is an employee without a MANAGER_EMP_ID. Oracle excludes this row from the result set while performing the self inner join. To include the employee(s) without a MANAGER_EMP_ID, you need an outer join:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Joins and Subqueries
Joins can sometimes be used to good advantage in reformulating SELECT statements that would otherwise contain subqueries. Consider the problem of obtaining a list of suppliers of parts for which your inventory has dropped below ten units. You might begin by writing a query such as the following:
SELECT supplier_id, name
FROM supplier s
WHERE EXISTS (SELECT *
              FROM part p
              WHERE p.inventory_qty < 10
                AND p.supplier_id = s.supplier_id);
The subquery in this SELECT statement is a correlated subquery, which means that it will be executed once for each row in the supplier table. Assuming that you have no indexes on the INVENTORY_QTY and SUPPLIER_ID columns of the PART table, this query could result in multiple, full-table scans of the PART table. It's possible to restate the query using a join, for example:
SELECT s.supplier_id, s.name
FROM supplier s, part p
WHERE p.supplier_id = s.supplier_id
  AND p.inventory_qty < 10;
Whether the join version or the subquery version of a query is more efficient depends on the specific situation. It may be worth your while to test both approaches to see which has a lower cost.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
DML Statements on a Join View
A join view is a view based on a join. Special considerations apply when you issue a DML (INSERT, UPDATE, or DELETE) statement against a join view. Ever thought about what happens when you insert a row into a join view—which table does the row go into? And what happens when you delete a row from a join view—which table does it gets deleted from? This section deals with these questions.
To be modifiable, a join view must not contain any of the following:
  • Hierarchical query clauses, such as START WITH or CONNECT BY
  • GROUP BY or HAVING clauses
  • Set operations, such as UNION, UNION ALL, INTERSECT, MINUS
  • Aggregate functions, such as AVG, COUNT, MAX, MIN, SUM, and so forth
  • The DISTINCT operator
  • The ROWNUM pseudocolumn
A DML statement on a join view can modify only one base table of the view. Apart from these rules, therefore, a join view must also have one key preserved table to be modified.
A key-preserved table is the most important requirement in order for a join view to be modifiable. In a join, a table is called a key-preserved table if its keys are preserved through the join—every key of the table can also be a key of the resultant join result set. Every primary key or unique key value in the base table must also be unique in the result set of the join. Let's take an example to understand the concept of key preserved tables better.
               DESC EMPLOYEE
 Name                            Null?    Type
 ------------------------------- -------- ----
 EMP_ID                          NOT NULL NUMBER(4)
 LNAME                                    VARCHAR2(15)
 FNAME                                    VARCHAR2(15)
 DEPT_ID                                  NUMBER(2)
 MANAGER_EMP_ID                           NUMBER(4)
 SALARY                                   NUMBER(7,2)
 HIRE_DATE                                DATE
 JOB_ID                                   NUMBER(3)

Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
ANSI-Standard Join Syntax in Oracle9
Oracle9i introduced new join syntax that is compliant to the ANSI SQL standard defined for SQL/92. Prior to Oracle9i, Oracle supported the join syntax defined in the SQL/86 standard. In addition, Oracle supported outer joins through the proprietary outer join operator (+), discussed earlier in this chapter. The old join syntax and the proprietary outer join operator are still supported in Oracle9i. However, the ANSI standard join syntax introduces several new keywords and new ways to specify joins and join conditions.
With the traditional join syntax, you specify multiple tables in the FROM clause separated by commas, as in the following example:
SELECT L.LOCATION_ID, D.NAME, L.REGIONAL_GROUP
FROM DEPARTMENT D, LOCATION L
WHERE D.LOCATION_ID = L.LOCATION_ID;
With the new syntax in Oracle9i, you specify the join type with the JOIN keyword in the FROM clause. For example, to perform an inner join between tables DEPARTMENT and LOCATION, you specify:
FROM DEPARTMENT D INNER JOIN LOCATION L
In the traditional join syntax, the join condition is specified in the WHERE clause. With the new syntax in Oracle9i, the purpose of the WHERE clause is for filtering only. The join condition is separated from the WHERE clause and put in a new ON clause, which appears as part of the FROM clause. The join condition of the previous example will be specified using the new syntax as:
ON D.LOCATION_ID = L.LOCATION_ID;
The complete join, using the new syntax, will be:
SELECT L.LOCATION_ID, D.NAME, L.REGIONAL_GROUP
FROM DEPARTMENT D INNER JOIN LOCATION L
ON D.LOCATION_ID = L.LOCATION_ID;
Specifying the join condition is further simplified if:
  • You use equi-joins, and
  • The column names are identical in both the tables.
If these two conditions are satisfied, you can apply the new USING clause to specify the join condition. In the previous example, we used an equi-join. Also, the column involved in the join condition (LOCATION_ID) is named identically in both the tables. Therefore, this join condition can also be written as:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Group Operations
Group operations are quite common in the day-to-day life of a SQL programmer. If we use SQL to access a database, it is quite common to expect questions like:
  • What is the maximum salary in this department?
  • How many managers are there in each department?
  • What is the number of customers for each product?
  • Can we print the monthly aggregate sales for each region?
We need group operations to answer these questions. Oracle provides a rich set of features to handle group operations. These features include aggregate functions, the GROUP BY clause, the HAVING clause, the GROUPING function, and the extensions to the GROUP BY clause—ROLLUP and CUBE.
This chapter deals with simple group operations involving the aggregate functions, the GROUP BY and HAVING clauses. Advanced group operations such as GROUPING, ROLLUP, and CUBE are discussed in Chapter 12.
In essence, an aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows:
            aggregate_function([DISTINCT | ALL] expression)
The syntax elements are:
aggregate_function
Gives the name of the function, e.g., SUM, COUNT, AVG, MAX, MIN, etc.
DISTINCT
Specifies that the aggregate function should consider only distinct values of the argument expression.
ALL
Specifies that the aggregate function should consider all values, including all duplicate values, of the argument expression. The default is ALL.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Aggregate Functions
In essence, an aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows:
            aggregate_function([DISTINCT | ALL] expression)
The syntax elements are:
aggregate_function
Gives the name of the function, e.g., SUM, COUNT, AVG, MAX, MIN, etc.
DISTINCT
Specifies that the aggregate function should consider only distinct values of the argument expression.
ALL
Specifies that the aggregate function should consider all values, including all duplicate values, of the argument expression. The default is ALL.
expression
Specifies a column, or any other expression, on which we want to perform the aggregation.
Let's look at a simple example. The following SQL uses the MAX function to find the maximum salary of all employees:
            SELECT MAX(SALARY) FROM EMPLOYEE;

MAX(SALARY)
-----------
       5000
In subsequent sections, we use a series of slightly more involved examples that illustrate various aspects of aggregate function behavior. For those examples, we use the following CUST_ORDER table:
            DESC CUST_ORDER
 Name                             Null?    Type
 -------------------------------- -------- --------------
 ORDER_NBR                        NOT NULL NUMBER(7)
 CUST_NBR                         NOT NULL NUMBER(5)
 SALES_EMP_ID                     NOT NULL NUMBER(5)
 SALE_PRICE                                NUMBER(9,2)
 ORDER_DT                         NOT NULL DATE
 EXPECTED_SHIP_DT                 NOT NULL DATE
 CANCELLED_DT                              DATE
 SHIP_DT                                   DATE
 STATUS                                    VARCHAR2(20)

Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The GROUP BY Clause
The GROUP BY clause, along with the aggregate functions, groups a result set into multiple groups, and then produces a single row of summary information for each group. For example, if we want to find the total number of orders for each customer, execute the following query:
            SELECT CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       201                2
       231                6
       244                2
       255                6
       264                2
       288                2

6 rows selected.
The query produces one summary line of output for each customer. This is the essence of a GROUP BY query. We asked Oracle to GROUP the results BY CUST_NBR; therefore, it produced one output row for each distinct value of CUST_NBR. Each data value for a given customer represents a summary based on all rows for that customer.
The nonaggregate expression CUST_NBR in the SELECT list also appears in the GROUP BY clause. If we have a mix of aggregate and nonaggregate expressions in the SELECT list, SQL expects that we are trying to perform a GROUP BY operation, and we must also specify all nonaggregate expressions in the GROUP BY clause. SQL returns an error if we fail to do so. For example, if we omit the GROUP BY clause, the following error is returned:
            SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
            FROM CUST_ORDER;
SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
       *
ERROR at line 1:
ORA-00937: not a single-group group function
Similarly, if we forget to include all nonaggregate expressions from the SELECT list in the GROUP BY clause, SQL returns the following error:
            SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
            FROM CUST_ORDER
            GROUP BY CUST_NBR;
SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
                 *
ERROR at line 1:
ORA-00979: not a GROUP BY expression
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The HAVING Clause
The HAVING clause is closely associated with the GROUP BY clause. The HAVING clause is used to put a filter on the groups created by the GROUP BY clause. If a query has a HAVING clause along with a GROUP BY clause, the result set will include only the groups that satisfy the condition specified in the HAVING clause. Let's look at some examples that illustrate this. The following query returns the number of orders per customer:
            SELECT CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR
            HAVING CUST_NBR < 260;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       201                2
       231                6
       244                2
       255                6
Notice that the output only includes customers with numbers below 260. That's because the HAVING clause specified CUST_NBR < 260 as a condition. Orders for all customers were counted, but only those groups that matched the specified HAVING condition were returned as the result.
The previous example is a poor use of the HAVING clause, because that clause only references unsummarized data. It's more efficient to use WHERE CUST_NBR < 260 instead of HAVING CUST_NBR < 260, because the WHERE clause eliminates rows prior to summarization, whereas HAVING eliminates groups post-summarization. A better version of the previous query would be:
SELECT CUST_NBR, COUNT(ORDER_NBR) 
FROM CUST_ORDER 
WHERE CUST_NBR < 260;
The next example shows a more appropriate use of the HAVING clause:
            SELECT CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR
            HAVING COUNT(ORDER_NBR) > 2;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       231                6
       255                6
Note the use of an aggregate function in the HAVING clause. This is an appropriate use for HAVING, because the results of the aggregate function cannot be determined until after the grouping takes place.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 5: Subqueries
Some endeavors require a certain level of preparation before the main activity can commence. Cooking, for example, often involves pre-mixing sets of ingredients before they are combined. Similarly, certain types of SQL statements benefit from the creation of intermediate result sets to aid in statement execution. The structure responsible for generating intermediate result sets is the subquery. This chapter will define and illustrate the use of subqueries in SQL statements.
A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement . Subqueries are executed prior to execution of the containing SQL statement (see Section 5.3 later in this chapter for the exception to this rule), and the result set generated by the subquery is discarded after the containing SQL statement has finished execution. Thus, a subquery can be thought of as a temporary table with statement scope.
Syntactically, subqueries are enclosed within parentheses. For example, the following SELECT statement contains a simple subquery in its WHERE clause:
SELECT * FROM customer 
WHERE cust_nbr = (SELECT 123 FROM dual);
The subquery in this statement is absurdly simple, and completely unnecessary, but it does serve to illustrate a point. When this statement is executed, the subquery is evaluated first. The result of that subquery then becomes a value in the WHERE clause expression:
SELECT * FROM customer 
WHERE cust_nbr = 123;
With the subquery out of the way, the containing query can now be evaluated. In this case, it would bring back information about customer number 123.
Subqueries are most often found in the WHERE clause of a SELECT, UPDATE, or DELETE statement. A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Is a Subquery?
A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement . Subqueries are executed prior to execution of the containing SQL statement (see Section 5.3 later in this chapter for the exception to this rule), and the result set generated by the subquery is discarded after the containing SQL statement has finished execution. Thus, a subquery can be thought of as a temporary table with statement scope.
Syntactically, subqueries are enclosed within parentheses. For example, the following SELECT statement contains a simple subquery in its WHERE clause:
SELECT * FROM customer 
WHERE cust_nbr = (SELECT 123 FROM dual);
The subquery in this statement is absurdly simple, and completely unnecessary, but it does serve to illustrate a point. When this statement is executed, the subquery is evaluated first. The result of that subquery then becomes a value in the WHERE clause expression:
SELECT * FROM customer 
WHERE cust_nbr = 123;
With the subquery out of the way, the containing query can now be evaluated. In this case, it would bring back information about customer number 123.
Subqueries are most often found in the WHERE clause of a SELECT, UPDATE, or DELETE statement. A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a noncorrelated subquery. A less-commonly-used but powerful variety of subquery, called the inline view, occurs in the FROM clause of a select statement. Inline views are always noncorrelated; they are evaluated first and behave like unindexed tables cached in memory for the remainder of the query.
Subqueries are useful because they allow comparisons to be made without changing the size of the result set. For example, we might want to find all customers that placed orders last month, but we might not want any given customer to be included more than once, regardless of the number of orders placed by that customer. Whereas joining the customer and orders tables would expand the result set by the number of orders placed by each customer, a subquery against the orders table using the IN or EXISTS operator would determine whether each customer placed an order, without regard for the number of orders placed.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Noncorrelated Subqueries
Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. Divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:
  • Single-row, single-column subqueries
  • Multiple-row, single-column subqueries
  • Multiple-column subqueries
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.
A subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed. The following query illustrates the utility of single-row, single-column subqueries by finding all employees earning an above-average salary. The subquery returns the average salary, and the containing query then returns all employees who earn more than that amount.
               SELECT lname 
               FROM employee
               WHERE salary > (SELECT AVG(salary) 
                               FROM EMPLOYEE);

LNAME
--------------------
Brown
Smith
Blake
Isaacs
Jacobs
King
Fox
Anderson
Nichols
Iverson
Peters
Russell
As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table. For example, many applications include maintenance routines that clean up operational data, such as exception or load logs. Every week, a script might delete all but the latest day's activity. For example:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Correlated Subqueries
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike noncorrelated subqueries, which are executed exactly once prior to execution of the containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query. For example, consider the following query, which locates all parts supplied by Acme Industries that have been purchased ten or more times since December:
SELECT p.part_nbr, p.name
FROM supplier s, part p
WHERE s.name = 'Acme Industries' 
  AND s.supplier_id = p.supplier_id
  AND 10 <= 
   (SELECT COUNT(*) 
    FROM cust_order co, line_item li
    WHERE li.part_nbr = p.part_nbr 
      AND li.order_nbr = co.order_nbr
      AND co.order_dt >= TO_DATE('01-DEC-2001','DD-MON-YYYY'));
The reference to p.part_nbr is what makes the subquery correlated; values for p.part_nbr must be supplied by the containing query before the subquery can execute. If there are 10,000 parts in the part table, but only 100 are supplied by Acme Industries, the subquery will be executed once for each of the 100 rows in the intermediate result set created by joining the part and supplier tables.
Correlated subqueries are often used to test whether relationships exist without regard to cardinality. We might, for example, want to find all parts that have shipped at least once in 2002. The EXISTS operator is used for these types of queries, as illustrated by the following query:
SELECT p.part_nbr, p.name, p.unit_cost
FROM part p
WHERE EXISTS 
 (SELECT 1 FROM line_item li, cust_order co
  WHERE li.part_nbr = p.part_nbr 
    AND li.order_nbr = co.order_nbr
    AND co.ship_dt >= TO_DATE('01-JAN-2002','DD-MON-YYYY'));
As long as the subquery returns one or more rows, the EXISTS condition is satisfied without regard for how many rows were actually returned by the subquery. Since the EXISTS operator returns TRUE or FALSE depending on the number of rows returned by the subquery, the actual columns returned by the subquery are irrelevant. The SELECT clause requires at least one column, however, so it is common practice to use either the literal "1" or the wildcard " * ".
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Inline Views
Content preview·