ChapterÂ 4.Â A Lightning SQL Tutorial

SQL is the language that you use to manipulate relational data. Other languages for data manipulation do exist, but SQL is the most common, and it's the reason for the "SQL" in "SQL*Plus." Because everything you do in SQL*Plus will involve SQL, it's worth digressing for a chapter to focus on learning the rudiments of the language.

Data Manipulation Statements

As manifested in Oracle, the SQL language encompasses several categories of statements: statements that work with data, other statements that define and modify the structures (such as tables and indexes) that hold data, and still other statements that control the operation of the database itself.

This chapter focuses on statements used to manipulate data:

INSERT: Places new records, or rows, into a database table.
SELECT: Retrieves previously inserted rows from a database table.
UPDATE: Modifies data in a table.
DELETE: Deletes data from a table.
MERGE: Brings a table up to date by modifying or inserting rows, as appropriate

The reason I focus on data manipulation in this chapter is that if you're just starting out with SQL, data manipulation is likely to be your first problem. You'll likely be working with tables and other database objects that have been created. Furthermore, data manipulation statements are the ones you'll need to leverage the SQL*Plus reporting capabilities I describe in the next few chapters. Other topics, such as creating database structures (e.g., tables and indexes) and controlling the operation of the database, are database administration topics that rightfully belong in other books.

Inserting Data into a Table

Use the INSERT statement to add a new row of data to a database table. The following form of INSERT should suffice for almost all your needs:

INSERT INTO table_name (column_list) VALUES (value_list);

Replace table_name with the name of the target table, the table to which you wish to add a new row of data. Replace column_list with a comma-delimited list of column names. These are the columns for which you will supply values in your value_list. For example, type the code from Example 4-1 into SQL*Plus to insert a new employee.

ExampleÂ 4-1.Â Inserting a new employee

INSERT INTO employee (employee_id, employee_name)
   VALUES ('114','Marusia Bohuslavka');

This example supplies values for only two of the five employee table columns. When you insert a row and omit a value for a column, that column takes on a default value specified by your DBA when creating the table. Often, the default value for a column is no value at all, or null. I'll talk about nulls later in this chapter. They're quite important to understand.

When you insert new rows, you must supply values for the following types of columns:

Primary key columns. These are mandatory. In the employee table, the employee_id column forms the primary key.
Columns defined as NOT NULL that have no default value defined.

If you omit a value for one of these types of column, you'll receive an error, as in Example 4-2.

ExampleÂ 4-2.Â Omitting a required column

                  INSERT INTO employee (employee_name) VALUES ('Hryhory Kytasty');
*
ERROR at line 1: 
ORA-01400: cannot insert NULL into ("GENNICK"."EMPLOYEE"."EMPLOYEE_ID")

The error message here is informing you that employee_id is a mandatory column. When you receive an error such as this, supply a value for the column mentioned in the error message and retry the insert.

Retrieving Data from a Table

What good would a database be if you couldn't get information back from it? Indeed, you'll find that you spend more time using SQL to get data out of the database than you spend on any of the other data manipulation operations. The SELECT statement, often referred to as a query, is what you need. Example 4-3 shows a simple query that retrieves all data from a table.

ExampleÂ 4-3.Â Retrieving all rows, all columns

SELECT * FROM employee;

The asterisk (*) in this example is shorthand for "all columns." You can enumerate the columns, as in Example 4-4.

ExampleÂ 4-4.Â Enumerating columns in the SELECT list

SELECT employee_id, employee_name, employee_hire_date,
       employee_termination_date, employee_billing_rate
FROM employee;

In a database of any size, you aren't likely to want to retrieve all rows from a table. Rather, you'll want to define some subset of rows that you are interested in. For that, use the WHERE clause. Example 4-5 uses a WHERE clause to restrict the query's results to the row inserted in the previous section.

ExampleÂ 4-5.Â Using a WHERE clause to constrain results

SELECT employee_id, employee_name
FROM employee
WHERE employee_id = 114;

The WHERE clause supports the same comparison operators that you'll find in just about any programming or scripting language:

=: Are two values the same?
!=, <>, ^=: Do two values differ?
<: Is one value less than another?
<=: Is one value less than or equal to another?
>: Is one value greater than another?
>=: Is one value greater than or equal to another?

These are the most basic and common operators. You'll see others in this book that I'll explain as I continue. For a rigorous trip through the WHERE clause, I heartily recommend Sanjay Mishra and Alan Beaulieu's book, Mastering Oracle SQL, Second Edition (O'Reilly).

Beware the Asterisk!

It's OK to use the asterisk when writing ad hoc queries, but think twice before using it in any query that gets embedded into a program or a script. That's because the results from SELECT * will change in the event that you or your DBA ever add a new column to the target table, or you re-create the table using a new column order. For a SQL*Plus reporting script, such a change will likely result in nothing more catastrophic than some mangled formatting as a result of an extra and unexpected column in the report. However, operations other than simple SELECTs may fail when you suddenly add a column to a result set. When in doubt, enumerate your columns.

Multiple conditions

You can write many conditions in a WHERE clause, which you can link together using AND and OR. You can use parentheses to clarify the order of evaluation. Example 4-6 uses OR to retrieve the following two groups of employees:

Employee number 114
Employees with names beginning with 'Myk', but only when those employees were hired during or after 2004

ExampleÂ 4-6.Â Using AND and OR

SELECT * 
FROM employee
WHERE (employee_id = 114)
   OR (employee_hire_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy')
       AND REGEXP_LIKE(employee_name, '^Myk.*'));

Parentheses ensure that the two AND conditions are collectively treated as one condition with respect to the OR operation.

The TO_DATE function converts the string '1-Jan-2004' into a true DATE value, for comparison to the hire dates. The call to REGEXP_LIKE identifies those rows having names that begin with 'Myk'. The string '^Myk.*' is a regular expression.

Tip

Regular expression support is an exciting, new feature of Oracle Database 10g. Learn more about this new feature from the Oracle Regular Expression Pocket Reference (O'Reilly), which I coauthored with Peter Linsley.

Negating conditions

There is also the NOT operator, which you can use to negate a condition. For example, the two queries in Example 4-7 are equivalent.

ExampleÂ 4-7.Â Using the NOT operator

SELECT *
FROM employee
WHERE employee_id <> 114;

SELECT *
FROM employee
WHERE NOT employee_id = 114;

NOT can be handy when writing complex queries because sometimes it's easier to think in terms of those rows that you do not want in your result. You can write conditions to identify those rows you do not want, and you can use NOT to negate those conditions. Example 4-8 shows a slightly modified (NOTified?) version of the query from Example 4-6. This time, I want the statement to return all the rows not returned in Example 4-6. I could have modified each of the three comparisons individually, but that would require a fair bit of thought, and I might make a mistake. It's easier to wrap the entire original logic in parentheses and apply the NOT operator to the result.

ExampleÂ 4-8.Â Negating complex logic

SELECT * 
FROM employee
WHERE NOT ((employee_id = 114)
   OR (employee_hire_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy')
       AND REGEXP_LIKE(employee_name, '^Myk.*')));

Using NOT as I've just done leaves me open to problems involving null values. This is an important topic I'll discuss later in "The Concept of Null."

Table aliases

When you're writing a query, it's often necessary to qualify column names by also specifying their table names. You do this using dot notation, as shown in Example 4-9. The employee. in front of each column name specifies that each column belongs to the employee table.

ExampleÂ 4-9.Â Qualifying column names with their table names

SELECT employee.employee_id, employee.employee_name
FROM employee;

It's cumbersome to retype long table names many times over. For this reason, SQL allows you to specify alternate names, known as table aliases, for the tables in your query. Specify an alias by placing it immediately after its table name, thereby separating the two names by whitespace, as shown in Example 4-10.

ExampleÂ 4-10.Â Specifying and using a table alias

SELECT e.employee_id, e.employee_name
FROM employee e;

Keep your aliases short, which is the whole idea. It's much easier to type e. than employee. in front of each column name.

You really don't need to worry about qualifying column names when selecting from only one table. Later in this chapter, you'll learn how to select from many tables at once. That's when qualifying column names becomes important. If you don't qualify column names when selecting from multiple tables, the database engine must expend extra resources sorting through which name goes with which table. That's usually a trivial performance hit, but it can add up if given enough users and statements. And woe be to you if two tables share a common column name because the database engine won't be able to determine your intent, and you'll get an error.

Column aliases

Just as you can specify aliases for tables, you can also specify aliases for columns. Many people specify column aliases just as they do table aliases, by placing the alias name immediately after the column name. Others are a bit more proper in their approach and use the AS keyword. Example 4-11 demonstrates both approaches.

ExampleÂ 4-11.Â Specifying column aliases

SELECT e.employee_id emp_num, e.employee_name "Employee Name",
       e.employee_billing_rate * 0.50 discounted_rate
FROM employee e;

SELECT e.employee_id AS emp_num, e.employee_name AS "Employee Name",
       e.employee_billing_rate * 0.50 AS discounted_rate
FROM employee e;

Column aliases are useful for reporting, because SQL*Plus picks them up and uses them as default column titles. Having "Employee Name" at the top of a column looks much better to a user than "employee_name". Column aliases are also useful in naming expressions in your SELECT list such as the one in Example 4-11 that discounts the billing rate. Otherwise, the expression itself will be the column name. In Chapter 5, you'll see that it's much easier to define formatting for an expression column when the name is one that you have specified rather than one that the database has generated for you.

INSERT . . . SELECT FROM

An interesting use of SELECT is as a data source for an INSERT statement. Suppose that you want to create a project named "X Overhead" for each existing project, replacing X with that project's name. For example, for "Corporate Web Site," you'd create a new project named "Corporate Web Site Overhead." You can do that using the following INSERT statement:

INSERT INTO PROJECT (project_id, project_name)
   SELECT project_id+8000,
          SUBSTR(project_name,1,31) || ' Overhead'
   FROM project;

The nested SELECT in this statement returns a set of rows, each row consisting of a project ID and a project name. Those rows feed into the INSERT statement, which inserts those rows back into the project table. The SUBSTR (for substring) function call clips the old project names at 31 characters, to ensure enough room for adding ' Overhead'. The newly inserted projects are all numbered above 8000.

In this example, the source and target tables are the same, but that's not necessary. The SELECT is independent of the INSERT. All that's necessary is that the SELECT returns the correct number of columns corresponding in type to those listed in the INSERT statement.

Updating Data with New Values

Things change. That seems to be the rule in our world today. Your data will change, too, and for that reason SQL provides the UPDATE statement. Use it to set new values for some or all columns in a row. The UPDATE in Example 4-12 changes the name and the budget for project #1005.

ExampleÂ 4-12.Â Changing values in a single row

UPDATE project
   SET project_name = 'Virtual Private Network',
       project_budget = 199999.95
WHERE project_id=1005;

The update in Example 4-12 updates only a single row. You can be certain of that, because project_id is the primary key of the project table, so there can be at most one row with ID 1005. You can use UPDATE to change values in many rows, by writing conditions in your WHERE clause to target those rows. Example 4-13 shows budgets being cut by 30% for all projects but the corporate web site.

ExampleÂ 4-13.Â Changing values in many rows

UPDATE project
   SET project_budget = project_budget * 0.70
WHERE NOT project_name = 'Corporate Web Site';

You'll notice that the new value for project_budget in Example 4-13 is the result of an expression; in Example 4-12, the new values are constants. When updating many rows, it rarely makes sense to apply the same value to all, so such updates should use expressions or subqueries to generate new values appropriate to each row. (See the Section 4.7 for more information on that approach.)

When you write an update, be sure you know whether it potentially affects more than one row. If the WHERE clause does not specify a single value for the table's primary key, or for a unique key column, then the update could potentially affect many rows. When an update affects many rows, give careful thought to how you will compute new values for those rows.

Tip

If you know for certain that you can uniquely identify a single row via a non-key field, go for it. However, you may want to issue a SELECT first just to be sure. Example 4-13 identified a project by name. Potentially, two projects can have the same name, but I knew that wasn't the case when I wrote the example. In the real world, sometimes you take advantage of what you know about your own data.

It's possible to issue an UPDATE that changes the value in a primary key, but such changes aren't often made, and you're better off avoiding them if you can. Database designers design databases in ways that minimize or eliminate the need to change primary key values. When you change the primary key of a row, you affect the referential integrity of your database. The change must be rippled through to any other rows that refer to the row you are changing. Alternately, the database must prevent you from changing a primary key value that is referenced by another row. Much depends here on how your database schema was designed. In a worst-case scenario, you might have to sort out the proper order for a whole series of updates to foreign-key columns that reference the primary-key value you wish to change.

Deleting Data from a Table

All things must come to an end, and that's true of data, too. Actually, with today's increasing hard-drive sizes, I'm not so sure that some don't plan to hold their data forever, but let's ignore that complication for now. Use the DELETE statement to get rid of rows you no longer need. Did you insert all those new projects by executing the code in the "INSERT . . . SELECT FROM" sidebar? Too bad. Now your boss has changed his mind. Isn't that annoying? Now it's your job to delete what you just created. Do that using the DELETE statement shown in Example 4-14.

ExampleÂ 4-14.Â Deleting rows from a table

DELETE FROM project
WHERE project_id > 8000;

DELETE is a fairly simple statement, but be careful to craft your WHERE clause so you only delete rows that you want to delete. I often write a SELECT statement using the same WHERE conditions and check the results from that before unleashing a DELETE.

Deleting All Rows from a Table

You wouldn't think it, but deleting data can actually be an expensive proposition. There's quite a bit of database overhead involved in a large delete. If you happen to be deleting all table rows, and you own the table or are a DBA having the TRUNCATE ANY TABLE privilege, you can use the TRUNCATE TABLE statement to good effect:

TRUNCATE TABLE table_name;

If you plan to reload close to the same amount of data, use the following variant, which maintains the disk space currently allocated to the table:

TRUNCATE TABLE table_name REUSE STORAGE;

For a table of any size, it's far faster to erase all the rows by truncating the table than by issuing a DELETE. This is because Oracle can accomplish the entire truncation by resetting an internal pointer known as the highwater mark. The trade-offs are that TRUNCATE commits any transaction you might have pending, and that you can't undo a TRUNCATE TABLE statement if you discover you've made a blunder after it executes. You can, however, undo a DELETE statement if you discover your mistake before committing the transaction. Read the section on "Transactions" for more on this.

If you are at all uncertain about whether to use TRUNCATE TABLE or DELETE, then use DELETE. You generally cannot issue TRUNCATE TABLE statements from within programs, at least not as easily as you can issue DELETE statements. In SQL*Plus scripts, either is equally easy to use.

Merging Data into a Table

New in Oracle9i Database, and in the 2003 ANSI/ISO SQL standard, the MERGE statement solves the age-old problem of needing to update a table or insert a new row depending on whether a corresponding row already exists. Suppose that you've created those overhead projects mentioned in the "INSERT . . . SELECT FROM" sidebar. You'll have used the statement shown in Example 4-15.

ExampleÂ 4-15.Â Creating overhead projects

INSERT INTO PROJECT (project_id, project_name)
   SELECT project_id+8000, 
          SUBSTR(project_name,1,31) || ' Overhead'
   FROM project;

Example 4-15 creates a new set of projects from an existing set of rows. Now, let's say you're faced with the task of periodically refreshing the list of overhead projects. At the time you do a refresh, you need to account for two possibilities:

For any new projects that have been added, you need to create new overhead projects.
For any existing projects, you need to migrate any name changes to their respective overhead project names.

There are different ways that you can go about solving this problem. One way is to periodically issue the MERGE statement shown in Example 4-16.

ExampleÂ 4-16.Â Bringing the list of overhead projects up to date

MERGE INTO project pdest
USING (SELECT project_id+8000 project_id, 
          SUBSTR(project_name,1,31) || ' Overhead' project_name
      FROM project
      WHERE project_id <= 8000) psource
ON (pdest.project_id = psource.project_id)
WHEN MATCHED THEN UPDATE
   SET pdest.project_name = psource.project_name
WHEN NOT MATCHED THEN INSERT
   (project_id, project_name)
   VALUES (psource.project_id, psource.project_name);

Here's a step-by-step walkthrough of this MERGE statement:

The INTO clause identifies the target table, in this case, project. An alias of pdest is specified to make it easier to distinguish between the two occurrences of the table in the one statement.
The USING clause specifies a SELECT statement to use in generating rows for the merge operation. This SELECT statement's WHERE clause excludes any overhead rows, which are rows having project_id values greater than 8000. The results of this SELECT statement are identified through the alias psource.
Each row from the SELECT is matched to its corresponding row in pdest via the condition given in the ON clause.
When a match occurs, which is to say that a row in pdest has a project_id matching that of a given row in psource, an update of the project_name is triggered.
When no match occurs (for a given row from psource, no corresponding row in pdest can be found) a new row is inserted into pdest.

MERGE is commonly used in data warehousing environments to periodically update reporting tables from operational data. Often, the source for rows feeding into a MERGE operation will be an external table, a type of table corresponding to an operating system file. There's more in Chapter 9 about using external tables to load data from files.

Get Oracle SQL*Plus: The Definitive Guide, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Oracle SQL*Plus: The Definitive Guide, 2nd Edition by Jonathan Gennick