Chapter 4. Controlling Execution Plans

Saying is one thing and doing is another.

Michel Eyquem de Montaigne, Essays, Bk. II, Ch. 31

This chapter covers two classes of tuning techniques for controlling execution plans: universal techniques that work independently of your choice of database vendor, and techniques that are database-specific. Database-specific techniques are covered well by your own vendor’s documentation, so you might know them well already. In general, you need both types of techniques to get precisely the execution plan you want. Each of the vendor-specific sections stands alone, even repeating material from the other sections when applicable. Therefore, you can skip the vendor sections that you don’t need.

Much ado has been made over controlling execution plans, sometimes with elaborate tools. This chapter focuses on the simplest ways to control plans, with a strong emphasis on getting the types of plans you will need to optimize real-world SQL. I have found that when you know which execution plan you want in advance, getting it is easy and requires only simple tools.

Universal Techniques for Controlling Plans

This section describes a number of database-independent techniques you can use to control execution plans. The techniques are good for the following purposes:

  • Enabling use of the index you want

  • Preventing use of the wrong indexes

  • Enabling the join order you want

  • Preventing join orders you do not want

  • Choosing the order to execute outer queries and subqueries

  • Providing the cost-based optimizer with good data

  • Fooling the cost-based optimizer with bad data

These vendor-independent techniques often offer an alternative method to achieve ends you could also achieve with vendor-specific methods. When you have a choice, the vendor-specific methods are usually cleaner. However, some problems are solvable only by these universal techniques, which offer solutions that can sometimes work on SQL that is intended to run on multiple vendor databases.

Enabling Use of the Index You Want

To enable efficient use of an index, you need a reasonably selective condition on the leading column (or only column) of that index. The condition must also be expressed in a way that enables the database to establish a reasonably narrow index range for the index values. The ideal form this takes is:

SomeAlias.Leading_Indexed_Column=<Expression>

In less ideal cases, the comparison is with some range of values, using BETWEEN, LIKE, <, >, <=, or >=. These range comparisons also potentially enable use of the index, but the index range is likely to be larger and the resulting query therefore slower. If the index range is too large, the optimizer might conclude that the index is not worth using and choose another path to the data. When you combine equalities and range conditions for multicolumn indexes, you should prefer indexes that lead with the columns that have equality conditions and finish with columns that have range conditions. Note that the left side of the comparison simply names the column, with no function around the column, and no expression (such as addition) using the column. Use of a function, a type conversion, or an arithmetic expression on the side with the indexed column will generally disable use of that index.
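For example, on Oracle, suppose alias O maps to an Orders table with an index on Order_Date (names here are purely illustrative). The first condition below disables that index by wrapping the column in a function; the second, logically equivalent form leaves the column bare and preserves indexed access:

TRUNC(O.Order_Date)=TO_DATE('2003-04-01','YYYY-MM-DD')

O.Order_Date>=TO_DATE('2003-04-01','YYYY-MM-DD')
AND O.Order_Date< TO_DATE('2003-04-02','YYYY-MM-DD')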

Type conversions are a particularly subtle way that SQL sometimes disables use of an index. DB2 returns an error if you compare two expressions with incompatible types. SQL Server prefers to perform the implicit conversion on the side of the comparison that does not disable the index. Oracle implicitly converts character-type expressions to the type of the other side, even when this disables index use. For example, consider this expression:

P.Phone_Number=5551212

If Phone_Number were a character-type column, this would likely evaluate internally on Oracle and SQL Server as:

On Oracle:     TO_NUMBER(P.Phone_Number)=5551212
On SQL Server: P.Phone_Number=CAST(5551212 AS VARCHAR)

SQL Server preserves indexed access to the column. On Oracle, the implicit use of TO_NUMBER( ) disables use of the index just as surely as if you made the expression explicit. (The only real difference is that the problem is harder to find in the implicit form.) The same problem can plague index use for joins, as well as for single-table conditions. For example, consider the join:

P.Phone_Number=C.Contact_Number

If Contact_Number were a number type and Phone_Number were a character type, the implicit conversion on Oracle would prevent an index-driven nested-loops join from C to P. A join in the other direction would be unhindered.
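When you cannot fix the column types, you can at least choose which side's index the conversion sacrifices by making the conversion explicit on the side you do not plan to drive through. For example, this sketch preserves indexed access to P.Phone_Number (enabling the join direction from C to P) at the cost of any index on C.Contact_Number:

P.Phone_Number=TO_CHAR(C.Contact_Number)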

The expression opposite the indexed column reference can be arbitrarily complex. However, it must not reference columns in the same alias with the indexed column. For example, consider the condition:

P.Phone_Number=P.Area_Code||'5551212'

The database cannot drive into the index on P.Phone_Number with this condition, because the database must reach alias P before it can evaluate the expression on the right side. This chicken-and-egg problem prevents identifying (with the index) the subset of the table that meets this condition until after the database examines the whole table.

The final way that SQL often disables index use is with conditions combined with OR. For example, consider the query:

SELECT ... 
FROM Order_Details D, ... 
WHERE ...
  AND (D.Order_ID=:1 OR :1 IS NULL)
  AND ...

In this example, the database can reach Order_Details through an index on Order_ID if the bind variable :1 happens to be nonnull. But if :1 is bound to a null value, there is no restriction at all on Order_ID and thus no use for that index. Since the database cannot tell what :1 will be bound to when it parses the SQL and prepares the plan, it will find no good opportunity to use the index. In this case, the solution is to create a two-part plan, with each part optimized for one of the cases:

SELECT ... 
FROM Order_Details D, ... 
WHERE ...
  AND D.Order_ID=:1
  AND :1 IS NOT NULL
  AND ...
UNION ALL
SELECT ... 
FROM Order_Details D, ... 
WHERE ...
  AND :1 IS NULL
  AND ...

When you view the execution plan for this query, it shows both indexed access through the index on Order_Details(Order_ID) and full-table-scan access to Order_Details. This might appear to be the worst of both worlds, but you are saved by the conditions:

AND :1 IS NOT NULL
...
AND :1 IS NULL

These conditions make no reference at all to any data in the database, so the database can and does evaluate them before it even begins reading data for that half of the combined statement. Therefore, it never actually executes the full table scan when :1 is not null, and it never actually executes an indexed read (or any other part of the execution plan for the first half of the query) when :1 is null. This amounts to a method to branch your execution plan depending on conditions on the bind variables, the variables that determine which data is available to drive the query. The only catch is that you must ensure that the conditions on the bind variables are mutually exclusive, so that exactly one of the branches actually returns data. For example, if you have another bind variable to provide Customer_Name, you might put together a query like this:

SELECT ... 
FROM Order_Details D, Customers C, ... 
WHERE ...
  AND D.Order_ID=:1
  AND :1 IS NOT NULL
  AND (C.Customer_Name=:2 OR :2 IS NULL)
  AND ...
UNION ALL
SELECT ... 
FROM Order_Details D, Customers C, ... 
WHERE ...
  AND :1 IS NULL
  AND :2 IS NOT NULL
  AND C.Customer_Name=:2
  AND ...
UNION ALL
SELECT ... 
FROM Order_Details D, Customers C, ... 
WHERE ...
  AND :1 IS NULL
  AND :2 IS NULL
  AND ...

This could support a three-part plan, in which the database would:

  1. Drive into Order_Details on the index on Order_ID (your first choice), when possible.

  2. Otherwise, drive into Customers on the index on Customer_Name (your second choice) when it has no Order_ID specified but has a customer name.

  3. Otherwise, just get all the rows, probably beginning with a full table scan, when it has no selective conditions at all.

In any case, the conditions on the bind variables in the three parts are contrived to be mutually exclusive:

AND :1 IS NOT NULL
...
AND :1 IS NULL
AND :2 IS NOT NULL
...
AND :1 IS NULL
AND :2 IS NULL

Preventing Use of the Wrong Indexes

Join expressions are usually simple, usually between consistent types, and usually between numerical IDs. Conditions on the driving table are usually simple and compatible with index use. A more frequent problem than enabling use of the right index is preventing use of the wrong indexes. In many queries, there are multiple single-table conditions that are capable of reaching multiple indexes, but you want to use only a specific one of those indexes. Join conditions are usually expressed to allow index-driven joins in either direction, although only one of the possible join directions turns out to be optimal. Occasionally, you’ll prefer to disable use of an index on a join altogether, to force a hash or sort-merge join.

To disable use of an index, create the simplest possible expression around the indexed column reference. For example, you should prevent use of an index on Status_Code for the unselective condition on closed orders, since the number of closed orders will eventually eclipse the number of open orders as you do more and more business:

O.Status_Code='CL'

Since Status_Code is a character-type column, a simple expression to disable index use without changing the results would simply concatenate an empty string to the end of Status_Code:

On Oracle and DB2: O.Status_Code||''='CL'
On SQL Server:     O.Status_Code+''='CL'

For number-type columns, you can add 0:

O.Region_ID+0=137

All databases have some sort of function that evaluates to the first argument when the argument is not null and otherwise returns the second argument. On Oracle, the function is NVL( ). On SQL Server and DB2, it is COALESCE( ). If both arguments are the same column, the function always returns the same result as the bare column, regardless of the column type. Therefore, this makes a handy recipe to deactivate index use regardless of column type:

On Oracle:             NVL(O.Order_Date,O.Order_Date)=<Value>
On DB2 and SQL Server: COALESCE(O.Order_Date,O.Order_Date)=<Value>

In a join condition, a join that disables an indexed path to O.Region_ID (but not to R.Region_ID) could look like this:

O.Region_ID+0=R.Region_ID

Using the type-independent approach, this same join would look like this:

NVL(O.Region_ID,O.Region_ID)=R.Region_ID

Enabling the Join Order You Want

Apart from unintentionally disabled indexes, there are two issues that sometimes disable desired join orders:

  • Outer joins

  • Missing redundant join conditions

Outer joins

Consider an outer join query, in Oracle-style notation:

SELECT ... 
FROM Employees E, Locations L
WHERE E.Location_ID=L.Location_ID(+)

or in the newer, universal notation:

SELECT ... 
FROM Employees E LEFT OUTER JOIN Locations L
     ON E.Location_ID=L.Location_ID

This query requests employee records with their matching locations, when employees have locations; otherwise, null location data is used for employees that have no matching locations. Based on the request, it is clear that the query cannot effectively drive from Locations to Employees, since even employees without locations are needed. Consider a case in which this query is just a template to which an application adds conditions that depend on search criteria provided by an end user. If the end user wants to see employees for a particular location, the application might create this query:

SELECT ... 
FROM Employees E LEFT OUTER JOIN Locations L
     ON E.Location_ID=L.Location_ID
WHERE L.Description='Headquarters'

In the outer case of the join from Employees to Locations, L.Description will be assigned a generated value of null, and the condition on L.Description will be false. Only the inner case of the join will return rows that might meet the restriction on L.Description, so now it makes perfect sense to drive the query in the other join order, from Locations to Employees. However, the existence of the outer join often prevents automated optimizers from allowing this reversed order on the outer joins, so you need to make the join explicitly an inner join to get the reversed join direction:

SELECT ... 
FROM Employees E INNER JOIN Locations L
     ON E.Location_ID=L.Location_ID
WHERE L.Description='Headquarters'

Missing redundant join conditions

Normally, between any number of tables, the join count is the number of tables minus one. For example, between three tables, you expect to find two joins. Occasionally, a query permits an extra, redundant join. For example, if you have an Addresses table that contains all addresses significant to the company, it might have a one-to-zero or one-to-one relationship with the earlier Locations table, which contains only locations owned by the company and which references Addresses through a matching primary key. In this case, you might find a query like the following:

SELECT ... 
FROM Employees E, Locations L, Addresses A
WHERE E.Location_ID=L.Location_ID
  AND E.Location_ID=A.Address_ID
  AND A.ZIP_Code=95628

By transitivity (if a=b and b=c, then a=c), you can deduce that the condition L.Location_ID=A.Address_ID must be true for all rows this query would return. However, that condition is not explicit in the query, and not all databases will deduce it and fill it in if it is left out. The best plan, in this case, will likely begin with all addresses within that ZIP Code and immediately join to Locations to discard all addresses except the one or two that correspond to company locations, before joining to Employees. Since that join order requires the missing join condition to support an indexed path from Addresses to Locations, you should make the missing join condition explicit:

SELECT ... 
FROM Employees E, Locations L, Addresses A
WHERE E.Location_ID=L.Location_ID
  AND E.Location_ID=A.Address_ID
  AND L.Location_ID=A.Address_ID
  AND A.ZIP_Code=95628

Since you do not want to follow the join from Addresses to Employees directly, you could also remove, if necessary, the redundant join condition E.Location_ID=A.Address_ID, to discourage that unwanted join operation.
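The result, with the direct join from Addresses to Employees removed, would look like this:

SELECT ... 
FROM Employees E, Locations L, Addresses A
WHERE E.Location_ID=L.Location_ID
  AND L.Location_ID=A.Address_ID
  AND A.ZIP_Code=95628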

Preventing Join Orders You Do Not Want

Forcing joins in the direction you want, using the earlier techniques for preventing use of the wrong indexes, will prevent many undesired join orders. What do you do when you want the database to follow a particular join direction eventually, but not too early in the execution plan? You cannot afford to disable an index, because you must use that index eventually, just not too early. Consider the following two joins, in which you want to start the query with reads of T1 and then join to T2 before joining to T3:

... AND T1.Key2_ID=T2.Key2_ID
AND T1.Key3_ID=T3.Key3_ID ...

Here, you want to follow nested loops into both T2 and T3, following indexes on the keys mentioned and reaching T2 before T3. To postpone the join you want to happen later, make it depend (or at least appear to depend) on data from the join that must happen earlier. Here is a solution:

... AND T1.Key2_ID=T2.Key2_ID
AND T1.Key3_ID+0*T2.Key2_ID=T3.Key3_ID ...

You and I know that the second version is logically equivalent to the first. However, the database just finds an expression on the left side of the second join that depends on both T1 and T2 (not recognizing that no value from T2 can change the result), so it won’t try to perform the join to T3 until after T2.

If necessary, you can string together joins like this to completely constrain a join order. For each join after the first, add a logically irrelevant component referencing one of the columns added in the preceding join to the join expression. For example, if you want to reach tables T1 through T5 in numerical order, you can use the following. Notice that the join condition for the T3 table uses the expression 0*T2.Key2_ID to force the join to T2 to occur first. Likewise, the join condition for the T4 table uses 0*T3.Key3_ID to force T3 to be joined first.

... AND T1.Key2_ID=T2.Key2_ID
AND T1.Key3_ID+0*T2.Key2_ID=T3.Key3_ID
AND T1.Key4_ID+0*T3.Key3_ID=T4.Key4_ID
AND T1.Key5_ID+0*T4.Key4_ID=T5.Key5_ID ...

I’ll apply this method to a concrete example. Consider the following SQL, adapted from Chapter 3:

SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Johnson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND LE.Description='Dallas'

Assume that you have an execution plan that drives from the index on the employee’s last name, but you find that the join to the employee’s location (alias LE) to discard employees at locations other than Dallas is unfortunately happening last, after the other joins (to M and LM). You should join to LE immediately from E, to minimize the number of rows you need to join to the other two tables. Starting from E, the join to LM is not immediately possible, so if you prevent the join to M before LE, you should get the join order you want. Here’s how:

SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Johnson'
  AND E.Manager_ID+0*LE.Location_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID 
  AND LE.Description='Dallas'

The key here is that I’ve made the join to M dependent on the value from LE. The expression 0*LE.Location_ID forces the optimizer to join to LE before M. Because of the multiply-by-zero, the added expression has no effect on the results returned by the query.

Forcing Execution Order for Outer Queries and Subqueries

Most queries with subqueries can logically drive from either the outer query or the subquery. Depending on the selectivity of the subquery condition, either choice can be best. The choice generally arises for queries with EXISTS or IN conditions. You can always convert between an EXISTS condition on a correlated subquery and the equivalent IN condition on a noncorrelated subquery. For example, you can convert this:

SELECT ... 
FROM Departments D 
WHERE EXISTS (SELECT NULL FROM Employees E
                          WHERE E.Department_ID=D.Department_ID)

to this:

SELECT ... 
FROM Departments D
WHERE D.Department_ID IN (SELECT E.Department_ID FROM Employees E)

The first form implies that the database drives from the outer query to the subquery. For each row returned by the outer query, the database executes the join in the subquery. The second form implies that you begin with the list of distinct departments that have employees, as found in the noncorrelated subquery, and drive from that list into the matching list of such departments in the outer query. Sometimes, the database itself follows this implied join order, although some databases can make the conversion internally if their optimizer finds that the alternate order is better. To make your own SQL more readable and to make it work well regardless of whether your database can convert the forms internally, use the form that implies the order you want. To force that order even when the database could make the conversion, use the same join-direction-forcing technique used in Section 4.1.4. Thus, an EXISTS condition that forces the outer query to execute first would look like this:

SELECT ... 
FROM Departments D 
WHERE EXISTS (SELECT NULL FROM Employees E
                          WHERE E.Department_ID=D.Department_ID+0)

For the contrary order, an IN condition that forces the implied driving order from the subquery to the outer query would look like this:

SELECT ... 
FROM Departments D
WHERE D.Department_ID IN (SELECT E.Department_ID+0 FROM Employees E)

Tip

This latter order would be a bad idea, unless you found a strange case in which you had more departments than employees!

You can have several subqueries in which the database either must drive from the outer query to the subquery (such as NOT EXISTS subqueries) or should drive in that order. Such a case implies a choice of the order of execution of the subqueries. You can also have choices about whether to execute subqueries after completing the outer query, or at the first opportunity, as soon as the correlation join is possible, or at some point between these extremes.

The first tactic for controlling the order of subquery execution is simply to list the subqueries in order in the WHERE clause (i.e., the top subquery to be executed should be listed first). This is one of the few times when WHERE-clause order seems to matter.

Rarely, the database will execute a subquery sooner than you would like. The same tactic for postponing joins (described in Section 4.1.4) works for correlation joins, the joins in subqueries that correlate the subqueries to the outer queries. For example, consider this query:

SELECT ... 
FROM Orders O, Customers C, Regions R
WHERE O.Status_Code='OP'
  AND O.Customer_ID=C.Customer_ID
  AND C.Customer_Type_Code='GOV'
  AND C.Region_ID=R.Region_ID
  AND EXISTS (SELECT NULL 
              FROM Order_Details OD
              WHERE O.Order_ID=OD.Order_ID
                AND OD.Shipped_Flag='Y')

For this query you might find that the subquery runs as soon as you reach the driving Orders table, but you might wish to perform the join to Customers first, to discard nongovernmental orders, before you take the expense of the subquery execution. In this case, this would be the transformation to postpone the correlation join:

SELECT ... 
FROM Orders O, Customers C, Regions R
WHERE O.Status_Code='OP'
  AND O.Customer_ID=C.Customer_ID
  AND C.Customer_Type_Code='GOV'
  AND C.Region_ID=R.Region_ID
  AND EXISTS (SELECT NULL 
              FROM Order_Details OD
              WHERE O.Order_ID+0*C.Customer_ID=OD.Order_ID
                AND OD.Shipped_Flag='Y')

Notice the addition of +0*C.Customer_ID to the subquery’s WHERE clause. This ensures the join to Customers occurs first, before the subquery executes.

Providing the Cost-Based Optimizer with Good Data

On any cost-based optimizer (that is, for any query except one running on the Oracle rule-based optimizer, since only Oracle has a rule-based optimizer), the second most common source of poor execution plans (after missing indexes) is missing statistics on the tables, columns, and indexes involved in the query. In all, cost-based optimizers do a fairly good job of finding the best plan without help when they have good information to begin with. However, when they are missing information—for example, because a table or index has been rebuilt without regenerating statistics for that object—they tend to make terrible assumptions.

If you are running on any database except Oracle, or if you are on Oracle’s cost-based optimizer (as is most common and as Oracle recommends) and not forcing the rule-based optimizer, the first thing you should try if you are not getting the execution plan you want is to regenerate statistics on every table and index relevant to the query. Standard statistics will usually suffice to get reasonable execution plans.

Cost-based optimizers usually assume that data is uniformly distributed. For example, if the optimizer statistics show a table of 1,000,000 rows with 50,000 distinct values for some indexed foreign key, the database will optimize on the assumption that every value of that key will match exactly 20 rows. For most indexed columns, like foreign keys, this assumption of a uniform data distribution works well. However, some columns have highly skewed distributions, such as status, code, or type columns, or foreign keys to status or type tables. For example, consider this query:

SELECT ... FROM Orders WHERE Status_Code = 'OP'

There might only be three or four values of Status_Code across a 1,000,000-row Orders table, but if 'OP' means this is an open order, not yet fulfilled or cancelled, this condition is far more selective than the optimizer would expect based solely on the number of distinct values. If the column had an index, the optimizer might never use that index if it knew only the small number of distinct indexed values. However, on some databases, you can generate added statistics that let the database know not only the number of distinct values but also the distribution of those values, a necessary step when you have such highly skewed distributions.

Fooling the Cost-Based Optimizer with Incorrect Data

This last technique is dangerous, and I recommend it only as a last resort. Sometimes, you want to simulate a large database on a small, development database. If you can extrapolate (or, better, measure from an actual database) statistics that apply to a large database, you can manually modify the data-dictionary tables that store those statistics for the optimizer, to fool the optimizer into thinking it is working with a large database. The small database will have statistics that show large tables with many distinct values on most indexes. This is a handy way to see execution plans that will apply to production volumes when you have only a test database with toy data volumes. For such toy-sized databases, there is no great risk to this approach. On production databases, the optimizer will occasionally make better choices if it has the wrong data, usually if it has data that exaggerates the selectivity of desired indexes or that exaggerates the size of a table when a full table scan is undesirable.
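On Oracle, for example, you could fake production-scale statistics on a test schema with the DBMS_STATS package. This is only a sketch; the schema name, object names, and volume figures are hypothetical:

BEGIN
   -- Claim 10,000,000 Orders rows stored in 150,000 blocks:
   DBMS_STATS.SET_TABLE_STATS('Appl_Test', 'Orders',
      numrows => 10000000, numblks => 150000);
   -- Claim matching row and distinct-key counts for the primary-key index:
   DBMS_STATS.SET_INDEX_STATS('Appl_Test', 'Order_PKey',
      numrows => 10000000, numdist => 10000000);
END;
/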

Imagine reversing the logic the optimizer follows: ask “What would I need to believe about the tables and indexes of this query to find an alternative plan (the alternative that you, the human optimizer, want) much more attractive?” It is not hard to fool the optimizer into doing what you want rather than what it would choose on its own, if you lie to it about the statistics. However, on production systems, this is dangerous in several ways:

  • As soon as anyone regenerates statistics for the tables or indexes, the optimizer will revert to the original error, unless the manual statistics-tweak is reapplied. You will have to rigorously control statistics generation to prevent this.

  • As soon as the database optimizer improves—with the next release, perhaps—it is denied the chance to exploit those improvements with correct data.

  • Most importantly, every other query against the tables and indexes with false statistics is at risk and will potentially be harmed, just to help the one query you wanted to tune when you fudged the statistics.

I have never needed to play this card to get an adequately optimized plan on Oracle, SQL Server, or DB2, and I recommend you avoid it if possible.

Controlling Plans on Oracle

Oracle currently offers two completely different optimizers, the rule-based optimizer (RBO) and the cost-based optimizer (CBO), and the methods for tuning on each differ.

The RBO is Oracle’s original automated optimizer, back from the days of Oracle Version 6 and earlier. By rule-based, Oracle means that the optimizer uses only fixed properties of the tables, indexes, and SQL to guess an optimum execution plan from a set of simple rules of thumb (or heuristics) built into the automated optimizer. The RBO uses no data about the sizes of the tables or indexes, or about the distribution of data within those objects. It does use data on the fixed properties of the indexes: whether they are unique, which columns they cover, in which order, and how well those match up with the most selective-looking filter conditions and joins in the SQL. As tables grow and data distributions change, the RBO should go right on delivering the same plan indefinitely, as long as you don’t alter the indexes (for example, from unique to nonunique) or change the table structure (for example, from an ordinary table to a partitioned table). However, at some future time, perhaps even in Oracle Database 10g, Oracle will drop all support for the rule-based optimizer, and cost-based optimization will become your only choice.

Since Oracle7, the RBO has been even more stable than before, because Oracle chose to freeze the RBO code beginning with Oracle7, except for rare, slight changes necessary to deliver functionally correct (as opposed to necessarily optimum) results. Therefore, an execution plan that is correct on the RBO today will likely stay unchanged until Oracle drops the RBO altogether. This is appealing from the perspective of stability, although the dark side of this stability is that the execution plans never get any better either.

Execution plans on the RBO never change to adapt to changing data distributions, and this is often cited as an argument to switch to the CBO. However, in my own experience, data-distribution change is the least of the reasons for cost-based optimization. In over 10 years, I have yet to find a single case in which it was important to use different execution plans for different real-world data distributions with the same SQL.

Tip

I have seen many cases in which one plan is not perfectly optimal for all real-world data distributions, but in all these cases, one robust plan exists that is at least nearly optimal across the board.

Another argument cited in favor of the CBO is that it can deliver parallel execution plans, plans that can bring multiple processors to bear on the SQL statement at once. I have not found this to be a compelling argument, since I have yet to find a real-world case in which the optimum SQL, with the optimum database design, required parallel execution for adequate performance. I expect some such cases exist in data-warehousing environments, which are not where most of my experience lies, I admit, but almost all cases in which parallel execution plans appear to shine are really covering up some mistake in database design, indexing, or application design, compensating for design deficiencies with horsepower. That, by itself, would not be such a bad thing; extra horsepower might be cheaper than fixing the application. However, parallel plans are usually in service of large batch processes, competing heavily for resources with online processes that are more critical to end users. Therefore, parallel plans often rob needed resources from other processes that are more critical.

These are the strongest arguments against using the RBO:

  • It will become unavailable in some future release, perhaps during Oracle Database 10g, and you will not be able to use an older release forever.

  • The CBO keeps getting better, while the RBO is stuck with all the old problems it has ever had.

  • The CBO has a huge inherent advantage in the information available to it to calculate the best plan.

  • The RBO cannot take advantage of features created since the CBO appeared in Oracle7, and in most cases the RBO will simply push queries that involve newer object types, such as bit-mapped indexes, off to the CBO. (See the following section, Section 4.2.1, for details about which features the RBO cannot handle.)

That said, the RBO does a surprisingly good job; its heuristics are well designed to get along with the tiny amount of information that the RBO uses to guess the best plan. In Chapter 6, I will describe properties of what I call a robust execution plan, one that behaves well across a wide range of data distributions. The RBO almost always delivers a robust plan when the necessary indexes are available and when the developer has not prevented use of an index with some index-disabling expression, as discussed earlier in this chapter. Given the right indexes, you can almost always get the best robust plan on either optimizer, with manual tuning. With automated tuning, the biggest advantage of the CBO is that it is more resourceful when dealing with imperfect indexing and nonoptimally written SQL; more often, it delivers at least an adequate plan in these cases, without manual tuning. When more than one robust plan is possible, the CBO is also more likely to find the best robust plan, while the RBO will pick one without knowing relative costs, unless you manually tune the SQL.

Controlling the Choice of Oracle Optimizer

It is unrealistic to optimize Oracle queries simultaneously for both the rule-based and the cost-based optimizers. Therefore, you should understand the factors that lead Oracle to choose which optimizer it applies, so that you can control those factors and get the optimizer you choose.

The RBO cannot handle certain object types and object properties that did not yet exist when Oracle froze the RBO code. However, rather than simply have its code error out, Oracle modified the RBO code just enough to let it recognize the cases it cannot handle and to have it pass those cases on to the CBO. Thus, even if you think you have set up your system for rule-based optimization, the following circumstances will absolutely force cost-based optimization:

  • Bit-mapped indexes on any column of a table referenced in the SQL, even if those indexes are on columns the SQL does not touch.

  • Function-based indexes in a table referenced in the SQL, if such an index is on an expression the SQL references.

  • Partitioned tables touched by the SQL.

  • Tables or indexes configured with a parallel degree. The optimizer interprets these as a command to find parallel execution plans, which the RBO does not know how to do. As with bit-mapped indexes, indexes configured with a parallel degree will disable use of the RBO on a table referenced by your SQL, even if those indexes are on columns the SQL does not touch.

If the tables and indexes involved in your SQL do not prevent using the RBO, Oracle chooses between the RBO and the CBO as follows:

  1. If any SELECT keyword in the SQL (even in a subquery or a view definition) is followed by any valid hint other than /*+ RULE */ or /*+ CHOOSE */, Oracle will use the CBO.

  2. Otherwise, if any SELECT keyword in the SQL (even in a subquery or a view definition) is followed by /*+ CHOOSE */ and there are any statistics on any table or index referenced by the SQL, Oracle will choose the CBO.

  3. Otherwise, if any SELECT keyword in the SQL (even in a subquery or a view definition) is followed by /*+ RULE */, Oracle will choose the RBO.

  4. Otherwise, if the session optimizer_mode parameter is set at the session level (by ALTER SESSION SET OPTIMIZER_MODE = <Your_Choice>;), Oracle will choose according to that session-level parameter.

  5. Otherwise, if the optimizer_mode parameter is set for the database instance, in the init.ora file, Oracle will choose according to that instance-level parameter.

  6. Otherwise, Oracle will choose according to the ultimate default optimizer_mode parameter, CHOOSE.
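For example, to choose cost-based optimization with the FIRST_ROWS goal (described below) for your own session, per step 4 of this cascade, you would issue:

ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS;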

In the last three steps of this decision cascade, Oracle chooses according to an optimizer_mode parameter, which you or your DBA sets. These are the four possible parameter values and how they affect the choice:

RULE

Oracle uses rule-based optimization.

ALL_ROWS

Oracle uses cost-based optimization with the goal of minimizing the cost of the whole query. This default version of cost-based optimization sometimes results in nonrobust plans (plans that use join methods other than nested-loops), with risks described in Chapter 6. However, the optimizer chooses these plans only when it calculates that they are faster than the best robust plans.

FIRST_ROWS

Oracle uses cost-based optimization with the goal of minimizing the cost of reaching the first rows from the query. In practice, this tends to favor robust, nested-loops plans similar to those plans the rule-based optimizer favors but built with much more knowledge of the data distributions and probable execution costs. The FIRST_ROWS optimization level creates the same effect as the OPTIMIZE FOR 1 ROW hint on DB2 and the OPTION(FAST 1) hint on SQL Server.

CHOOSE

Oracle uses cost-based optimization, as for the ALL_ROWS goal, unless no table or index involved in the query has optimization statistics available, in which case Oracle uses rule-based optimization.

Here’s a quick way to check the instance-level parameter for optimizer_mode:

SELECT VALUE FROM V$PARAMETER WHERE NAME = 'optimizer_mode';

When you have an execution plan in PLAN_TABLE, a quick way to see whether it is cost-based is to run the following query:

SELECT POSITION FROM PLAN_TABLE WHERE ID=0;

This returns the cost of the entire execution plan, in arbitrary units, when the plan is cost-based. When the cost is not null, you have a cost-based plan; a null cost indicates a rule-based plan.

Controlling Oracle Rule-Based Execution Plans

Most of the methods for controlling rule-based execution plans are the universal techniques of controlling plans, covered in the first section of this chapter. The primary Oracle-specific method of tuning under a rule-based default optimizer_mode is simply to switch modes to cost-based optimization, usually with a hint such as /*+ FIRST_ROWS */. In other words, you can always control a plan via hints, and hints (with the exception of the /*+ RULE */ hint) in a statement cause Oracle to use the CBO for that statement.

However, if you prefer not to use cost-based optimization, thus precluding the use of hints, one RBO-specific technique remains: in the FROM clause, list tables and their aliases in exactly the reverse order from the join order you want. This usually gives enough control of the join order, without using the techniques described earlier in Section 4.1.4. In particular, eligible, unique joins toward primary keys happen in the reverse order they are listed in the FROM clause, without changing the join conditions. For example, consider this query:

SELECT /*+ RULE */ E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LM, Employees M, Locations LE, Employees E
WHERE E.Last_Name = 'Johnson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID 
  AND LE.Description='Dallas';

Unlike the earlier version of this query in Chapter 3, which had the wrong order in the FROM clause, you now get the correct join order. In this correct execution plan, E joins to LE before joining to M or LM, as shown by the plan output:

SQL>@ex

PLAN
----------------------------------------------------------------------
SELECT STATEMENT
  NESTED LOOPS
    NESTED LOOPS
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID 4*EMPLOYEES
          INDEX RANGE SCAN EMPLOYEE_LAST_NAME
        TABLE ACCESS BY INDEX ROWID 3*LOCATIONS
          INDEX UNIQUE SCAN LOCATION_PKEY
      TABLE ACCESS BY INDEX ROWID 2*EMPLOYEES
        INDEX UNIQUE SCAN EMPLOYEE_PKEY
    TABLE ACCESS BY INDEX ROWID 1*LOCATIONS
      INDEX UNIQUE SCAN LOCATION_PKEY

When the RBO otherwise has no preference based on the conditions and indexes, the RBO joins tables by working from right to left in the FROM clause. However, this method offers only limited control by itself, because the RBO follows its other rules of thumb before considering the join order in the FROM clause. For example, the RBO always chooses to perform unique indexed reads and joins before doing indexed range scans, when it can.

Controlling Oracle Cost-Based Execution Plans

There are two main parts involved in tuning on the Oracle CBO:

  • Providing the optimizer with good statistics about the tables and indexes, so it can calculate the costs of alternatives accurately. This is effectively a prerequisite to any manual tuning on a CBO.

  • Adding hints to queries that the CBO fails to optimize well even with complete statistics about the tables and indexes that the queries reference.

Oracle cost-based optimizer prerequisites

Proving that a little knowledge is a dangerous thing, cost-based optimizers often do a terrible job if they do not have statistics on all the tables and indexes involved in the query. It is therefore imperative to maintain statistics on tables and indexes reliably, including regenerating statistics whenever table volumes change much or tables or indexes are rebuilt. It is safest to regenerate statistics periodically, during times that load is relatively quiet, such as nightly or at least weekly. The best way to generate and update statistics is with Oracle’s DBMS_STATS package, documented at length in Oracle8i Supplied PL/SQL Packages Reference and Oracle9i Supplied PL/SQL Packages and Types Reference. Here is a simple example of using DBMS_STATS to generate statistics for a whole schema, Appl_Prod, sampling 10% of the data in the larger tables and cascading statistics collection to the indexes:

BEGIN
   DBMS_STATS.GATHER_SCHEMA_STATS ('Appl_Prod',10, 
      CASCADE => TRUE); 
END;
/

Often, queries include conditions on highly skewed distributions, such as conditions on special types, codes, or flags, when these columns have only a few values. Normally, the CBO evaluates selectivity of a condition based on the assumption that all nonnull values of a column are equally selective. This assumption generally works well for foreign and primary keys that join business entities, but it is inaccurate when the columns have permanent special meanings and certain meanings apply much more rarely than others.

For example, in an Orders table, you might have a Status_Code column with three possible values: 'CL' for closed (i.e., fulfilled) orders, 'CA' for cancelled orders, and 'OP' for open orders. Most orders, by far, would be fulfilled, once the application has been running for a few months. A steady, significant fraction of orders would end up cancelled, so that value would also eventually point to a large list of orders. However, as long as the business keeps up with incoming orders, the number of open orders would remain moderate and steady, even as data accumulates for years. Quite early, a condition specifying Status_Code='OP' would be selective enough to justify indexed access, if you had an index with that leading column, and it is important to enable the optimizer to realize this fact, preferably without a lot of manual tuning. Enabling the CBO to recognize when a column is selective requires two things:

  • The SQL must mention the specific selective value, rather than use a bind variable, prior to Oracle 9i Database. Use of bind variables is commonly attractive, since it makes SQL more general and easier to share between processes. However, this need to hardcode especially selective values is the exception to that rule. If you use Status_Code=:1 instead of Status_Code='OP', prior to Oracle 9i, you will deny the CBO potential knowledge of the selectivity of the condition at parse time, when it does not yet know whether the bind variable :1 will be assigned a common or a rare Status_Code. Fortunately, in these cases, the usual reason to prefer using bind variables does not generally apply; since these special codes have special business meanings, it is unlikely that the SQL ever requires substituting a different value than the single selective value.

Tip

In Oracle 9i, Oracle introduced bind variable peeking, wherein Oracle checks the first value assigned to each bind variable (when a query sees its first hard parse) when choosing an execution plan. This eliminates the need to specify fixed values in place of bind variables, as long as all the values to be bound are of similar selectivity. However, if the bind variable will be bound sometimes to selective values and sometimes to nonselective values, you still must hardcode the values to obtain different plans in the two cases.

  • You need to provide the CBO with special statistics that quantify how rare the uncommon code, type, or status values are, so it can know which values are highly selective.

Oracle stores special statistics on distribution when you request them, based on sorting the rows for a column and arranging the sorted list into a specified number of buckets that each contain the same number of rows. Since Oracle already knows that the range each bucket holds has the same number of rows, Oracle needs to know only the value-range endpoints in each bucket. In the current example, with 20 buckets, the first bucket might hold the range 'CA' to 'CA', and the second bucket might hold the range 'CA' to 'CL'. The next 17 buckets would hold the most common range, 'CL' to 'CL'. The last bucket would hold the range 'CL' to 'OP', which includes the rarest value. From this, Oracle can deduce that the selectivity of the column is 5-10% for the value 'CA', 85-95% for the value 'CL', and 0-5% for the value 'OP'. Since you want the optimizer to know more closely how selective the 'OP' value is, you would choose more buckets than this, perhaps the maximum of 254. (Oracle compresses the bucket information when so few values apply, so the large number of buckets should be inexpensive.) To create 254 buckets for the example case, in the schema owned by Appl_Prod, use this:

BEGIN
   DBMS_STATS.GATHER_TABLE_STATS ('Appl_Prod','Orders', 
      METHOD_OPT => 'FOR COLUMNS SIZE 254 Status_Code'); 
END;
/

Generate the histogram statistics after you generate the general table statistics, because table-statistics generation deletes earlier histogram statistics.
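A nightly statistics job for the Appl_Prod schema would therefore run the calls in this order (a sketch that simply combines the two earlier examples):

BEGIN
   DBMS_STATS.GATHER_SCHEMA_STATS ('Appl_Prod',10, 
      CASCADE => TRUE); 
   DBMS_STATS.GATHER_TABLE_STATS ('Appl_Prod','Orders', 
      METHOD_OPT => 'FOR COLUMNS SIZE 254 Status_Code'); 
END;
/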

General hint syntax

Oracle uses hints for manual control of cost-based optimization. Syntactically, these hints take the form of comments, like /*+ <Hint_String> */, immediately following the SELECT keyword. Oracle recognizes that this syntax encloses a hint, not a comment, by the + at the beginning and by the location of the hint, which must immediately follow SELECT. However, since these are comments from the point of view of standard SQL syntax, they do not interfere with parsing the SQL if the SQL is also to be executed on non-Oracle databases.

Tip

Oracle hints don’t help get a fast execution plan on non-Oracle databases either, but unfortunately, it is not currently possible to share manually tuned SQL on multiple vendor databases and have the manual tuning work uniformly well on them all.

Each hint directly affects only the SELECT block that has the comment. Thus, to control the order of joins and index choices within a subquery, place the hint after the SELECT keyword that begins the subquery. But to affect the outer-query order of joins and index choices, place a hint immediately after the outer-query SELECT.
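For example, in the following sketch (the index name Employee_Department_ID is hypothetical), the ORDERED hint applies only to the outer query, while the INDEX hint applies only to the subquery:

SELECT /*+ ORDERED */ ... 
FROM Departments D, ...
WHERE ...
  AND EXISTS (SELECT /*+ INDEX(E Employee_Department_ID) */ NULL 
              FROM Employees E
              WHERE E.Department_ID=D.Department_ID)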

Approaches to tuning with hints

There are two basic extremes involved in tuning with hints:

  • Use as little direction as possible to get the execution plan you want, or at least to get close enough to the plan you want for reasonable performance. This approach reasons that the CBO has more information than you have and should be left free to adapt to changing data distributions and to take advantage of improvements in Oracle with future releases. By leaving the CBO the maximum degrees of freedom, you maximize its power to optimize well for you in the future. However, until you try, you won’t know how much direction the CBO will need if it did not get the plan right in the first place, so this approach is likely to be iterative, involving the addition of one hint at a time until the CBO delivers a good plan.

  • If you did not get the plan you wanted from the CBO automatically, assume the CBO has made bad assumptions that will propagate to distort all of its calculations. Therefore, leave it with little freedom, specifying essentially the whole plan you want.

If you are confident in your chosen execution plan, as you should be if you apply the methods I describe later in this book, there is little reason to hold back from fully specifying that plan. I have yet to find a case where a well-chosen, robust execution plan needed to evolve to handle new data distributions or new database features. On the other hand, it is easy for SQL with a partially restricting set of hints to go wrong, especially if some table or index loses its statistics. When the CBO chooses incorrectly, the error that made the CBO choose incorrectly will likely propagate over the entire plan. For example, consider this query:

SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LM, Employees M, Locations LE, Employees E
WHERE E.Hire_Date > :1
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID

At parse time, when the optimizer does its work, it cannot know that the bind variable :1 will likely be set to a value in the current week, so it makes a conservative assumption about the selectivity of that condition on Hire_Date. Having made that assumption, it might not only forego using an index on Hire_Date (depending on the data distribution), but it might also further calculate that it will hit most of the rows of all the joined tables as well, and the CBO might choose full table scans with hash joins on them. Even if you instruct the CBO to use the index on Hire_Date, it still retains its initial assumption that the driving condition is unselective, and will likely retain its poor choices for the other joins and table-access methods. This is really no flaw in the optimizer; it cannot know what the application developer knows about the likely values to be assigned to the bind variable. However, the consequence is that, if you need to be any more specific than just specifying ALL_ROWS or FIRST_ROWS, chances are relatively high that the optimizer will need help across the board, to correct for some incorrect assumption somewhere.

Tip

ALL_ROWS and FIRST_ROWS hints are a safe way to begin optimization. If you are using the rule-based optimizer, you can safely try out these cost-based approaches with a hint, even before you do the work of finding the best execution plan. If the result is already fast enough, you might save yourself any further work. If your optimization is already cost-based, under either the ALL_ROWS or FIRST_ROWS mode, try the other one. If an optimizer_mode hint alone solves your problem, the optimizer is making reasonable assumptions and you can trust it.
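For example, you could test the FIRST_ROWS approach on the earlier Hire_Date query just by adding the mode hint:

SELECT /*+ FIRST_ROWS */ E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LM, Employees M, Locations LE, Employees E
WHERE E.Hire_Date > :1
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID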

Table-access hints

These are the main hints to control table-access methods:

INDEX( <Alias_Name> <Index_Name> )

This directs Oracle, when possible, to access the alias <Alias_Name> using the index named <Index_Name>. Repeat this hint for each index/alias combination you need to control.

FULL( <Alias_Name> )

This directs Oracle, when possible, to access the alias <Alias_Name> using a full table scan. Repeat this hint for each full table scan you require.

INDEX_DESC( <Alias_Name> <Index_Name> )

This directs Oracle, when possible, to access the alias <Alias_Name> using the index named <Index_Name>, reaching the rows in descending order (the reverse of the normal index-sorted order). Repeat this hint for each index/alias combination you need to control, although it is unlikely you will need it more than once in a query.
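For example, this sketch combines the first two hints, forcing a full table scan of Orders while driving into Customers through its primary-key index (the index name Customer_Pkey is assumed for illustration):

SELECT /*+ FULL(O) INDEX(C Customer_Pkey) */ ... 
FROM Orders O, Customers C
WHERE O.Status_Code='OP'
  AND O.Customer_ID=C.Customer_ID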

The INDEX and FULL hints are common and easy to use. The INDEX_DESC hint is useful only rarely, but it is occasionally vital to use. For example, if you want to know all about the last employee hired in April, you might use this query:

SELECT * 
FROM Employees E
WHERE Hire_Date>=TO_DATE('2003-04-01','YYYY-MM-DD')
  AND Hire_Date< TO_DATE('2003-05-01','YYYY-MM-DD')
ORDER BY Hire_Date DESC

You’ll find the most recently hired employee you want at the top of the list of rows returned by this query. To avoid reading all the data for other employees hired in April, you might think to add a condition AND ROWNUM=1 to the query. However, this sometimes will not yield the desired result, because (depending on the data) Oracle will sometimes apply that condition before performing the descending sort. If Oracle uses a full table scan, it will return the first employee hired in April it finds in the table, likely the least recently hired. If it uses a simple index range scan on an index on Hire_Date, it will begin, as range scans generally do by default, at the low end of the index range, returning the first employee hired in April. However, the INDEX_DESC hint, with the index Employee_Hire_Date on the Hire_Date column, neatly solves the problem, returning the desired row with just a single logical I/O to the table:

SELECT /*+ INDEX_DESC(E Employee_Hire_Date) */ * 
FROM Employees E
WHERE Hire_Date>=TO_DATE('2003-04-01','YYYY-MM-DD')
  AND Hire_Date< TO_DATE('2003-05-01','YYYY-MM-DD')
  AND ROWNUM=1

Note that I removed the explicit ORDER BY clause, since it gives the false impression that it has effect, given the condition on ROWNUM.

Warning

The preceding example might strike you as risky code, apt to break functionally, for example, if someone drops or renames the index used. It is risky, and I recommend it only if the value of the performance improvement exceeds the cost of the risk of incorrect results. This is a clear case for SQL syntax that allows such top-n queries that take full advantage of the best indexed path. With current syntax, I haven’t found a solution that is both optimal and functionally safe.

There are several other table-access hints that I have not described in this section, but I have never found them necessary.

Execution-order hints

These are the main hints to control the order of execution for joins and subqueries:

ORDERED

This directs Oracle, when possible, to join the tables in the FROM clause in the same order that they are listed.

Tip

This hint, unlike the others, usually requires that you alter the body of the SQL (or at least the FROM clause) to get the plan you want, since the hint refers to the FROM-clause order. Notice that the desired FROM-clause order will be precisely the opposite of the best FROM-clause order you would choose for rule-based optimization. That’s because the RBO works from right to left, whereas this hint causes the CBO to work through the FROM clause from left to right.

LEADING( <Alias_Name> )

In the absence of an ORDERED hint, this selects the driving table, the first table in the join order. Although this gives less control over the join order than the ORDERED hint, it does not require modifying the FROM clause. Often, getting just the driving table correct is all you need to get at least close to the performance of the optimal plan. Later choices in the join order tend to matter less and will likely be well chosen by the optimizer, without your help.

PUSH_SUBQ

This hint instructs the optimizer to perform correlated subqueries at the first opportunity, as soon as the outer query reaches the join columns needed to evaluate them. Oracle’s CBO normally performs correlated subqueries only after completing all the joins in the outer query.

The ORDERED and LEADING hints are common and straightforward to use. The PUSH_SUBQ hint is occasionally useful.
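For example, you could designate E as the driving table of the earlier four-way join without reordering its FROM clause (a sketch of hint placement; the query is the one tuned earlier for the RBO):

SELECT /*+ LEADING(E) */ E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LM, Employees M, Locations LE, Employees E
WHERE E.Last_Name = 'Johnson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID 
  AND LE.Description='Dallas'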

When it comes to subqueries, Oracle offers hint-based control only at the two extremes: executing subqueries as early or as late as possible. However, you can gain full control of when subqueries execute if you combine the PUSH_SUBQ hint with the earlier methods of postponing correlated joins. For example, consider the earlier query:

SELECT ... 
FROM Orders O, Customers C, Regions R
WHERE O.Status_Code='OP'
  AND O.Customer_ID=C.Customer_ID
  AND C.Customer_Type_Code='GOV'
  AND C.Region_ID=R.Region_ID
AND EXISTS (SELECT NULL
            FROM Order_Details OD
            WHERE O.Order_ID+0*C.Customer_ID=OD.Order_ID
              AND OD.Shipped_Flag='Y')

Without a hint, Oracle would execute the EXISTS check after joining all three outer-query tables. The point of the expression O.Order_ID+0*C.Customer_ID was to delay the EXISTS check until after the join to C, but not until after the join to R. However, without any hint, all EXISTS conditions are automatically delayed until after all outer-query joins. To force the EXISTS condition to execute between the joins to C and R, use both the hint and the correlating-join-postponing expression:

SELECT /*+ PUSH_SUBQ */ ... 
FROM Orders O, Customers C, Regions R
WHERE O.Status_Code='OP'
  AND O.Customer_ID=C.Customer_ID
  AND C.Customer_Type_Code='GOV'
  AND C.Region_ID=R.Region_ID
  AND EXISTS (SELECT NULL 
              FROM Order_Details OD
              WHERE O.Order_ID+0*C.Customer_ID=OD.Order_ID
                AND OD.Shipped_Flag='Y')

Now, the PUSH_SUBQ hint causes Oracle to execute the EXISTS condition as early as possible, and the expression O.Order_ID+0*C.Customer_ID ensures that “as early as possible” doesn’t come until after the join to C.

Join-method hints

These are the main hints to control the join methods:

USE_NL( <List_Of_Aliases> )

This directs Oracle, when possible, to join the tables indicated in the alias list by using nested loops. The alias list is without commas—for example, USE_NL(T1 T2 T3).

USE_HASH( <List_Of_Aliases> )

This directs Oracle, when possible, to join the tables indicated in the alias list by using hash joins. The alias list is without commas—for example, USE_HASH(T1 T2 T3).

Example

Here’s an example that illustrates the most frequently useful hints working together to yield complete control of an execution plan: I’ll force the join order, the access method for every table, and the join method for every join. Consider the earlier example tuned for the RBO, shown at the end of Section 4.2.2. To fully force the same plan, except substituting a hash join for the first nested-loops join and reading the employee locations through the index on Description, use this query:

SELECT /*+ ORDERED USE_NL(M LM) USE_HASH(LE) INDEX(E Employee_Last_Name) 
           INDEX(LE Location_Description) INDEX(M Employee_Pkey) 
           INDEX(LM Location_Pkey) */ 
       E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Employees E, Locations LE, Employees M, Locations LM
WHERE E.Last_Name = 'Johnson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID 
  AND LE.Description='Dallas'

This results in the following execution plan:

SQL>@ex

PLAN
----------------------------------------------------------------------
SELECT STATEMENT
  NESTED LOOPS
    NESTED LOOPS
      HASH JOIN
        TABLE ACCESS BY INDEX ROWID 1*EMPLOYEES
          INDEX RANGE SCAN EMPLOYEE_LAST_NAME
        TABLE ACCESS BY INDEX ROWID 2*LOCATIONS
          INDEX RANGE SCAN LOCATION_DESCRIPTION
      TABLE ACCESS BY INDEX ROWID 3*EMPLOYEES
        INDEX UNIQUE SCAN EMPLOYEE_PKEY
    TABLE ACCESS BY INDEX ROWID 4*LOCATIONS
      INDEX UNIQUE SCAN LOCATION_PKEY

Controlling Plans on DB2

DB2 offers relatively sparse vendor-specific tools to control execution plans, so the methods used to tune on DB2 are comparatively indirect. There are three main steps involved in tuning on DB2:

  1. Provide the optimizer with good statistics about the tables and indexes, so it can calculate the costs of alternatives accurately.

  2. Choose the optimization level that DB2 applies to your query.

  3. Modify the query to prevent execution plans that you do not want, mainly using the methods described earlier in Section 4.1.

DB2 Optimization Prerequisites

Proving that a little knowledge is a dangerous thing, cost-based optimizers often do a terrible job if they do not have statistics on all the tables and indexes involved in a query. It is therefore imperative to maintain statistics on tables and indexes reliably; this includes regenerating statistics anytime table volumes change much or anytime tables or indexes are rebuilt. It is safest to regenerate statistics periodically, during times when load is relatively quiet, nightly or at least weekly. From the Unix prompt, create a file called runstats_schema.sql containing the following commands, replacing <Schema_Name> with the name of the schema that contains the objects you wish to collect statistics on:

-- File called runstats_schema.sql
SELECT 'RUNSTATS ON TABLE <Schema_Name>.' || TABNAME || ' AND INDEXES ALL;'
FROM SYSCAT.TABLES 
WHERE TABSCHEMA = '<Schema_Name>';

To use this script, log into db2, escape to the shell prompt with quit;, and run the following two commands from the Unix shell:

db2 +p -t < runstats_schema.sql > tmp_runstats.sql
grep RUNSTATS tmp_runstats.sql | db2 +p -t > tmp_anal.out

These commands can be scheduled to run automatically. Check tmp_anal.out in case any of the analyses fail.

Often, queries include conditions on highly skewed distributions, such as conditions on special types, codes, or flags, when these columns have only a few values. Normally, the CBO evaluates selectivity of a condition based on the assumption that all nonnull values of a column are equally selective. This assumption generally works well for foreign and primary keys that join business entities, but it breaks down when the columns have permanent special meanings and certain meanings apply much more rarely than others.

For example, in an Orders table, you might have a Status_Code column with three possible values: 'CL' for closed (i.e., fulfilled) orders, 'CA' for cancelled orders, and 'OP' for open orders. Most orders, by far, would be fulfilled; so, once the application has been running for a few months, you’d expect 'CL' to point to a large and steadily increasing number of orders. A steady, significant fraction of orders would end up cancelled, so 'CA' would also eventually point to a large list of orders. However, as long as the business keeps up with incoming orders, the number of open orders would remain moderate and steady, even as data accumulates for years. Quite early, a condition that specified Status_Code='OP' would be selective enough to justify indexed access, if you had an index with that leading column, and it is important to enable the optimizer to realize this fact, preferably without a lot of manual tuning. This requires two things:

  • The SQL must mention the specific selective value, rather than use a bind variable. Use of bind variables is commonly attractive, since it makes SQL more general and easier to share between processes. However, this need to hardcode especially selective values is the exception to that rule. If you use Status_Code=? instead of Status_Code='OP', you will deny the CBO potential knowledge of the selectivity of the condition at parse time, when it does not yet know whether the bind variable ? will hold a common or a rare Status_Code. Fortunately, in these cases, the usual reason to prefer bind variables does not generally apply; since these special codes have special business meanings, it is unlikely that the SQL will ever require substituting a different value for the single selective value.

  • You need to provide the CBO with special statistics that quantify how rare the uncommon code, type, or status values are, so it can know which values are highly selective.

DB2 stores special statistics on distribution, when you request them. To create distribution statistics for the example case, given an index named Order_Stts_Code and the schema owned by ApplProd, use the following command:

RUNSTATS ON TABLE ApplProd.Orders 
  WITH DISTRIBUTION FOR INDEX ApplProd.Order_Stts_Code;

Anytime you have a column with a skewed distribution and an index that you wish to use when your condition on the column has a high degree of selectivity, be sure to create distribution statistics in the manner shown here.

Choosing the Optimization Level

DB2 offers multiple optimization levels. An optimization level is basically a ceiling on how clever the optimizer attempts to be when it considers the range of possible execution plans. At optimization level 0, DB2 chooses the lowest cost plan within a subset of the plans it considers at level 1; at level 1, it considers just a subset of the plans it considers at level 2; and so on. Nominally, the highest optimization level should always yield the best plan, because it chooses the lowest cost plan from the widest possible range of alternatives. However, the plans enabled by the higher optimization levels tend to be less robust and often prove disappointing. In spite of the optimizer’s calculations to the contrary, these less robust plans often run longer than the best robust plan that the lower-level optimization sees. Higher levels of optimization can also take longer to parse, since the optimizer has additional degrees of freedom to explore. Ideally, you parse every statement at the lowest level that is capable of finding the best execution plan for a given query.

DB2 offers seven levels of optimization: 0, 1, 2, 3, 5, 7, and 9.[1] Level 5 is normally the default, although the database administrator can override this default choice. I have never needed levels of optimization higher than 5; levels 7 and 9 appear mainly to enable relatively exotic query transformations that are rarely useful. However, I have frequently found excellent results with the lowest level of optimization, level 0, when level 5 produced a poor plan. Before executing a query (or checking an execution plan), set level 0 with the following SQL statement:

SET CURRENT QUERY OPTIMIZATION 0;

When you wish to return to level 5 for other queries that require it, use the same syntax, replacing 0 with 5. If you find a poor plan at level 5, I recommend trying level 0 after first verifying correct statistics on the tables and indexes involved. Level 0 frequently yields just the sort of robust plans that usually work best for real-world applications.
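As a minimal sketch of the workflow (assuming the explain tables already exist, so that you can check the plan with DB2’s EXPLAIN facility before running the query):

SET CURRENT QUERY OPTIMIZATION 0;
-- Parse the query at level 0 and check the resulting plan:
EXPLAIN PLAN FOR
SELECT ... FROM Orders O WHERE O.Status_Code='OP';
-- Restore the default level for queries that need it:
SET CURRENT QUERY OPTIMIZATION 5;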

Modifying the Query

Most manual tuning on DB2 uses the SQL changes described earlier in Section 4.1. However, one particular manual technique deserves special mention, because it proves useful more often on DB2 than on Oracle and SQL Server. DB2 stores index records even for null values of indexed columns, and it appears to treat null like just another indexed value.

When DB2 lacks special statistics on distribution (see Section 4.3.1), DB2 estimates the selectivity of Indexed_Column IS NULL to be just as high as Indexed_Column = 198487573 or any other nonnull value. Therefore, older DB2 versions often choose to drive to selective-looking IS NULL conditions on indexed columns. Occasionally, this works out fine. However, in my experience, IS NULL conditions are rarely anywhere near as selective as the average individual nonnull value, and indexed access driven by IS NULL conditions is almost always a mistake.

Therefore, when you find an IS NULL condition on an indexed column in a DB2 query, you often should prevent use of the index. The simplest equivalent condition that prevents index use is COALESCE(Indexed_Column, Indexed_Column) IS NULL. This version is perfectly equivalent to the original condition Indexed_Column IS NULL, but the COALESCE( ) function prevents index use.
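For example, with Shipped_Date standing in as a hypothetical nullable, indexed column:

-- Before: DB2 might drive to the index on the deceptively
-- selective-looking IS NULL condition:
SELECT ... FROM Order_Details OD WHERE OD.Shipped_Date IS NULL

-- After: functionally identical, but index use is disabled:
SELECT ... FROM Order_Details OD
WHERE COALESCE(OD.Shipped_Date, OD.Shipped_Date) IS NULL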

In addition to tuning techniques that can apply to any database, there are three useful techniques specific to DB2 that I describe in the following sections.

Place inner joins first in your FROM clause

One sometimes useful technique is simply to list inner joins first in your FROM clause. This appears never to hurt, and on older versions of DB2 I have seen this simple technique produce greatly improved execution plans.
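For example, using the new-style join syntax, a sketch that lists the inner join ahead of the outer join:

SELECT ... 
FROM Orders O
     INNER JOIN Customers C
             ON O.Customer_ID=C.Customer_ID
     LEFT OUTER JOIN Regions R
             ON C.Region_ID=R.Region_ID
WHERE O.Status_Code='OP'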

Prevent too many outer joins from parsing at once

Older versions of DB2 can take minutes to parse queries with more than about 12 outer joins, and even then they might fail with errors. Fortunately, there is a workaround for this problem, using the following template for the SQL. The workaround uses DB2’s nested-tables syntax, in which an outer query contains another query inside a FROM clause that is treated like a single table for purposes of the outer query:

SELECT ... 
FROM (SELECT ... 
      FROM (SELECT ... 
            FROM <All inner joins and ten outer joins>
            WHERE <Conditions pertinent to this innermost nested table>) T1
           LEFT OUTER JOIN <Joins for the 11th through 20th outer join>
      WHERE <Conditions, if any, pertinent to this outermost nested table>) T2
     LEFT OUTER JOIN <The rest of the outer joins (at most 10)>
WHERE <Conditions, if any, pertinent to the outer query>

This template applies to a query with 21-30 outer-joined tables. With 11-20 outer-joined tables, you need only a single nested table. With more than 30 outer-joined tables, you need even deeper levels of nesting. In this syntax, DB2 effectively creates nested views on the fly, as defined by the queries inside parentheses in the FROM clauses. For purposes of handling outer joins, DB2 handles each of these smaller queries independently, sidestepping the problem of too many outer joins in a single query.
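Purely to illustrate the nested-table syntax on a small scale (the real workaround applies only past roughly ten outer joins), here is a sketch with hypothetical table and column names:

SELECT T1.Col_A, W.Col_C
FROM (SELECT T.Col_A, V.Col_B
      FROM Some_Table T
           LEFT OUTER JOIN Other_Table V
                        ON T.Key_Col=V.Key_Col
      WHERE T.Type_Code='X') T1
     LEFT OUTER JOIN Third_Table W
                  ON T1.Col_B=W.Col_B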

Tip

At my former employer, TenFold, we found this technique so useful that we enhanced the EnterpriseTenFold product to automatically generate this extraordinarily complex SQL when required. Admittedly, it is not an easy solution for manually written SQL, but it still might be the only technique that works if you run into slow or failed parses of many-way outer joins on DB2.

Let DB2 know when to optimize the cost of reading just the first few rows

Normally, DB2 calculates the cost of executing the entire query and chooses the plan it expects will run the fastest end to end. However, especially for online queries, you often care only about the first few rows and prefer to optimize to get the first rows soonest.

The technique to read the first rows fast, usually following nested loops, is to add the clause OPTIMIZE FOR <n> ROWS (or OPTIMIZE FOR 1 ROW), where <n> is the number of rows you actually need to see fast out of the larger rowset that the query might theoretically return. This clause goes at the very end of the query and instructs DB2 to optimize the cost of returning just those first <n> rows, without regard to the cost of the rest of the query execution. If you actually know how many rows you want and trust the optimizer to calculate the best plan, you can choose <n> on that basis. If you want to force a robust, nested-loops plan as strongly as possible, just use OPTIMIZE FOR 1 ROW.
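For example, a sketch with hypothetical column names, reusing the Orders and Customers tables from earlier:

SELECT O.Order_ID, O.Order_Date, C.Customer_Name
FROM Orders O, Customers C
WHERE O.Customer_ID=C.Customer_ID
  AND O.Status_Code='OP'
OPTIMIZE FOR 1 ROW;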

In practice, this technique tends to dictate nested-loops joins, because they avoid reading whole rowsets before even beginning a join. However, it is possible for an explicit ORDER BY clause to defeat any attempt to reach the first rows fast. The ORDER BY clause usually requires a sort following the complete query, postponing return of the first row regardless of the execution plan. You can leave off the ORDER BY clause if you want to force nested-loops joins by this technique, performing the sort in your application if necessary. The OPTIMIZE FOR 1 ROW hint is the equivalent of the FIRST_ROWS hint on Oracle and the OPTION(FAST 1) hint on SQL Server.

Techniques to force precisely chosen execution plans on DB2 are sparse, in contrast to the extraordinary detail that DB2 reveals about the execution plan you already have and why DB2 chose it. However, in fairness, I should mention that the available techniques, in combination with DB2’s fairly good optimizer, have proven sufficient in my own experience.

Controlling Plans on SQL Server

There are three main steps involved in tuning on SQL Server:

  1. Provide the optimizer with good statistics about the tables and indexes, so it can calculate the costs of alternatives accurately.

  2. Modify the query to prevent execution plans that you do not want, mainly using methods specific to SQL Server.

  3. Force a simple execution plan with FORCEPLAN when necessary.

SQL Server Optimization Prerequisites

Proving that a little knowledge is a dangerous thing, cost-based optimizers often do a terrible job if they do not have statistics on all the tables and indexes involved in the query. It is therefore imperative to maintain statistics on tables and indexes reliably; this includes regenerating statistics anytime table volumes change much or anytime tables or indexes are rebuilt. It is safest to regenerate statistics periodically, during times when load is relatively quiet, nightly or at least weekly. Run the following from Query Analyzer, then cut and paste the resulting UPDATE STATISTICS commands into the query window and run them as well:

-- file called updateall.sql 
-- update your whole database
SELECT 'UPDATE STATISTICS ', name 
FROM sysobjects 
WHERE type = 'U'
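The query produces one command per user table; the result you paste back into the query window would look something like this (table names purely illustrative):

UPDATE STATISTICS  Employees
UPDATE STATISTICS  Locations
UPDATE STATISTICS  Orders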

Often, queries include conditions on highly skewed distributions, such as conditions on special types, codes, or flags, when these columns have only a few values. SQL Server automatically maintains statistics on the distribution of indexed column values, enabling SQL Server to estimate selectivities automatically, even when indexed columns have skewed distributions.

Occasionally, it is useful to help SQL Server estimate the selectivity of a condition on a column with a skewed distribution even when that column is not indexed. In such a case, you need to request statistics on that column explicitly. For example, to create a statistics group named EFlag on the nonindexed column Exempt_Flag of the Employees table, run:

CREATE STATISTICS EFlag on Employees(Exempt_Flag)

As an example of a case in which such skewed distributions might apply, consider an Orders table in which you have a Status_Code column with three possible values: 'CL' for closed (i.e., fulfilled) orders, 'CA' for cancelled orders, and 'OP' for open orders. Most orders, by far, would be fulfilled once the application has been running for a few months, causing a steady rise in 'CL' values. A steady, significant fraction of orders would end up cancelled, so the value 'CA' would also eventually point to a large list of orders. However, as long as the business keeps up with incoming orders, the number of open orders would remain moderate and steady, even as data accumulates for years. Quite early, a condition that specified Status_Code='OP' would be selective enough to prefer joining to the corresponding table early, even if Status_Code is not indexed, and it is important to enable the optimizer to realize this fact, preferably without a lot of manual tuning. This requires that your SQL actually mention the specific value that applies to the condition, rather than use a generic stored procedure that only fills in the value of the constant after the parse, at execution time.
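A sketch of the contrast, with @Status standing in for a generic stored-procedure parameter:

-- The literal lets the optimizer see the skewed selectivity at parse time:
SELECT ... FROM Orders O WHERE O.Status_Code='OP'

-- A parameter hides it; the plan must also suit 'CL' and 'CA':
SELECT ... FROM Orders O WHERE O.Status_Code=@Status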

Modifying the Query

You should usually tune SQL Server with hints. Hints generally go in either the FROM clause, when they apply to a specific table access, or in the SQL Server OPTION( ) clause at the very end of a query. These are the most useful hints:

WITH (INDEX( <Index_Name> ))

Immediately following a table alias in a FROM clause, this hint instructs SQL Server to use the specified index to access that table alias. The older alternative syntax INDEX=<Index_Name> is also supported, but it might be dropped in the future, so I don’t recommend it. Even more obsolete and dangerous is the still-supported method of naming the internal object ID that corresponds to the desired index. Naming the index you want by ID is horribly unreliable, because the index will get a new ID if anyone ever drops and recreates it, or if the application moves to a new SQL Server database.

WITH (INDEX(0))

Immediately following a table alias in a FROM clause, this hint instructs SQL Server to use a full table scan to reach that table alias.

WITH (NOLOCK)

Immediately following a table alias in a FROM clause, this hint instructs SQL Server to read the specified table alias without requiring read locks or otherwise enforcing a consistent read. Read locks on SQL Server can create a bottleneck when combined with heavy update activity on a table. This hint avoids such a bottleneck, potentially at the cost of a consistent view of the data as of a single moment in time.

LOOP and HASH

These two different hints can each immediately precede the JOIN keyword in the FROM clause, instructing SQL Server to perform the specified join with the specified join method. These hints require the new-style join syntax with the JOIN keyword in the FROM clause. The presence of even a single hint of this type also forces all joins to take place in the same order the aliases are listed in the FROM clause.

OPTION(LOOP JOIN)

This hint goes at the end of a query and forces all joins to follow nested loops.

OPTION(FORCE ORDER)

This hint goes at the end of a query and forces all joins to take place in the same order the aliases are listed in the FROM clause.

OPTION(FAST 1)

This hint simply instructs SQL Server to attempt to reach the first returned rows as quickly as possible, which generally favors a nested-loops execution plan. Its effect is much like the OPTION(LOOP JOIN) hint, although in theory SQL Server might recognize that no execution plan could reach the first rows quickly in a query with an explicit ORDER BY, nullifying any effect from OPTION(FAST 1). The OPTION(FAST 1) hint is the equivalent of the FIRST_ROWS hint on Oracle and the OPTIMIZE FOR 1 ROW hint on DB2.

These hints can be combined. You can place multiple hints within a single WITH clause, separating them with commas—for example, WITH (INDEX(Employee_First_Name), NOLOCK). Multiple hints in a single OPTION clause also are separated by commas—for example, OPTION(LOOP JOIN, FORCE ORDER). Together, these hints give full control of the join order, the join methods, and the table access methods.

Hint Examples

I’ll demonstrate tuning with hints on a couple of queries. If you choose a robust all-nested-loops plan that drives from the employee last name to the other tables in optimum order, using the primary keys to reach the other tables, this query’s hints force the desired plan:

SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Employees E              WITH (INDEX(Employee_Last_Name))
     INNER JOIN Locations LE  WITH (INDEX(Location_PKey))
                  ON E.Location_ID=LE.Location_ID
     INNER JOIN Employees M   WITH (INDEX(Employee_PKey))
                  ON E.Manager_ID=M.Employee_ID 
     INNER JOIN Locations LM  WITH (INDEX(Location_PKey))
                  ON M.Location_ID=LM.Location_ID
WHERE E.Last_Name = 'Johnson'
  AND LE.Description='Dallas'
OPTION(LOOP JOIN, FORCE ORDER)

SET SHOWPLAN_TEXT ON (as described in Chapter 3) generates the following results when you run this query from SQL Server Query Analyzer:

StmtText                        
-----------------------------------------------------------------
  |--Bookmark Lookup(...(...[Locations] AS [LM]))
       |--Nested Loops(Inner Join)
            |--Bookmark Lookup(...(...[Employees] AS [M]))
            |    |--Nested Loops(Inner Join)
            |         |--Filter(WHERE:([LE].[Description]='Dallas'))
            |         |    |--Bookmark Lookup(...(...[Locations] AS [LE]))
            |         |         |--Nested Loops(Inner Join)
            |         |              |--Bookmark Lookup(...(...[Employees] AS [E]))
            |         |              |    |--Index Seek(...(...(wrapped line) [Employees].[Employee_Last_Name] 
(wrapped line) AS [E]), SEEK:([E].[Last_Name]='Johnson') ORDERED)
            |         |              |--Index Seek(...(...[Locations].[Location_PKey] 
(wrapped line) AS [LE]), SEEK:([LE].[Location_ID]=[E].[Location_ID]) ORDERED)
            |         |--Index Seek(...(...[Employees].[Employee_PKey] 
(wrapped line) AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
            |--Index Seek(...(...[Locations].[Location_PKey] 
(wrapped line) AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)

(12 row(s) affected)

If you don’t want to specify all nested loops, you might need the HASH and LOOP join hints, as shown in the following alternative to the last query:

SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Employees E                   WITH (INDEX(Employee_Last_Name))
     INNER HASH JOIN Locations LE  WITH (INDEX(Location_Description))
                  ON E.Location_ID=LE.Location_ID
     INNER LOOP JOIN Employees M   WITH (INDEX(Employee_PKey))
                  ON E.Manager_ID=M.Employee_ID 
     INNER LOOP JOIN Locations LM  WITH (INDEX(Location_PKey))
                  ON M.Location_ID=LM.Location_ID
WHERE E.Last_Name = 'Johnson'
  AND LE.Description='Dallas'

The preceding query delivers the following execution plan, triggered by SET SHOWPLAN_TEXT ON:

StmtText          
-------------------------------------------------------------
  |--Bookmark Lookup(...(...[Locations] AS [LM]))
       |--Nested Loops(Inner Join)
            |--Bookmark Lookup(...(...[Employees] AS [M]))
            |    |--Nested Loops(Inner Join)
            |         |--Hash Match(Inner Join...(wrapped line) ([E].[Location_ID])=([LE].[Location_ID])...)
            |         |    |--Bookmark Lookup(...(...[Employees] AS [E]))
            |         |    |    |--Index Seek(...(...[Employees].[Employee_Last_Name] 
(wrapped line) AS [E]), SEEK:([E].[Last_Name]='Johnson') ORDERED)
            |         |    |--Bookmark Lookup(...(...[Locations] AS [LE]))
            |         |         |--Index Seek(...(...[Locations].[Location_Description]
(wrapped line) AS [LE]), SEEK:([LE].[Description]='Dallas') ORDERED)
            |         |--Index Seek(...(...[Employees].[Employee_PKey] 
(wrapped line) AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
            |--Index Seek(...(...[Locations].[Location_PKey] 
(wrapped line) AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)

(11 row(s) affected)

There are two basic extremes involved in tuning with hints such as those in this example:

  • Use as little direction as possible to get the execution plan you want, or at least to get close enough to the plan you want for reasonable performance. This approach reasons that SQL Server has more information than you have, and it should be left free to adapt to changing data distributions and take advantage of improvements in SQL Server with future releases. By leaving SQL Server the maximum degrees of freedom, you maximize its power to optimize well for you in the future. However, if SQL Server did not get the plan right in the first place, you won’t know how much direction it needs until you try, so this approach tends to be iterative, adding one hint at a time until SQL Server delivers a good plan.

  • If you did not get the plan you wanted from SQL Server automatically, assume the database has made bad assumptions that will propagate to distort all of its calculations. Therefore, leave it with very little freedom, specifying essentially the whole plan you want.

If you are confident in your chosen execution plan, as you should be if you apply the methods I describe later in this book, there is little reason to hold back from fully specifying that plan. I have yet to find a case in which a well-chosen, robust execution plan needed to evolve to handle new data distributions or new database features. On the other hand, it is easy for SQL with a partially restricting set of hints to go wrong, especially if some table or index loses its statistics. When SQL Server chooses incorrectly, the bad assumption that led it astray is likely to propagate over the entire plan. However, the OPTION(FAST 1) hint is the sort of instruction that can be useful even when SQL Server has perfect information, simply specifying that the time to reach the first row matters more than the time to reach the last row.

Using FORCEPLAN

An older method to tune on both Microsoft SQL Server and Sybase is the FORCEPLAN option. You execute this option with a standalone SQL statement:

SET FORCEPLAN ON

This option affects all SQL for that connection until you execute this statement:

SET FORCEPLAN OFF

When FORCEPLAN is ON, the database is instructed to perform only the simplest optimization on the SQL it sees. It generally uses nested-loops execution plans that drive through indexes and join tables in the same order you list them in the FROM clause. When this is the sort of plan you want, SET FORCEPLAN can be ideal, not only forcing the plan simply, but even saving parse time that would otherwise be wasted considering a much wider range of plans, especially for joins of many tables. It is a blunt instrument, though, so use it only when you know the FROM-clause join order is the correct join order and you want nested loops.
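A minimal sketch of the usage pattern, with the query body elided; the FROM clause must already list the tables in the join order you want:

SET FORCEPLAN ON
SELECT ... 
FROM Employees E, Locations LE, Employees M, Locations LM
WHERE ...
SET FORCEPLAN OFF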



[1] Levels 4, 6, and 8 are not available, presumably for historical reasons, although I have never found these reasons documented.
