The seaman’s story is of tempest, the plowman’s of his team of bulls; the soldier tells his wounds, the shepherd his tale of sheep.
More than 10 years ago, I came to understand that the biggest factor in the performance of a business application is the speed of the SQL it runs. It took me longer to realize just how much room for improvement typically lies in that SQL. The SQL that most effects the load on a system and the productivity of its end users can usually be improved by a large factor, usually by a factor of two or more. However, I found little guidance regarding just how to tune SQL. I believe that problem persists today.
Academic journals describe detailed methods that are suitable for automated optimization, but these methods are not adapted for manual tuning. Documentation for the practitioner, so far as I’ve seen, is incomplete. Database vendors and independent authors document well how to review the path the database takes to reach the data. (The path to the data is known as the execution plan.) Armed with the execution plan, you can understand why a query runs as long as it does. With varied success, the documentation also covers what you can do to change an execution plan, if you suspect that it is not optimal. The missing part in the literature is a detailed manual process to deduce, without endless trial and error, exactly which execution plan you should want. Since real business-application queries can easily offer billions of alternative execution plans, tuning without a systematic method to choose your target execution plan is hopelessly inefficient. The problem is akin to finding yourself lost in a strange city without a map: working eyes and legs are not enough to take you where you need to go.
The missing piece of the problem, how to choose the best execution plan, turns out to be virtually independent of your choice of database vendor. This presents a wonderful opportunity for a single book on SQL tuning that is 80% vendor-independent, because 80% of the book has nothing to do with the uninteresting details of viewing and manipulating execution plans with vendor-specific methods. I wrote this book to fulfill that opportunity and to enable you to approach SQL tuning with this powerful, vendor-independent approach.
I designed this book to equip you to respond correctly to any SQL tuning problem. The narrowest and most common solution to a SQL tuning problem is a prescription for some combination of changes to the database (for example, new indexes) and, more often, changes to the SQL itself. These changes allow the formerly slow statement to run faster, with no changes in functionality and no change in the calling application, except in the SQL itself. This common solution is especially attractive, because it is usually simple and it rarely has unintended side effects.
Occasionally, when you analyze a SQL tuning problem, you discover symptoms that generally indicate a subtle functional defect that goes hand in hand with the performance defect. The method of tuning analysis I describe makes those subtle functional defects particularly easy to identify and describe, and helps you prescribe solutions that fix corner-case functional defects as a side effect of your performance analysis. This book focuses mainly on tuning, however. If you are on Oracle, you can find good advice on getting your SQL functionally correct in Sanjay Mishra’s and Alan Beaulieu’s book Mastering Oracle SQL (O’Reilly & Associates, Inc.).
Rarely, a SQL tuning problem cannot be solved just by speeding up one query; the query returns too many rows, or it runs too frequently to ever run as fast as it must, even fully optimized. For these rare problems, I describe systematic solutions in the application layer that change the problem model, creating a new problem that has ready solutions.
I wrote this book for anyone who already knows SQL but, at least sometimes, needs to find ways to make SQL run faster. Traditionally, the same people who write the SQL in the first place, the application developers, do most SQL tuning. Certainly, I hope this book helps developers solve their own tuning problems, especially the most common types of problems. However, from my own experience as a tuning specialist, I find that it is at least equally efficient to tune other people’s SQL. Fortunately, SQL provides a clear spec for which rows an application requires at a given point, and you needn’t have any application-specific knowledge at all to determine a faster path to reach those same rows. Since the tuner needs no knowledge of the application, it is easy to tune SQL written by other people, and a specialist has the opportunity to learn to tune more efficiently than any nonspecialist could, especially when dealing with the hardest problems.
For your own purposes, you might not need to read this book in order, cover to cover. The following summary should help you work out which parts of the book you can skip or skim, which parts you can reserve for occasional reference, and which parts you should read thoroughly, in which order:
Provides an overview of the motivation for SQL tuning and the approach this book takes to the problem, as well as some side benefits that come with following that approach. This chapter is short and easy, and I recommend you read it first.
Describes how databases access individual tables with full table scans and indexed reads, how databases join tables, and the tradeoffs between these alternatives. If you already know the basics of how databases execute queries, you might be able to skip or skim this chapter.
Covers how to read and interpret an execution plan on Oracle, Microsoft SQL Server, and DB2. If you have done any SQL tuning at all, you likely already know this for whatever database concerns you. Chapter 3 also separates the coverage of each database and even repeats material that applies to more than one database, so you need to read only the section or sections that matter to you.
Covers how to control execution plans on Oracle, Microsoft SQL Server, and DB2. This discussion includes some generic techniques that provide some control of execution plans on any relational database. If you have done significant SQL tuning, you might already know how to control execution plans for whatever database concerns you. Like Chapter 3, Chapter 4 also separates the coverage of each database and even repeats material that applies to more than one database, so you need to read only the section or sections that matter to you.
Covers the foundation for the rest of the book, which won’t even make sense unless you read this chapter first. This chapter introduces a shorthand, pictorial language that greatly clarifies the core of a SQL tuning problem. Chapter 5 lays a foundation that makes the rest of the book far clearer and more concise than it could be without this pictorial language. Read the chapter and learn the pictorial language well before you read the rest of the book. (The language takes some patience to learn, but it is worth it!)
Explains how to use the query diagrams you learned to make in Chapter 5 to tune 2-way, 5-way, even 115-way joins rapidly, without trial and error. This is the big payoff, so don’t stop before you understand this material.
Shows you how to tune complex queries such as queries containing subqueries that do not fit the standard, simple, n-way-join template. As a bonus, this chapter will also describe how to diagnose and repair logic problems with the SQL (as opposed to performance problems) that become obvious once you know how to build and interpret the query diagrams.
Justifies the rules of thumb I discuss in Chapter 5-Chapter 7. If you don’t quite trust me, or if you just think you would apply this book’s method better if you had an understanding of why the method works, this chapter should help. You might even want to read this chapter early if you are losing patience with rote application of mysterious rules earlier in the book.
Covers advanced material you might refer to only as problems arise, if you would rather not read so much cover-to-cover. However, I recommend at least skimming this chapter to learn what is there, so you can recognize the problems when you see them.
Explains how to cope with even “impossible” problems, problems with no fast execution plan capable of delivering the rows required by the original SQL. This material is highly recommended, but I leave it for last because until you know how to get the best execution plan, you cannot recognize which (surprisingly few) problems require these outside-the-box solutions.
Follows the solution of a SQL tuning problem through the whole diagramming method, from start to finish, on Oracle, DB2, and SQL Server. If you like to work from complete, end-to-end examples, this appendix is for you.
Defines key terms and phrases used in the book.
Used for SQL examples, file contents, and examples of output.
Used for table and column names, whether in SQL or referring
to SQL from within the body of a paragraph. Also used for alias
names and node names, which are elements in a
SQL diagram that theoretically refer to table aliases, even when a
diagram sometimes shows an abstract tuning problem without
referring to a specific SQL statement that corresponds to that
problem. Since aliases are usually made an acronym based on a
table name, such as
CT for the
aliases are usually pure uppercase.
(C, O, OT, OD, ODT, P, S, A)
A constant-width list of aliases, node names, or
columns, bounded in parentheses. I borrow this
n-tuple notation from mathematics to indicate
an ordered list of items. In the example case, the notation
describes a join order between nodes in a join diagram,
representing table aliases. In another example,
Code) would represent a pair of indexed
columns in a two-column index, with
Code_Type as the first column.
Code) represents the same index, while specifying that
it is on the table
Constant-width italic text inside angle brackets
describes missing portions of a SQL statement template, which you
must fill in, that represents a whole class of statements. For
<Expression> represents any
equality condition matching the leading column of an index with
any other expression.
Indicates a general note, tip, or suggestion. For example, I sometimes use notes for asides specific to a particular database vendor, in the midst of an otherwise vendor-independent discussion.
We at O’Reilly have tested and verified the information in this book to the best of our ability, but you may find that features have changed or that we have made mistakes. If so, please notify us by writing to:
|O’Reilly & Associates, Inc.|
|1005 Gravenstein Highway North|
|Sebastopol, CA 95472|
|(800) 998-9938 (in the U.S. or Canada)|
|(707) 829-0515 (international or local)|
|(707) 829-0104 (fax)|
You can also send messages electronically. To be put on the mailing list or request a catalog, send email to:
To ask technical questions or comment on the book, send email to:
Readers who would like to contact the author to ask questions or discuss this book, SQL tuning, or other related topics can reach him at firstname.lastname@example.org. You can also find the author’s home page on the Web at http://www.singingsql.com.
I owe my parents, Lois and Philip, scientists both, for the example they set in all aspects of their lives. My thesis advisor, Dale Rudd, showed me the highest example of keeping both eyes on the big picture, never missing an opportunity to solve the real, underlying problem for having too-narrowly defined a problem for immediate attack. My brother Bruce paved my way to relational databases and has always shared his knowledge generously.
My former manager Roy Camblin, who was CIO of Oracle at the time, pushed me to find a simple set of rules to teach SQL tuning, when I didn’t know such rules myself. The rules are not so simple, but, to paraphrase Einstein, they are as simple as they can be. Oracle gave me the liberty to develop the rules in depth, and TenFold Corporation gave me the opportunity to see firsthand how well they applied across all relational databases. TenFold Corporation further provided generous access to test databases to verify the vendor-specific behavior that I describe in this book.
I owe a special debt of gratitude to Jonathan Gennick, the editor of this book. Jonathan lent his expert hand at all levels, catching technical glitches, correcting the organization of the material when needed, and simply making the language work. His excellent, patient, and well-explained feedback not only made this a much better book, but also made me a much better writer by the end.
Taj Johnson, David Ozenne, Dave Hunt, Alexei Chadovich, and Jeff Walker provided generous and valuable technical assistance—thank you all! For generous help to get the technical content right, I give special thanks to my two technical reviewers, both excellent and experienced practitioners, Virag Saksena and Alan Beaulieu. Any mistakes that remain are all mine, of course.
Since this book is more than just words on pages, I thank the skillful staff at O’Reilly and Associates: Brian Sawyer, the production editor and copyeditor; Robert Romano and Jessamyn Read, the very patient illustrators; Ellie Volckhausen, the cover designer; Melanie Wang, the interior designer; Julie Hawks, who converted the files; Matt Hutchinson, the proofreader; Darren Kelly and Claire Cloutier, who provided quality control; and Angela Howard, the indexer.
Finally, I owe my wife, Parva, and my children, Tira and Abe, for their endless patience and faith in me, and for giving me the best of reasons to achieve all that I can.