BUY THIS BOOK
Add to Cart

PDF $27.99

Safari Books Online

What is this?

Looking to Reprint or License this content?

High Performance MySQL
High Performance MySQL Optimization, Backups, Replication, Load Balancing & More

By Jeremy Zawodny, Derek J. Balling

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Back To Basics
Many MySQL users and administrators slide into using MySQL. They hear its benefits, find that it's easy to install on their systems (or better yet, comes pre-installed), and read a quick book on how to attach simple SQL operations to web sites or other applications.
It may take several months for the dragons to raise their heads. Perhaps one particular web page seems to take forever, or a system failure corrupts a database and makes recovery difficult.
Real-life use of MySQL requires forethought and care—and a little benchmarking and testing. This book is for the MySQL administrator who has the basics down but realizes the need to go further. It's a good book to read after you've installed and learned how to use MySQL but before your site starts to get a lot of traffic, and the dragons are breathing down your neck. (When problems occur during a critical service, your fellow workers and friendly manager start to take on decidedly dragon-like appearances.)
The techniques we teach are valuable in many different situations, and sometimes to solve different problems. Replication, for instance, may be a matter of reliability for you—an essential guarantee that your site will still be up if one or two systems fail. But replication can also improve performance; we show you architectures and techniques that solve multiple problems.
We also take optimization far beyond the simple use of indexes and diagnostic (EXPLAIN) statements: this book tells you what the factors in good performance are, where bottlenecks occur, how to benchmark MySQL, and other advanced performance topics.
We ask for a little more patience and time commitment than the average introductory computer book. Our approach involves a learning cycle, and experience convinces us that it's ultimately the fastest and most efficient way to get where you want.
After describing the problems we're trying to solve in a given chapter, we start with some background explanation. In other words, we give you a mental model for understanding what MySQL is doing. Then we describe the options you have to solve the problem, and only after all that do we describe particular tools and techniques.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Binary Versus Compiled-From-Source Installations
There are two ways you can install MySQL. As a novice administrator, you may have simply installed a binary package that had precompiled executables, libraries, and configuration files, and placed those files wherever the maker of the binary package decided they should go.
It's exceedingly rare for a Windows user to compile his own copy of MySQL. If you're running MySQL on Windows, feel free to download your copy from the MySQL web site and skip this discussion.
Alternatively, for any number of reasons, you might have decided to compile the MySQL binaries on your own, by downloading a source tarball and configuring the installation to best meet your needs. However, don't do so lightly. Compiling from source has led to countless hours of pain for some users, mostly due to subtle bugs in their compilers or thread libraries. For this very reason, the standard binaries provided by MySQL AB are statically linked. That means they are immune to any bugs in your locally installed libraries.
There aren't too many places where the issue of "binary versus compiled-from-source" will come into play in the average MySQL tuning regimen, but they do happen. For example, in Chapter 10, our advice on chrooting your installation can be used only if every file MySQL needs is brought into a single directory tree, which might not be the case in a binary installation.
For a novice administrator on a simple installation, we recommend using a binary package (such as an RPM) to set up your system. However, once you progress to the point of really needing to tinker with the "guts" of MySQL, you will probably want to quickly go back, change a configure flag, and recompile.
One thing to keep in mind is that there are a number of sources for binary packages, and nearly all of them set up the system differently.
For example, you can download the binary installation from the MySQL.com web site. You can also nstall the binary distribution included by your Linux distribution vendor, or the one you grabbed from the FreeBSD
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Configuration Files
Configuring a MySQL server is often just a matter of editing the configuration file to make any changes you need and then restarting the server. While that sounds rather simple, adjusting the server's configuration is something you're not likely to do on a daily basis. More likely, you've installed MySQL, configured it minimally or with the defaults, and then let it run. Most users never go back and adjust the server configuration until a problem arises. As a result, it's easy to forget how to configure MySQL.
Another possibility is that you didn't even know there was a configuration file for MySQL. For the majority of projects, MySQL's default configuration is more than sufficient on modern hardware. It may not be as fast as it can be (because you haven't optimized it), but it will certainly meet your basic needs.
When MySQL starts, it reads its configuration files in a particular order, unless told otherwise. On Unix, the order is:
  1. /etc/my.cnf
  2. datadir/my.cnf
  3. ~/.my.cnf
On Windows, the order:
  1. %SystemRoot%/my.ini
  2. C:\my.cnf
Three command-line arguments affect how MySQL reads its configuration files:
--no-defaults
Tells MySQL not to read any configuration files.
--defaults-file=/path/to/file
Tells MySQL to read this file only, and any other files explicitly declared with
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The SHOW Commands
MySQL users often wonder how to find out what their server is actually doing at any point in time—usually when things start to slow down or behave strangely. You can look at operating system statistics to figure out how busy the server is, but that really doesn't reveal much. Knowing that the CPU is at 100% utilization or that there's a lot of disk I/O occurring provides a high-level picture of what is going on, but MySQL can tell far more.
Several SHOW commands provide a window into what's going on inside MySQL. They provide access to MySQL's configuration variables, ongoing statistics, and counters, as well as a description of what each client is doing.
The easiest way to verify that configuration changes have taken effect is to ask MySQL for its current variable settings. The SHOW VARIABLES command does just that. Executing it produces quite a bit of output, which looks something like this:
mysql> SHOW VARIABLES;

+---------------------------------+------------------------------------------+
| Variable_name                   | Value                                    |
+---------------------------------+------------------------------------------+
| back_log                        | 20                                       |
| basedir                         | mysql                                    |
| binlog_cache_size               | 32768                                    |
| character_set                   | latin1                                   |
| concurrent_insert               | ON                                       |
| connect_timeout                 | 5                                        |
| datadir                         | /home/mysql/data/                        |
The output continues from there, covering over 120 variables in total. The variables are listed in alphabetical order, which is convenient for reading, but sometimes related variables aren't anywhere near each other in the output. The reason for this is because as MySQL evolves, new variables are added with more descriptive names, but the older variable names aren't changed; it would break compatibility for any program that expects them.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: Storage Engines (Table Types)
One powerful aspect of MySQL that sets it apart from nearly every other database server is that it offers users many choices and options depending upon the user's environment. From the server point of view, its default configuration can be changed to run well on a wide range of hardware. At the application development level, you have a variety of data types to choose from when creating tables to store records. But what's even more unusual is that you can choose the type of table in which the records will be stored. You can even mix and match tables of different types in the same database!
Storage engines used to be called table types. From time to time we refer to them as table types when it's less awkward to do so.
In this chapter, we'll show the major differences between the storage engines and why those differences are important. We'll begin with a look at locking and concurrency as well as transactions—two concepts that are critical to understanding some of the major differences between the various engines. Then we'll discuss the process of selecting the right one for your applications. Finally, we'll look deeper into each of the storage engines and get a feel for their features, storage formats, strengths and weaknesses, limitations, and so on.
Before drilling down into the details, there are a few general concepts we need to cover because they apply across all the storage engines. Some aren't even specific to MySQL at all; they're classic computer science problems that just happen to occur frequently in the world of multiuser database servers.
It will greatly aid your thinking about storage engines and the capabilities they bring to MySQL if you have a good mental picture of where they fit. Figure 2-1 provides a logical view of MySQL. It doesn't necessarily reflect the low-level implementation, which is bound to be more complicated and less clear cut. However, it does serve as a guide that will help you understand how storage engines fit in to MySQL. (The NDB storage engine was added to MySQL just before this book was printed. Watch for it in the second edition.)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
MySQL Architecture
It will greatly aid your thinking about storage engines and the capabilities they bring to MySQL if you have a good mental picture of where they fit. Figure 2-1 provides a logical view of MySQL. It doesn't necessarily reflect the low-level implementation, which is bound to be more complicated and less clear cut. However, it does serve as a guide that will help you understand how storage engines fit in to MySQL. (The NDB storage engine was added to MySQL just before this book was printed. Watch for it in the second edition.)
Figure 2-1: A logical view of MySQL's architecture
The topmost layer is composed of the services that aren't unique to MySQL. They're services most network-based client/server tools or servers need: connection handling, authentication, security, etc.
The second layer is where things get interesting. Much of the brains inside MySQL live here, including query parsing, analysis, optimization, caching, and all the built-in functions (dates, times, math, encryption, etc.). Any functionality provided across storage engines lives at this level. Stored procedures, which will arrive in MySQL 5.0, also reside in this layer.
The third layer is made up of storage engines. They're responsible for the storage and retrieval of all data stored "in" MySQL. Like the various filesystems available for Linux, each storage engine has its own benefits and drawbacks. The good news is that many of the differences are transparent at the query layer.
The interface between the second and third layers is a single API not specific to any given storage engine. This API is made up of roughly 20 low-level functions that perform operations such as "begin a transaction" or "fetch the row that has this primary key" and so on. The storage engines don't deal with SQL or communicate with each other; they simply respond to requests from the higher levels within MySQL.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Locking and Concurrency
The first of those problems is how to deal with concurrency and locking. In any data repository you have to be careful when more than one person, process, or client needs to change data at the same time. Consider, for example, a classic email box on a Unix system. The popular mbox file format is incredibly simple. Email messages are simply concatenated together, one after another. This simple format makes it very easy to read and parse mail messages. It also makes mail delivery easy: just append a new message to the end of the file.
But what happens when two processes try to deliver messages at the same time to the same mailbox? Clearly that can corrupt the mailbox, leaving two interleaved messages at the end of the mailbox file. To prevent corruption, all well-behaved mail delivery systems implement a form of locking to prevent simultaneous delivery from occurring. If a second delivery is attempted while the mailbox is locked, the second process must wait until it can acquire the lock before delivering the message.
This scheme works reasonably well in practice, but it provides rather poor concurrency. Since only a single program may make any changes to the mailbox at any given time, it becomes problematic with a high-volume mailbox, one that receives thousands of messages per minute. This exclusive locking makes it difficult for mail delivery not to become backlogged if someone attempts to read, respond to, and delete messages in that same mailbox. Luckily, few mailboxes are actually that busy.
Reading from the mailbox isn't as troublesome. There's nothing wrong with multiple clients reading the same mailbox simultaneously. Since they aren't making changes, nothing is likely to go wrong. But what happens if someone tries to delete message number 25 while programs are reading the mailbox? It depends. A reader could come away with a corrupted or inconsistent view of the mailbox. So to be safe, even reading from a mailbox requires special care.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Transactions
You can't examine the more advanced features of a database system for very long before transactions enter the mix. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. Either the entire group of queries is applied to a database, or none of them are. Little of this section is specific to MySQL. If you're already familiar with ACID transactions, feel free to skip ahead to the section "Transactions in MySQL."
A banking application is the classic example of why transactions are necessary. Imagine a bank's database with a two tables: checking and savings. To move $200 from Jane's checking account to her savings account, you need to perform at least three steps:
  1. Make sure her checking account balance is greater than $200.
  2. Subtract $200 from her checking account balance.
  3. Add $200 to her savings account balance.
The entire operation should be wrapped in a transaction so that if any one of the steps fails, they can all be rolled back.
A transaction is initiated (or opened) with the BEGIN statement and applied with COMMIT or rolled back (undone) with ROLLBACK . So the SQL for the transaction might look like this:
         BEGIN;
[step 1] SELECT balance FROM checking WHERE customer_id = 10233276;
[step 2] UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
[step 3] UPDATE savings  SET balance = balance + 200.00 WHERE customer_id = 10233276;
         COMMIT;
But transactions alone aren't the whole story. What happens if the database server crashes while performing step 3? Who knows? The customer probably just lost $200. What if another process comes along between Steps 2 and 3 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.
Simply having transactions isn't sufficient unless the database server passes what is known as the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Selecting the Right Engine
When designing MySQL-based applications, you should decide which engine to use for storing your data. If you don't think about it during the design phase, you will likely face complications later in the process. You might find that the default engine doesn't provide a feature you need, such as transactions. Or maybe the mix of read and write queries your application generates will require more granular locking than MyISAM's table locks.
Because you can make the choice on a table-by-table basis, you'll need a clear idea of how each table is used and the data it stores. Of course, it also helps to have a good understanding of the application as a whole and its potential for growth. Armed with this information, you can begin to make good choices about which table engines can do the job.
While there are many factors that can affect your decision, it usually boils down to just a few considerations: transactions and concurrency, backups, and special features.

Section 2.4.1.1: Transactions and concurrency

When it comes to transactions and concurrency, consider the following guidelines:
  • If your application requires transactions and high read/write concurrency, InnoDB is probably your best bet.
  • If your application requires transactions but only moderate read/write concurrency, either BDB or InnoDB tables should work fine.
  • If your application doesn't require transactions and issues primarily SELECT or primarily INSERT/UPDATE queries, MyISAM is a good choice. Many web applications fall into this category.

Section 2.4.1.2: Backups

The need to perform regular backups may also influence your table choices. If your server can be shut down at regular intervals for backups, the storage engines are equally easy to deal with. However, if you need to perform online backups in one form or another, the choices become less clear. Chapter 9 deals with this topic in more detail.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Storage Engines
Now it's time to look at each of MySQL's storage engines in more detail. Table 2-5 summarizes some of the high-level characteristics of the handlers. The following sections provide some basic highlights and background about each table handler as well as any unusual characteristics and interesting features.
Before going further, it's worth noting that this isn't an exhaustive discussion of MySQL's storage engines. We assume that you've read (or at least know where to find) the information in the MySQL Reference Manual.
Table 2-5: Storage engine features in MySQL
Attribute
MyISAM
Heap
BDB
InnoDB
Transactions
No
No
Yes
Yes
Lock granularity
Table
Table
Page (8 KB)
Row
Storage
Split files
In-memory
Single file per table
Tablespace(s)
Isolation levels
None
None
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Benchmarking
We decided to cover benchmarking very early in this book because it's a critically important skill. Much of this book focuses on information and techniques you need to keep MySQL fast or make it run even faster. You need a good performance testing framework to judge the difference between one configuration and another, one query and another, or even one server and another. You also need a lot of patience and a willingness to experiment. This chapter can't give you all the answers, but we try to provide some tools that will help you find them.
If you care about database performance in your applications (and if you're reading this book, you probably do), benchmarking needs to become part of your development testing process. When you're testing an upgrade to MySQL or some MySQL configuration changes, run the benchmark tests you developed while building the application. Look at the results. Make sure they don't surprise you.
This chapter isn't long, but it contains essential material that we'll refer back to and apply in future chapters. If you're planning to skip around in the book, be sure to read this chapter first.
We begin with a look at the importance of benchmarking in database applications, then continue with a look at benchmarking strategies—things you need to think about in the planning process. Finally we get our hands dirty with a look at benchmarking tools.
We'll build on the strategies and tools presented in this chapter in those that follow. When considering performance questions, we'll consider the factors involved and present a benchmark test that can assist in the decision-making process. Take some time now to experiment with the tools and examples presented here. The skills you build now will benefit you in later chapters and in your own projects.
Benchmarking is fundamentally a "what if" game. By setting up a simple test, you can quickly answer questions such as the following:
  • What if I increase the number of rows by a factor of 10? Will my queries still be fast?
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Importance of Benchmarking
Benchmarking is fundamentally a "what if" game. By setting up a simple test, you can quickly answer questions such as the following:
  • What if I increase the number of rows by a factor of 10? Will my queries still be fast?
  • Will a RAM upgrade really help? If so, how much?
  • Is the new server really twice as fast as the old one?
  • What if I disable the query cache?
  • Which is faster, using a subquery or two shorter queries?
  • What happens when this query is run multiple times or is run with other queries?
Benchmarking is often about comparisons. When deciding to make an important change, you'll want first to test the alternative(s) and then decide what to do based on the results of the test.
Our goal is to make benchmarking MySQL easy. Anytime you catch yourself wondering if A is faster than B, or whether A or B uses more memory, just pull out your favorite benchmarking tool and find out. Sometimes you'll be surprised by the results. To achieve the goal of easy MySQL benchmarking, we've tried to document how to use the available tools.
Beyond answering what-if questions, benchmarking is especially important in database-driven applications because it can highlight problems that are otherwise difficult to pinpoint. When an application slows down, the database may not be the first suspect. After spending a lot of time testing the application code, you'll eventually need to isolate the database to see whether it is a significant bottleneck. Having a prebuilt benchmark makes that task trivial.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Benchmarking Strategies
We'll look at the mechanics of benchmarking shortly. First it's important to convey some of strategies and ideas that make up the philosophy behind benchmarking.
To start with, it's important to make a distinction between performance testing and stress testing. Both processes use the tools we'll look at in this chapter, but the goals are very different. When doing performance testing, you're usually comparing two alternatives—most often in isolation from everything else. For instance, would it be faster to use a UNION or run two separate queries? Stress testing, on the other hand, is about finding limits: what's the maximum number of requests I can handle with this configuration?
If the two types of benchmarking still sound similar, look at it this way: in performance testing, the numbers you get aren't as important as the difference between them. You may see that alternative #1 usually runs in 0.01 seconds (or 100 queries/second), while alternative #2 runs in 0.20 seconds (or 5 queries/second). That tells you the first alternative is 20 times faster than the second one. However, knowing that you can handle 100 queries per second doesn't tell you how your application as a whole will perform unless, of course, your application always runs the same query. In contrast, stress testing can help in situations such as: "We expect the promotion we just offered to bring in 30% more hits than we have now. What will the effects on our server be?"
To make benchmarking as realistic and hassle-free as possible, here are several suggestions to consider:
Change one thing at a time
In science this is called isolating the variable. No matter how well you think you understand the effects your changes will have, don't make more than one change between test runs. Otherwise you'll never know which one was responsible for the doubling (or halving) of performance. You might be surprised to find that an adjustment you made once before to improve performance actually makes it worse in your current tests.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Benchmarking Tools
In this chapter we'll introduce three useful benchmarking tools:
  • The MySQL Benchmark Suite, which is useful for making comparisons between different database engines or different installations of one database engine. It isn't meant to benchmark your site-specific data or needs.
  • MySQL super-smack, a stress-testing tool.
  • MyBench, a tool developed in Perl by one of the authors. It is another stress-testing tool that is easier to customize and extend than super-smack.
The benchmark tools presented in this chapter may not run under Windows due to the lack of a Perl interpreter or binaries compiled for Windows. Because versions of Perl for Windows are readily (and freely) available from ActiveState, there's a good chance MyBench may work. However, neither of the authors use Windows, and we have not tried to confirm this.
However, these tools do run on Linux and most Unix-like platforms and can be used to test remote servers. So you might run them on Linux or Solaris to remotely benchmark a Windows 2000 server running MySQL.
The MySQL distribution comes with a rather comprehensive set of generic tests that have been bundled together so you can run them as a group and examine the results. The tests will do little to help you figure out whether a configuration change will speed up your application. But they're very helpful when used as a high-level benchmark, meaning they provide a good overall indication of how well one server performs relative to another.
You can also run the tests individually if you'd like compare a subset of the results from several servers. If you're mainly interested in UPDATE speed, run one of the UPDATE-intensive tests a few times on each server.
The benchmark suite can be used to test non-MySQL servers as well. According to the README, PostgreSQL, Solid, and mSQL have been tested. This may be helpful if you're trying to choose between MySQL and PostgreSQL. All the benchmark code is relatively generic Perl using the DBI and Benchmark modules. If needed, you can add support for nearly any database server that has a DBI driver (Oracle, Sybase, Informix, DB2, etc.). If you do so, be sure to look at the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Indexes
Indexes allow MySQL to quickly find and retrieve a set of records from the millions or even billions that a table may contain. If you've been using MySQL for any length of time, you've probably created indexes in the hopes of getting lighting-quick answers to your queries. And you've probably been surprised to find that MySQL didn't always use the index you thought it would.
For many users, indexes are something of a black art. Sometimes they work wonders, and other times they seem just to slow down inserts and get in the way. And then there are the times when they work fine for a while, then begin to slowly degrade.
In this chapter, we'll begin by looking at some of the concepts behind indexing and the various types of indexes MySQL provides. From there, we'll cover some of the specifics in MySQL's implementation of indexes. The chapter concludes with recommendations for selecting columns to index and the longer term care and feeding of your indexes.
To understand how MySQL uses indexes, it's best first to understand the basic workings and features of indexes. Once you have a basic understanding of their characteristics, you can start to make more intelligent choices about the right way to use them.
To understand what indexes allow MySQL to do, it's best to think about how MySQL works to answer a query. Imagine that phone_book is a table containing an aggregate phone book for the state of California, with roughly 35 million entries. And keep in mind that records within tables aren't inherently sorted. Consider a query like this one:
SELECT * FROM phone_book WHERE last_name = 'Zawodny'
Without any sort of index to consult, MySQL must read all the records in the phone_book table and compare the last_name field with the string "Zawodny" to see whether they match. Clearly that's not efficient. As the number of records increases, so does the effort necessary to find a given record. In computer science, we call that an O(n) problem.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Indexing Basics
To understand how MySQL uses indexes, it's best first to understand the basic workings and features of indexes. Once you have a basic understanding of their characteristics, you can start to make more intelligent choices about the right way to use them.
To understand what indexes allow MySQL to do, it's best to think about how MySQL works to answer a query. Imagine that phone_book is a table containing an aggregate phone book for the state of California, with roughly 35 million entries. And keep in mind that records within tables aren't inherently sorted. Consider a query like this one:
SELECT * FROM phone_book WHERE last_name = 'Zawodny'
Without any sort of index to consult, MySQL must read all the records in the phone_book table and compare the last_name field with the string "Zawodny" to see whether they match. Clearly that's not efficient. As the number of records increases, so does the effort necessary to find a given record. In computer science, we call that an O(n) problem.
But given a real phone book, we all know how to quickly locate anyone named Zawodny: flip to the Zs at the back of book and start there. Since the second letter is "a," we know that any matches will be at or near the front of the list of all names starting with Z. The method used is based on knowledge of the data and how it is sorted.
That's cheating, isn't it? Not at all. The reason you can find the Zawodnys so quickly is that they're sorted alphabetically by last name. So it's easy to find them, provided you know your ABCs, of course.
Most technical books (like this one) provide an index at the back. It allows you to find the location of important terms and concepts quickly because they're listed in sorted order along with the corresponding page numbers. Need to know where mysqlhotcopy is discussed? Just look up the page number in the index.
Database indexes are similar. Just as the book author or publisher may choose to create an index of the important concepts and terms in the book, you can choose to create an index on a particular column of a database table. Using the previous example, you might create an index on the last name to make looking up phone numbers faster:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Index Structures
Having covered some of the basic ideas behind indexing, let's turn to the various types (or structures) of indexes in MySQL. None of the index types are specific to MySQL. You'll find similar indexes in PostgreSQL, DB2, Oracle, etc.
Rather than focus too much on the implementation details, we'll look at the types of data or applications each type was designed to handle and find answers to questions like these: Which index types are the fastest? Most flexible? Use the most or least space?
If this were a general-purpose textbook for a computer science class, we might delve deeper into the specific data structures and algorithms that are employed under the hood. Instead, we'll try to limit our scope to the practical. If you're especially curious about the under-the-hood magic, there are plenty of excellent computer science books available on the topic.
The B-tree, or balanced tree, is the most common types of index. Virtually all database servers and embedded database libraries offer B-tree indexes, often as the default index type. They are usually the default because of their unique combination of flexibility, size, and overall good performance.
As the name implies, a B-tree is a tree structure. The nodes are arranged in sorted order based on the key values. A B-tree is said to be balanced because it will never become lopsided as new nodes are added and removed. The main benefit of this balance is that the worst-case performance of a B-tree is always quite good. B-trees offer O(log n) performance for single-record lookups. Unlike binary trees, in which each node has at most two children, B-trees have many keys per node and don't grow "tall" or "deep" as quickly as a binary tree.
B-tree indexes offer a lot of flexibility when you need to resolve queries. Range-base queries such as the following can be resolved very quickly:
SELECT * FROM phone_book WHERE last_name
BETWEEN 'Marten' and 'Mason'
The server simply finds the first "Marten" record and the last "Mason" record. It then knows that everything in between are also matches. The same is true of virtually any query that involves understanding the range of values, including
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Indexes and Table Types
Now that we have discussed the common index types, terminology, and uses in relatively generic terms so far, let's look at the indexes implemented in each of MySQL's storage engines. Each engine implements a subset of the three index types we've looked at. They also provide different optimizations that you should be aware of.
MySQL's default table type provides B-tree indexes, and as of Version 4.1.0, it provides R-tree indexes for spatial data. In addition to the standard benefits that come with a good B-tree implementation, MyISAM adds two other important but relatively unknown features prefix compression and packed keys.
Prefix compression is used to factor out common prefixes in string keys. In a table that stores URLs, it would be a waste of space for MySQL to store the "http://" in every node of the B-tree. Because it is common to large number of the keys, it will compress the common prefix so that it takes significantly less space.
Packed keys are best thought of as prefix compression for integer keys. Because integer keys are stored with their high bytes first, it's common for a large group of keys to share a common prefix because the highest bits of the number change far less often. To enable packed keys, simply append:
PACK_KEYS = 1
to the CREATE TABLE statement.
MySQL stores the indexes for a table in the table's .MYI file.

Section 4.3.1.1: Delayed key writes

One performance-enhancing feature of MyISAM tables is the ability to delay the writing of index data to disk. Normally, MySQL will flush modified key blocks to disk immediately after making changes to them, but you can override this behavior on a per-table basis or globally. Doing so provides a significant performance boost during heavy INSERT, UPDATE, and DELETE activity.
MySQL's delay_key_write
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Index Maintenance
Once you're done adding and dropping indexes, and your application is running happily, you may wonder about any ongoing index maintenance and administrative tasks. The good news is that there's no requirement that you do anything special, but there are a couple of things you may want to do from time to time.
If you're ever asked to help debug a slow query or indexing problem against a table (or group of tables) that you haven't seen in quite a while, you'll need to recover some basic information. Which columns are indexed? How many values are there? How large is the index?
Luckily, MySQL makes it relatively easy to gather this information. By using SHOW CREATE TABLE, you can retrieve the complete SQL necessary to (re-)create the table. However, if you care only about indexes, SHOW INDEXES FROM provides a lot more information.
mysql> SHOW INDEXES FROM access_jeremy_zawodny_com \G
*************************** 1. row ***************************
       Table: access_jeremy_zawodny_com
  Non_unique: 1
    Key_name: time_stamp
Seq_in_index: 1
 Column_name: time_stamp
   Collation: A
 Cardinality: 9434851
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment:

1 rows in set (0.00 sec)
You may substitute KEYS for INDEXES in the query.
The table in the example has a single index named time_stamp. It is a B-tree index with only one component, the time_stamp column (as opposed to a multicolumn index). The index isn't packed and is allowed to contain NULL values. It's a non-unique index, so duplicates are allowed.
Over time, a table that sees many changes is likely to develop some inefficiencies in its indexes. Fragmentation due to blocks moving around on disk and inaccurate index statistics are the two most common problems you're likely to see. Luckily, it's easy for MySQL to optimize index data for MyISAM tables.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 5: Query Performance
This chapter deals with an issue faced by every MySQL user sooner or later: speeding up slow queries. MySQL is a very fast database server, but its innate speed can carry your applications only so far. Eventually you need to roll up your sleeves, get your hands dirty, and figure out why your queries are slow—and ultimately figure out what needs to be done to get a response quickly.
We're frequently asked how we "figure this stuff out." It's really quite simple. Once you start to understand how MySQL does what it does, you'll begin to have an intuitive feeling for it, and query optimization will start to seem really easy. It's not always that easy, but with the proper background, you should end up able to figure out most optimization problems.
This chapter aims to provide a framework for understanding how MySQL works to resolve queries. With this foundation, you can continue through this chapter to the next, where the knowledge is applied to application design and server performance tuning.
We'll begin with an overview of how MySQL handles query processing. After that, we'll look at the optimizer's built-in features. Then we'll discuss identifying slow queries and finish up with a look at some of the hints you can provide to MySQL's query optimizer.
How MySQL goes from receiving a query to sending the results back to a client is relatively straightforward. The work happens in several distinct stages. Let's walk through them.
You can enable the query cache (available as of MySQL 4.0.1) by setting query_cache_type to an appropriate value in my.cnf:
query_cache_type = 1
MySQL attempts to locate the results of any SELECT query in the query cache before bothering to analyze or execute it. It does this by hashing the query and using the hashed value to check for the results in the cache. MySQL uses the exact query text it receives, so the cache is sensitive to the most trivial variations.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Query Processing Basics
How MySQL goes from receiving a query to sending the results back to a client is relatively straightforward. The work happens in several distinct stages. Let's walk through them.
You can enable the query cache (available as of MySQL 4.0.1) by setting query_cache_type to an appropriate value in my.cnf:
query_cache_type = 1
MySQL attempts to locate the results of any SELECT query in the query cache before bothering to analyze or execute it. It does this by hashing the query and using the hashed value to check for the results in the cache. MySQL uses the exact query text it receives, so the cache is sensitive to the most trivial variations.
As far as the cache is concerned, the query:
SELECT * FROM table1
is different from:
select * FROM table1
The same goes for variations in whitespace. MySQL doesn't trim extra space from the beginning or end of queries. This is rarely a problem because most repetitive queries are generated by applications rather than humans sitting at a keyboard.
To save some effort, MySQL cheats a bit. It only bothers to hash SELECT queries, since they're the only ones it makes any sense to cache. Unfortunately, older 4.0 versions of MySQL don't consider every SELECT query. The logic it uses simply checks the first three characters of your query, looking for SEL in a case-insensitive way.
As a result of this three-character "tunnel vision," any time you introduce whitespace or anything else at the beginning of the query, MySQL won't bother with the query cache. This can be a real problem in some applications. We know of a feed-processing system in which the developers uses comments to embed extra information at the beginning of each query:
/* <b>GetLatestStuff</b> */ SELECT * FROM sometable WHERE ...
The comment made is easier to identify the queries in an administrative tool that grabs the output of
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Optimizer Features and Oddities
When testing queries, always remember to use realistic data. A common source of problems with MySQL is the query optimizer's handling of test data. It often does surprising things. If you don't know what it's doing and why (and it rarely tells you why), you may spend a lot of time tracking down a problem that really isn't there. Or, worse yet, you may embarrass yourself asking about it on the MySQL mailing list, only to learn that you've created the problem all on your own.
In general, MySQL uses an index when it is reasonably confident that doing so is more efficient than not doing so. This leads to false negatives during testing. The false negative tends to occur in the two situations that we'll now investigate.
Even if you have a lot of data (thousands of rows or more), MySQL may choose to ignore your indexes some of the time if your data doesn't have sufficient diversity. Why might that happen? Imagine you have a table that contains historical climate data for most world cities:
CREATE TABLE weather
(
  city       VARCHAR(100) NOT NULL,
  high_temp  TINYINT      NOT NULL,
  low_temp   TINYINT      NOT NULL,
  the_date   DATE         NOT NULL,
  INDEX (city),
  INDEX (the_date),
)
Rather than loading all two million records, you load two years worth of data (1980 and 1981) to test. After some testing, you find that queries that need to access many of the records are using full table scans rather than the the_date index. For example, to find the average high temperature in 1980, you might write something like this:
SELECT AVG(high_temp) FROM weather
WHERE the_date BETWEEN '1980-01-01' AND '1980-12-31';
Having data from only 1980 and 1981 loaded, that query needs to examine 50% of the rows in the weather table. In such a case, MySQL decides that it is faster to simply scan the entire table.
How does it know? When you cross a certain threshold, it is slower to locate rows using an index than to read them sequentially. For MySQL, the cutoff point is roughly 30%. The number is chosen by the MySQL developers based on their extensive experience (and knowledge of the code) and is subject to change from release to release. The actual number is specific to each storage engine: InnoDB has a different threshold than MyISAM tables, and so forth.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Identifying Slow Queries
Figuring out which queries are slow is usually easier than figuring out why they're slow and making the necessary changes to fix them. The easiest way to track them is to let MySQL do some of the work for you. By enabling the slow query log , you instruct MySQL to log every query that takes longer than a specified number of seconds. In addition to the query, it also logs some other metadata.
Here's an example record from a slow query log:
# Time: 030303  0:51:27
# User@Host: user[user] @ client.example.com [192.168.50.12]
# Query_time: 25  Lock_time: 0  Rows_sent: 3949  Rows_examined: 378036
select ArticleHtmlFiles.SourceTag, ArticleHtmlFiles.AuxId from ArticleHtmlFiles left 
join Headlines on ArticleHtmlFiles.SourceTag = Headlines.SourceTag and 
ArticleHtmlFiles.AuxId = Headlines.AuxId where Headlines.AuxId is NULL;
While the log contains a lot of useful information, there's one very important bit of information missing: an idea of why the query was slow. Sure, if the log says 12,000,000 rows were examined and 1,200,000 sent to the client, you know why it was slow. But things are rarely that clear cut. Worse yet, you may find a slow query, paste it into your favorite MySQL client, and find that it executes in a fraction of a second.
You must be careful not to read too much information into the slow query log. When a query appears in the log, it doesn't mean that it's a bad query—or even a slow one. It simply means that the query took a long time then. It doesn't mean that the query will take a long time now or in the future.
There are any number of reasons why a query may be slow at one time but not at others:
  • A table may have been locked, causing the query to wait. The Lock_time indicates how long the query waited for locks to be released.
  • None of the data or indexes may have been cached in memory yet. This is common when MySQL is first started or hasn't been well tuned. Chapter 4 covers this in more detail.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Influencing MySQL with Hints
Many relational database servers implement some notion of hints—a simple syntax for providing additional information to the underlying SQL engine and query optimizer. Sometimes you may need to do this to work around a bug or improve performance. Let's have a quick look at the various hints that can influence MySQL's query processing. As you saw with the query cache, hints in MySQL often appear right after the SELECT keyword:
SELECT SQL_CACHE * FROM mytable ...
But as you'll see, that's not always the case.
If you're worried about code portability because your SQL may need to run on a database server other than MySQL, you can often enclose hints within comments so that they'll be ignored by other servers—or older versions of MySQL itself:
SELECT /*! SQL_CACHE */ * FROM mytable ...
MySQL normally doesn't care about the order in which you list tables in your queries. It examines the possibilities and decides which table to read first, second, and so on. Once in a while, you might find that MySQL isn't handling a multitable join very well. After looking at the EXPLAIN output for the query, you realize that it's accessing the tables in a less than optimal order.
If you think you can do a better job of optimizing the join order than MySQL has done, you can use the STRAIGHT_JOIN hint in place of a comma or JOIN keyword in your query:
SELECT * FROM table1 STRAIGHT_JOIN table2 WHERE ...
Doing so forces MySQL to join the tables in the order they appear in your query, regardless of the order it would otherwise decide to use.
MySQL provides several index-related hints to cover cases when you'd like more control over the indexes it considers.
To provide a list of indexes you'd like MySQL to consider, ignoring all others, add USE INDEX after the table name in the query:
SELECT * FROM mytable USE INDEX (mod_time, name) ...
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Stupid Query Tricks
We can't end a chapter on query optimization without looking at some common tricks that can increase performance of some queries. While these are all rather specific, you may find techniques that can be applied in other circumstances.
Sometimes MySQL doesn't optimize a seemingly simple query the way you'd expect. A good example of this behavior occurred in a database used to track historical stock prices. There are two tables involved: SymbolHistory and Symbols.
As far as we're concerned, the Symbols table contains two important fields: Id and Symbol. The Id is an auto_increment primary key. Here's the PriceHistory table:
mysql> DESCRIBE PriceHistory;
+----------+---------+------+-----+------------+-------+
| Field    | Type    | Null | Key | Default    | Extra |
+----------+---------+------+-----+------------+-------+
| SymbolID | int(11) |      | PRI | 0          |       |
| Date     | date    |      | PRI | 0000-00-00 |       |
| Open     | float   |      |     | 0          |       |
| High     | float   |      |     | 0          |       |
| Low      | float   |      |     | 0          |       |
| Close    | float   |      |     | 0          |       |
| Volume   | float   |      |     | 0          |       |
+----------+---------+------+-----+------------+-------+
8 rows in set (0.01 sec)
It has a two-part index on (SymbolID, Date).
The Symbols table maps stock tickers to numeric identifiers. It also contains various other bits of metadata about each security. The PriceHistory table contains the historical price data. One of the most common queries run against the data is, "Show me all closing prices for a given stock sorted from newest to oldest."
To fetch the price history for IBM, the query looks like this:
mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close
    -> FROM Symbols, PriceHistory
    -> WHERE Symbols.ID=PriceHistory.SymbolID AND Symbols.Symbol = 'ibm'
    -> ORDER BY Date DESC \G
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 6: Server Performance Tuning
The operating system your MySQL server runs on and the server's configuration can be just as important to your server's performance as the indexes, schema, or queries themselves. In this chapter, we will help you understand how to tune your server to improve performance, as opposed to tuning schema or queries. We'll be looking at changes to your hardware, operating system, and MySQL configuration to see what effects they have on overall performance.
We assume that you've already made efforts to boost the performance of your queries. If you haven't done that already, stop now and read Chapter 4 and Chapter 5 to get a handle on optimizing your queries and your application code. Only then should you worry about server settings. Hardware is often not the solution to MySQL performance problems. Poorly optimized queries can slow you down far more than not having the latest CPU or SCSI disk. To put this in perspective, one of the MySQL AB trainers even says that changing hardware might, in the best cases, give you a 10-fold performance increase. But tuning queries (and schemas) can often give you 1000-fold performance increase. Seriously.
Some topics covered in this chapter are platform-specific. The authors' knowledge of the various platforms on which MySQL runs is limited. In many cases, you'll need to consult your local documentation for various operating system tools and specifics.
We start with an overview of the factors that limit performance and then look more in depth at RAID, hardware, and operating system issues. The chapter finishes with a discussion of techniques you can use to locate, identify, and fix bottlenecks.
Before we can begin to think about what to adjust on a busy MySQL server, it's best to get an understanding of the various factors that affect performance and, most importantly, how they can affect it. One of the single biggest problems that most MySQL users face is simply not understanding how to go about finding bottlenecks.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Performance-Limiting Factors