One common issue with SQL applications is performance regression. In this section, I’ll describe some basic actions to take when you hit performance problems. Don’t worry about the details; just pick up the essential ideas. As your knowledge deepens, you’ll find yourself using them in a more educated and therefore more effective manner.
When I considered which problems to include in this chapter, I questioned whether I should mention performance problems at all. There are a lot of sources describing performance issues in detail, starting with the wonderful “Optimization” chapter in the MySQL Reference Manual and extending to books published by O’Reilly. I will add a short overview of useful sources at the end of this book. One can easily spend one’s career on this subject or drown in the flood of available information.
I will be speaking here mostly about SELECT
queries. At the end of this section, I
briefly address how to deal with a slow query that modifies data.
Three main techniques will help you work on slow queries: tuning the query itself, tuning your tables (including adding indexes), and tuning the server. Let’s look at them in detail.
The most powerful tool for query tuning is our old acquaintance EXPLAIN. This statement provides information about how the server actually executes a query. The details of MySQL EXPLAIN are covered quite well in the MySQL Reference Manual, and I will not repeat that information here. Rather, I'll pick out what I've found to be the most important and useful elements of the output.
The first lines you should look at are type, which actually shows the type of join performed, and rows, which shows an estimate of how many rows were examined during the query. (For instance, if the query had to scan the whole table, the number of rows would equal the number of rows in the table.) Multitable joins have to examine a Cartesian product of the numbers of rows examined in each table. Thus, if the query examines 20 rows in one table and 30 in another, the join performs a total of 600 examinations. EXPLAIN output contains a row for each table in the JOIN. We will see this in the following examples.
EXPLAIN reports a join even when you operate on a single table. This may sound a bit strange, but the MySQL optimizer internally treats any query as a join, even if it's a join on one table.
Let's look into the EXPLAIN output from the previous section again:
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items WHERE id IN
(SELECT iid FROM items_links)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: items_links
type: index_subquery
possible_keys: iid,iid_2
key: iid
key_len: 5
ref: func
rows: 1
filtered: 100.00
Extra: Using index; Using where
2 rows in set, 1 warning (0.48 sec)
The number of examined rows is 10 times 1 because the subquery executes once for every row in the outer query. The type of the first query is index, which means that the whole index will be read. The type of the second query is index_subquery. This is an index lookup function that works similarly to the ref type. So, in this example, the optimizer will read all index entries from the items table and one row from the items_links table for each of the ten matching rows found in items.
How can we find out if this is a good plan for the query? First, let’s repeat the query results and check how long the query actually took:
mysql> SELECT count(*) FROM items WHERE id IN (SELECT iid FROM items_links);
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.08 sec)
The MySQL server examined 10 rows and returned 4. How fast is it? To answer this question, let’s count the number of rows in both tables:
mysql> SELECT count(*) FROM items;
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.11 sec)

mysql> SELECT count(*) FROM items_links;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)
We have 10 rows in the items table, each with a unique ID. The items_links table has 6 rows with nonunique IDs (iid). For the current number of rows, the plan looks good, but at the same time, it reveals a potential problem. Right now we have fewer links than items. The difference between these numbers is not high at the moment, but it will be more noticeable as they diverge.
To test this guess and to show you an example of query tuning, I will insert a few rows into the items table. The id column is defined as INT NOT NULL AUTO_INCREMENT PRIMARY KEY, so we are guaranteed that no existing link will refer to a newly inserted row. This allows us to imitate a realistic situation that arises when a user needs to get a small number of links (six in our case) from a large table. The following statements are just a quick hack to create a lot of rows by repeatedly selecting all the rows in the table and inserting more:
mysql> INSERT INTO items (short_description, description, example, explanation, additional)
    -> SELECT short_description, description, example, explanation, additional FROM items;
Query OK, 10 rows affected (0.17 sec)
Records: 10  Duplicates: 0  Warnings: 0

<Repeat this query a few times>

mysql> INSERT INTO items (short_description, description, example, explanation, additional)
    -> SELECT short_description, description, example, explanation, additional FROM items;
Query OK, 2560 rows affected (3.77 sec)
Records: 2560  Duplicates: 0  Warnings: 0
Now let’s see whether our query plan changed:
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items WHERE id IN
    -> (SELECT iid FROM items_links)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 5136
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: items_links
type: index_subquery
possible_keys: iid,iid_2
key: iid
key_len: 5
ref: func
rows: 1
filtered: 100.00
Extra: Using index; Using where
2 rows in set, 1 warning (0.09 sec)
The query execution plan is the same—and it is going to examine 5,136 rows for just six links! Is there any way to rewrite the query so it will execute faster?
The type of the subquery is index_subquery. This means that the optimizer uses an index lookup function that replaces the subquery completely. The output of SHOW WARNINGS shows how the query has been rewritten:
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `collaborate2011`.`items` where
<in_optimizer>(`collaborate2011`.`items`.`id`,<exists>
(<index_lookup>(<cache>(`collaborate2011`.`items`.`id`) in
items_links on iid where (<cache>(`collaborate2011`.`items`.`id`) =
`collaborate2011`.`items_links`.`iid`))))
1 row in set (0.00 sec)
The output is intimidating, but at least we can see some kind of join here. What if we rewrite the query to be more explicit about the columns on which the join is performed? We will also rewrite the subquery into an explicit JOIN; with current versions of MySQL, this method can dramatically improve performance:
mysql> \W
Show warnings enabled.
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items JOIN items_links
    -> ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid,iid_2
key: iid_2
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.05 sec)
Note (Code 1003): select count(0) AS `count(*)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
The result looks encouraging because it does not scan all the rows from the items table. But does the query work correctly?
mysql> SELECT count(*) FROM items JOIN items_links ON
(items.id=items_links.iid);
+----------+
| count(*) |
+----------+
| 6 |
+----------+
1 row in set (0.10 sec)
We get six rows instead of four. This is because we asked the query to return every matching row, so an item that matches more than one link is counted once per link. We can fix this by adding the DISTINCT keyword:
mysql> SELECT count(distinct items.id) FROM items JOIN items_links
    -> ON (items.id=items_links.iid);
+--------------------------+
| count(distinct items.id) |
+--------------------------+
|                        4 |
+--------------------------+
1 row in set (0.12 sec)
Note
You can use a query rewriting technique to confirm that DISTINCT is needed. Just replace count(*) with items.id to see the duplicate values.
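A sketch of that check, using the tables from the example above (the duplicated id values depend on your data, so no output is shown here):

```sql
mysql> SELECT items.id FROM items JOIN items_links
    -> ON (items.id=items_links.iid)
    -> ORDER BY items.id;
-- Any id that has more than one link appears more than once
-- in this list, confirming that DISTINCT is needed.
```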
With DISTINCT, is the query still fast? Let's try EXPLAIN once again:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid,iid_2
key: iid_2
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
It still examines six rows. So we can consider the query to be optimized for this particular data set. I will explain later in this chapter why the structure of the data and its size matter.
In our example, the data set was small, so I could not make it run really slowly, even on my laptop. Still, the execution times of the original and optimized queries differ substantially. This was the original query:
mysql> SELECT count(*) FROM items WHERE id IN (SELECT iid FROM
items_links );
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.21 sec)
And here is the optimized query:
mysql> SELECT count(distinct items.id) FROM items JOIN items_links
ON (items.id=items_links.iid);
+--------------------------+
| count(distinct items.id) |
+--------------------------+
| 4 |
+--------------------------+
1 row in set (0.10 sec)
We achieved a two-fold improvement, even on such a small data set! For our test, the saving is just 0.11 seconds, but for millions of rows, the improvement can be far greater.
In the previous section, we introduced the process of tuning queries. In all the examples, EXPLAIN output contained information about indexes. But what if a table has no index at all? Or if the indexes are not being used? How should you choose when, where, and which index to add?
The MySQL server uses indexes when they can limit the number of rows examined. Therefore, having indexes on columns listed in WHERE, JOIN, and GROUP BY clauses can speed up queries. Having an index on the column in an ORDER BY clause can make sense as well, because it allows the server to sort more effectively.
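As a sketch of that principle (the messages table and its columns here are hypothetical, not part of this chapter's schema):

```sql
-- A query such as
--   SELECT * FROM messages WHERE user_id = 42 ORDER BY created_at;
-- can use one composite index for both the WHERE lookup and the
-- ORDER BY, letting the server skip a separate sort pass:
ALTER TABLE messages ADD INDEX user_created (user_id, created_at);
```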
With those principles in mind, adding indexes becomes a simple task. Consider tables from the previous example, but without any indexes:
mysql> CREATE TEMPORARY TABLE items SELECT * FROM items;
Query OK, 5120 rows affected (6.97 sec)
Records: 5120  Duplicates: 0  Warnings: 0

mysql> CREATE TEMPORARY TABLE items_links SELECT * FROM items_links;
Query OK, 6 rows affected (0.36 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE items;
+-------+----------------------------------------------------------------------+
| Table | Create Table                                                         |
+-------+----------------------------------------------------------------------+
| items | CREATE TEMPORARY TABLE `items` (
  `id` int(11) NOT NULL DEFAULT '0',
  `short_description` varchar(255) DEFAULT NULL,
  `description` text,
  `example` text,
  `explanation` text,
  `additional` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+----------------------------------------------------------------------+
1 row in set (0.10 sec)

mysql> SHOW CREATE TABLE items_links;
+-------------+----------------------------------------------------------------+
| Table       | Create Table                                                   |
+-------------+----------------------------------------------------------------+
| items_links | CREATE TEMPORARY TABLE `items_links` (
  `iid` int(11) DEFAULT NULL,
  `linkid` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------------+----------------------------------------------------------------+
1 row in set (0.00 sec)
As you can see, no index is specified. Let’s try an unoptimized query on these tables, and then improve it:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items JOIN
items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
filtered: 100.00
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5137
filtered: 100.00
Extra: Using where; Using join buffer
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
The type became ALL, the most expensive type, because it means all rows will be read. The query is examining 6*5,137 = 30,822 rows now. This is even worse than the query we considered to be slow in our earlier example.
Let’s examine the query in detail:
SELECT count(distinct items.id)...
This query returns the number of unique non-null values in the result set. It would make sense to add an index on items.id so that this search could use it.
Another part of the same query:
...FROM items JOIN items_links ON (items.id=items_links.iid)
The join refers to the id column from items and the iid column from items_links. So it makes sense to add indexes to both of those columns:
mysql> ALTER TABLE items ADD INDEX (id);
Query OK, 5120 rows affected (4.78 sec)
Records: 5120  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE items_links ADD INDEX (iid);
Query OK, 6 rows affected (0.04 sec)
Records: 6  Duplicates: 0  Warnings: 0
Now we can see how this affects the query plan:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid
key: iid
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: ref
possible_keys: id
key: id
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
This looks much better than before, with a single exception: the ref type for the items table is worse than the eq_ref we got in the previous section. This type is used because we added a simple index, whereas the original table had a unique index on the same column. Because IDs are unique and are supposed to stay so, we can easily make the same change to the temporary table:
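A unique index has to be added before rerunning EXPLAIN; judging from the id_2 key name in the output below, the statement is presumably something like:

```sql
-- Hypothetical reconstruction: MySQL auto-names this second index
-- on the id column id_2, matching possible_keys in the next EXPLAIN.
mysql> ALTER TABLE items ADD UNIQUE INDEX (id);
```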
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid
key: iid
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: id_2,id
key: id_2
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
Now that the faster-executing type eq_ref is used, we can drop the redundant simple index on items.id. This is especially important if you care about the speed of queries that modify data, because updating each index takes time. We will discuss when query tuning should be stopped in the following section.
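A sketch of dropping the redundant simple index (the index name id assumes MySQL's default naming for the first index added on that column; verify it with SHOW CREATE TABLE first):

```sql
mysql> ALTER TABLE items DROP INDEX id;
```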
You just learned how indexes affect query execution and when it makes sense to add them.
The previous sections discussed simple queries. Even there we found ways to improve the queries, sometimes with increasingly better results over several iterations of tuning. When you work with complicated queries with a lot of JOIN conditions, a lot of columns in WHERE clauses, and GROUP BY clauses, you have even more choices. It's possible to imagine that you could always find a way to make performance even faster and that these improvements could continue forever. So the question is when to consider a query properly optimized and put a stop to the research.
An in-depth knowledge of performance techniques could help you choose the proper solution. But there are still basic considerations that can help you to stop, even if you don’t consider yourself an expert.
First, you need to find out what the query does. For example, the following query:
SELECT * FROM contacts
always returns all columns and rows from the table, and no optimization can be applied to it.
But even if you select all columns, adding a JOIN can change the situation:
SELECT * FROM child JOIN parent ON (child.pid=parent.id)
This can be optimized because the ON condition limits the result set. The same analysis can be applied to queries with WHERE and GROUP BY conditions.
The second thing you need to look at is the join type in the EXPLAIN output. Although you will be trying to get the best possible join types, keep in mind the limitations of your data. For example, a condition on a nonunique column can never lead to the eq_ref type or better.
Your data is very important when you optimize queries. Differences in data can lead to completely different results for the same execution plan. The most trivial example is to compare results for a single row in a table to results for a table where more than 50% of the rows have the same value. In these cases, using indexes can decrease performance rather than increase it.
Here is another rule: do not rely only on the EXPLAIN output; make sure to measure the actual query execution time.
Another thing you should keep in mind is the effect of indexes on changes to the table. Although indexes usually improve the speed of SELECT queries, they slightly decrease the speed of queries that modify data, especially INSERT statements. Therefore, it can sometimes be sensible to live with slow SELECT queries if that speeds up the execution of inserts. Always keep in mind the overall performance of your application, not just a single query.
Suppose you have completely optimized your query and can’t find any ways to tune it better, but it’s still slow. Can you do anything to improve its performance? Yes. There are server options that allow you to tune factors affecting the query, such as the size of temporary tables in memory, buffers for sorting, and so on. Some options specific to a particular storage engine, such as InnoDB, can also be useful for query optimizing.
I will describe these configuration options in more detail in Chapter 3. Here I’ll give an overview of how to use them to improve performance.
Tuning server options is a somewhat global activity because a change can potentially affect every query on the server (or in the case of engine-specific options, every query that refers to a table using that storage engine). But some options are used for particular kinds of optimization, and if your query does not meet the right conditions, it remains unaffected.
The first options to check are buffer sizes. Each memory buffer is allocated for specific reasons. The general rule is that large buffers mean higher performance—but only if the query can use the larger size for the particular role played by that buffer.
And of course there are trade-offs when you increase buffer sizes. Here are some of the issues large buffers can cause. I don’t want to dissuade you from setting large buffers, because under the right circumstances it’s a great way to improve performance significantly. You just need to keep the following issues in mind and adjust sizes reasonably.
- Swapping
A large buffer may lead to swapping at the operating-system level and therefore slow performance, depending on the size of RAM on your system. In general, the MySQL server works fast if all the memory it needs sits in physical RAM. When it starts swapping, performance degrades dramatically.
Swapping can happen when you allocate more memory to buffers than your server has physically in RAM. Please note that some buffers are allocated for each user thread. To determine how much memory the server allocates for such buffers, use the formula max_connections * buffer_size. Calculate the sum of this product for all buffers, and make sure it is less than the amount of memory the mysqld server can use. This calculation is not decisive, because mysqld can actually allocate more memory than you explicitly specify.
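As a rough sketch of that calculation, the per-thread portion of the memory budget can be estimated directly from the server variables (the list of buffers here is illustrative, not exhaustive):

```sql
-- Worst-case memory if every allowed connection allocates its
-- per-thread buffers at full size, in megabytes:
mysql> SELECT @@max_connections *
    ->        (@@sort_buffer_size + @@join_buffer_size +
    ->         @@read_buffer_size + @@read_rnd_buffer_size)
    ->        / 1024 / 1024 AS per_thread_buffers_mb;
```

Add the global buffers (such as the InnoDB buffer pool) to this figure and compare the total against the RAM available to mysqld.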
- Startup time
The more memory mysqld needs to allocate, the longer it takes to start.
- Stale data
There are also scaling issues, mostly for caches shared between threads. Scaling the buffers that do the caching in these cases can lead to memory fragmentation. You will generally notice the fragmentation problem after hours of running the server, when old data needs to be removed from the buffer in order to make room for new data. This can cause a fast server to suddenly slow down. I show this in an example in Chapter 3.
After finishing with buffers, consider other options discussed in Chapter 3. Pay attention not only to options explicitly designated for performance tuning, such as optimizer options, but also to options that control high availability. The safer you make a transaction, the more checks are needed and the more slowly the query executes. But be careful with such options; tune them when and only when you can sacrifice safety for performance.
When you tune options, it is especially important to consider performance as a whole, because every option affects the whole server. For example, there is no sense in tuning engine-specific options if you don’t use that engine. This might appear too obvious, but I have seen way too many installations where MyISAM options were huge when only the InnoDB storage engine was used, or vice versa. This is especially important to remember if you take some generic configuration as a template.
The MySQL server allows you to change most of its configuration dynamically. This is valuable for performance testing. Change options and rerun queries to make sure you get good results before applying the changes in the configuration file. It is also always a good idea to apply changes step by step, so that you can attribute bad effects to the right option and easily go back. We will discuss this technique in detail in Chapter 6.
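For example, a sort-buffer experiment might look like the following sketch; the 4MB value is only an illustration, not a recommendation:

```sql
-- Try a larger sort buffer for the current session only:
mysql> SET SESSION sort_buffer_size = 4 * 1024 * 1024;
-- Rerun the slow query and compare timings. If the change helps,
-- apply it with SET GLOBAL and persist it in the configuration file.
```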
We have discussed effects on the performance of SELECT queries; in this section, we turn to tuning queries that modify data. UPDATE and DELETE queries can use the same conditions as SELECT to limit the number of rows affected. Therefore, the same tuning rules can be applied to these queries.
We saw in When the Problem May Have Been a Previous Update how to convert UPDATE and DELETE queries to SELECT queries and run EXPLAIN on them. You can use this technique to troubleshoot performance issues on versions prior to 5.6.3, which introduced EXPLAIN for INSERT, UPDATE, and DELETE queries, but remember that UPDATE and DELETE are sometimes executed slightly differently from the corresponding SELECT.
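As a sketch of that conversion, using this chapter's items table:

```sql
-- A data-modifying query such as
--   UPDATE items SET description = 'no description'
--   WHERE description IS NULL;
-- can be examined on pre-5.6.3 servers by EXPLAINing the
-- SELECT that touches the same rows:
mysql> EXPLAIN SELECT * FROM items WHERE description IS NULL\G
```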
Check whether indexes were used by querying the Handler_% status variables before and after query execution:
mysql> SHOW STATUS LIKE 'Handler_%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 0 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 19 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 17 |
+----------------------------+-------+
16 rows in set (0.00 sec)
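Because these counters are cumulative, a convenient pattern is to reset the session counters just before the query under study (FLUSH STATUS resets the session-level status variables):

```sql
mysql> FLUSH STATUS;
mysql> SELECT count(*) FROM items;  -- the query under study
mysql> SHOW SESSION STATUS LIKE 'Handler_read%';
```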
We’ll talk about particular variables in the preceding list as we continue. You should be aware, though, that these are cumulative values, so they increase as you issue each query. Now let’s tune our example query from When the Problem May Have Been a Previous Update so that it will update null-able columns:
mysql> UPDATE items SET description = 'no description', additional
= 'no additional comments' WHERE description IS NULL;
Query OK, 0 rows affected (6.95 sec)
Rows matched: 0 Changed: 0 Warnings: 0
This changed no rows because we corrupted the data in an earlier step: we now have 0 in each field instead of NULL. But the query runs very slowly. Let's look at the handler variables:
mysql> show status like 'Handler_%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 2 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 5140 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 17 |
+----------------------------+-------+
16 rows in set (0.01 sec)
What jumps out is the high value of Handler_read_rnd_next, which shows how often the next row in a datafile was read. A high value typically means that a table scan was used, which is not good for performance. Handler_read_key is a related variable showing the number of index read requests. It shouldn't be so low in relation to Handler_read_rnd_next, because that means a lot of rows were read instead of using an index. In addition, the values of Handler_commit and Handler_read_first have increased slightly. These refer respectively to the number of commits and the number of times the first entry in an index was read. The unobtrusive 1 in Handler_read_first shows that we asked the server to read the first entry in an index, which can be a symptom of a full index scan.
Hopefully, this run-through of a few Handler_% status variables has shown you how they can be used to check how queries are running. I'll leave the question of whether the speed of this query can be improved as homework for you.
I'll just spend a little space on INSERT queries. They have no conditions that limit the set of rows affected, so the presence of indexes in a table merely slows them down, because each insert has to update the indexes. The performance of inserts should instead be tuned through server options. Here especially, the options offered by the InnoDB storage engine can be helpful.
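One commonly tuned InnoDB option in this area is innodb_flush_log_at_trx_commit; this sketch only illustrates the trade-off, since relaxing the setting sacrifices durability on a crash:

```sql
-- The default value 1 flushes the redo log to disk at every commit.
-- The value 2 flushes roughly once per second, which can speed up
-- insert-heavy workloads at the cost of losing up to a second of
-- transactions if the operating system crashes.
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```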
One way to speed up inserts is to combine many of them in one statement, also called a "bulk insert":

INSERT INTO t1 (f1, f2, f3, ...) VALUES (v1, v2, v3, ...), (v1, v2, v3, ...), ...;
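For instance, a concrete sketch with a hypothetical three-column table:

```sql
-- One round trip and one index-maintenance pass instead of three
-- separate single-row INSERT statements:
INSERT INTO t1 (f1, f2, f3) VALUES
  (1, 'a', 10),
  (2, 'b', 20),
  (3, 'c', 30);
```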
But please note that inserts lock table rows or even whole tables, so other queries may be blocked while a large insert runs. So I'll end this section by repeating the general rule: always keep in mind the overall performance of your application, not just a single query.
We just learned how tuning server options can dramatically improve performance. We’ve also seen in this chapter how to tune a particular query so it runs really fast. Tuning a query and tuning a server are often alternative solutions to performance problems. Is there any general rule about which kind of tuning you should start with?
I am afraid not. Tuning server options looks so promising that many people think finding and changing the right option will make mysqld run like a rocket. If you believe that too, I have to disappoint you: badly written queries will still be a drain on server resources. And you may enjoy good performance for a few hours after restarting the server only to see it decrease again, because every query will need a lot of resources and your caches will fill up. At times, the server will be inundated with millions of queries that want more and more resources.
However, tuning every single query might not be an option. Some of them are called rarely, so there is no need to spend human resources on them. Others may query all the rows in a table, which defeats attempts to optimize them.
I usually advocate some kind of “mixed” mode. Tune server options first, paying particular attention to options specific to your storage engine, then tune queries. After the important queries are tuned, go back to the server options and consider what you can tune more, then back to the rest of the queries, and so on, until you are happy with performance.
You can also start with the slowest queries in your application or find ones that can obviously benefit from trivial optimizations, then turn to server options. Consult the status variables, as shown earlier. I will describe them in more detail in Chapter 6.
Last but not least: use numerous information sources on performance tuning to create your own strategy.