One common issue with SQL applications is performance regression. In this section, I’ll describe some basic actions to take when you hit performance problems. Don’t worry about the details; just pick up the essential ideas. As your knowledge deepens, you’ll find yourself using them in a more educated and therefore more effective manner.
When I considered which problems to include in this chapter, I questioned whether I should mention performance problems at all. There are a lot of sources describing performance issues in detail, starting with the wonderful “Optimization” chapter in the MySQL Reference Manual and extending to books published by O’Reilly. I will add a short overview of useful sources at the end of this book. One can easily spend one’s career on this subject or drown in the flood of available information.
I will be speaking here mostly about SELECT
queries. At the end of this section, I
briefly address how to deal with a slow query that modifies data.
Three main techniques will help you work on slow queries: tuning the query itself, tuning your tables (including adding indexes), and tuning the server. Let’s look at them in detail.
The most powerful tool for query tuning is our old acquaintance EXPLAIN. This statement provides information about how the server actually executes a query. The details of MySQL EXPLAIN are covered quite well in the MySQL Reference Manual, and I will not repeat that information here. Rather, I'll pick out what I've found to be the most important and useful elements of the output.
The first lines you should look at are type, which actually shows the type of join performed, and rows, which shows an estimate of how many rows were examined during the query. (For instance, if the query had to scan the whole table, the number of rows would equal the number of rows in the table.) Multitable joins have to examine a Cartesian product of the numbers of rows examined in each table. Thus, if the query examines 20 rows in one table and 30 in another, the join performs a total of 600 examinations. EXPLAIN output contains a row for each table in the JOIN. We will see this in the following examples.
EXPLAIN reports a join even when you operate on a single table. This may sound a bit strange, but the MySQL optimizer internally treats any query as a join, even if it's a join on one table.
Let's look into the EXPLAIN output from the previous section again:
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items WHERE id IN
(SELECT iid FROM items_links)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: items_links
type: index_subquery
possible_keys: iid,iid_2
key: iid
key_len: 5
ref: func
rows: 1
filtered: 100.00
Extra: Using index; Using where
2 rows in set, 1 warning (0.48 sec)
The number of examined rows is 10 times 1 because the subquery executes once for every row in the outer query. The type of the first query is index, which means that the whole index will be read. The type of the second query is index_subquery. This is an index lookup function that works similarly to the ref type. So, in this example, the optimizer will read all index entries from the items table and one row from the items_links table for each of the ten matching rows found in items.
How can we find out if this is a good plan for the query? First, let’s repeat the query results and check how long the query actually took:
mysql> SELECT count(*) FROM items WHERE id IN (SELECT iid FROM items_links);
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.08 sec)
The MySQL server examined 10 rows and returned 4. How fast is it? To answer this question, let’s count the number of rows in both tables:
mysql> SELECT count(*) FROM items;
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.11 sec)

mysql> SELECT count(*) FROM items_links;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)
We have 10 rows in the items table, each with a unique ID. The items_links table has 6 rows with nonunique IDs (iid). For the current number of rows, the plan looks good, but at the same time, it reveals a potential problem. Right now we have fewer links than items. The difference between these numbers is not high at the moment, but it will be more noticeable as they diverge.
To test this guess and to show you an example of query tuning, I will insert a few rows into the items table. The id column is defined as INT NOT NULL AUTO_INCREMENT PRIMARY KEY, so we are guaranteed that no existing link will refer to a newly inserted row. This allows us to imitate a realistic situation that arises when a user needs to get a small number of links (six in our case) from a large table. The following statements are just a quick hack to create a lot of rows by repeatedly selecting all the rows in the table and inserting more:
mysql> INSERT INTO items (short_description, description, example, explanation, additional)
    -> SELECT short_description, description, example, explanation, additional FROM items;
Query OK, 10 rows affected (0.17 sec)
Records: 10  Duplicates: 0  Warnings: 0

<Repeat this query a few times>

mysql> INSERT INTO items (short_description, description, example, explanation, additional)
    -> SELECT short_description, description, example, explanation, additional FROM items;
Query OK, 2560 rows affected (3.77 sec)
Records: 2560  Duplicates: 0  Warnings: 0
Now let’s see whether our query plan changed:
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items WHERE id IN
    -> (SELECT iid FROM items_links)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 5136
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: items_links
type: index_subquery
possible_keys: iid,iid_2
key: iid
key_len: 5
ref: func
rows: 1
filtered: 100.00
Extra: Using index; Using where
2 rows in set, 1 warning (0.09 sec)
The query execution plan is the same—and it is going to examine 5,136 rows for just six links! Is there any way to rewrite the query so it will execute faster?
The type of the subquery is index_subquery. This means that the optimizer uses an index lookup function that replaces the subquery completely. The output of SHOW WARNINGS shows how the query has been rewritten:
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `collaborate2011`.`items` where
<in_optimizer>(`collaborate2011`.`items`.`id`,<exists>
(<index_lookup>(<cache>(`collaborate2011`.`items`.`id`) in
items_links on iid where (<cache>(`collaborate2011`.`items`.`id`) =
`collaborate2011`.`items_links`.`iid`))))
1 row in set (0.00 sec)
The output is intimidating, but at least we can see some kind of join here. What if we rewrite the query to be more explicit about the columns on which the join is performed? We will also rewrite the subquery into an explicit JOIN; with current versions of MySQL, this method can dramatically improve performance:
mysql> \W
Show warnings enabled.
mysql> EXPLAIN EXTENDED SELECT count(*) FROM items JOIN items_links
    -> ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid,iid_2
key: iid_2
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.05 sec)
Note (Code 1003): select count(0) AS `count(*)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
The result looks encouraging because it does not scan all the rows from the items table. But does the query work correctly?
mysql> SELECT count(*) FROM items JOIN items_links ON
(items.id=items_links.iid);
+----------+
| count(*) |
+----------+
| 6 |
+----------+
1 row in set (0.10 sec)
We get six rows instead of four. This is because we asked the query to return every matching row, so an item that matches more than one link is counted once per link. We can fix this by adding the DISTINCT keyword:
mysql> SELECT count(distinct items.id) FROM items JOIN items_links
    -> ON (items.id=items_links.iid);
+--------------------------+
| count(distinct items.id) |
+--------------------------+
|                        4 |
+--------------------------+
1 row in set (0.12 sec)
Note
You can use a query rewriting technique to confirm that DISTINCT is needed. Just replace count(*) with items.id to see the duplicate values.
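A sketch of that check, using the tables from the example above (the duplicated id values depend on your data, so no output is shown here):

```sql
mysql> SELECT items.id FROM items JOIN items_links
    -> ON (items.id=items_links.iid)
    -> ORDER BY items.id;
-- Any id that has more than one link appears more than once
-- in this list, confirming that DISTINCT is needed.
```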
With DISTINCT, is the query still fast? Let's try EXPLAIN once again:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid,iid_2
key: iid_2
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
It still examines six rows. So we can consider the query to be optimized for this particular data set. I will explain later in this chapter why the structure of the data and its size matter.
In our example, the data set was small, so I could not make it run really slowly, even on my laptop. Still, the execution times of the original and optimized queries differ substantially. This was the original query:
mysql> SELECT count(*) FROM items WHERE id IN (SELECT iid FROM
items_links );
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.21 sec)
And here is the optimized query:
mysql> SELECT count(distinct items.id) FROM items JOIN items_links
ON (items.id=items_links.iid);
+--------------------------+
| count(distinct items.id) |
+--------------------------+
| 4 |
+--------------------------+
1 row in set (0.10 sec)
We achieved a two-fold improvement, even on such a small data set! For our test, the saving is just 0.11 seconds, but for millions of rows, the improvement can be far greater.
In the previous section, we introduced the process of tuning queries. In all the examples, EXPLAIN output contained information about indexes. But what if a table has no index at all? Or if the indexes are not being used? How should you choose when, where, and which index to add?
The MySQL server uses indexes when they can limit the number of rows examined. Therefore, having indexes on columns listed in WHERE, JOIN, and GROUP BY clauses can speed up queries. Having an index on the column in an ORDER BY clause can make sense as well, because it allows the server to sort more effectively.
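As a sketch of that principle (the messages table and its columns here are hypothetical, not part of this chapter's schema):

```sql
-- A query such as
--   SELECT * FROM messages WHERE user_id = 42 ORDER BY created_at;
-- can use one composite index for both the WHERE lookup and the
-- ORDER BY, letting the server skip a separate sort pass:
ALTER TABLE messages ADD INDEX user_created (user_id, created_at);
```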
With those principles in mind, adding indexes becomes a simple task. Consider tables from the previous example, but without any indexes:
mysql> CREATE TEMPORARY TABLE items SELECT * FROM items;
Query OK, 5120 rows affected (6.97 sec)
Records: 5120  Duplicates: 0  Warnings: 0

mysql> CREATE TEMPORARY TABLE items_links SELECT * FROM items_links;
Query OK, 6 rows affected (0.36 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE items;
+-------+----------------------------------------------------------------------+
| Table | Create Table                                                         |
+-------+----------------------------------------------------------------------+
| items | CREATE TEMPORARY TABLE `items` (
  `id` int(11) NOT NULL DEFAULT '0',
  `short_description` varchar(255) DEFAULT NULL,
  `description` text,
  `example` text,
  `explanation` text,
  `additional` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+----------------------------------------------------------------------+
1 row in set (0.10 sec)

mysql> SHOW CREATE TABLE items_links;
+-------------+----------------------------------------------------------------+
| Table       | Create Table                                                   |
+-------------+----------------------------------------------------------------+
| items_links | CREATE TEMPORARY TABLE `items_links` (
  `iid` int(11) DEFAULT NULL,
  `linkid` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------------+----------------------------------------------------------------+
1 row in set (0.00 sec)
As you can see, no index is specified. Let’s try an unoptimized query on these tables, and then improve it:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items JOIN
items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
filtered: 100.00
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5137
filtered: 100.00
Extra: Using where; Using join buffer
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
The type became ALL, the most expensive type, because it means all rows will be read. The query is examining 6*5,137 = 30,822 rows now. This is even worse than the query we considered to be slow in our earlier example.
Let’s examine the query in detail:
SELECT count(distinct items.id)...
This query returns the number of unique non-null values in the result set. It would make sense to add an index on items.id so that this search could use it.
Another part of the same query:
...FROM items JOIN items_links ON (items.id=items_links.iid)
The join refers to the id column from items and the iid column from items_links. So it makes sense to add indexes to both of those columns:
mysql> ALTER TABLE items ADD INDEX (id);
Query OK, 5120 rows affected (4.78 sec)
Records: 5120  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE items_links ADD INDEX (iid);
Query OK, 6 rows affected (0.04 sec)
Records: 6  Duplicates: 0  Warnings: 0
Now we can see how this affects the query plan:
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid
key: iid
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: ref
possible_keys: id
key: id
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
This looks much better than before, with a single exception: the ref type for the items table is worse than the eq_ref we got in the previous section. This type is used because we added a simple index, whereas the original table had a unique index on the same column. Because IDs are unique and are supposed to stay so, we can easily make the same change to the temporary table:
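A unique index has to be added before rerunning EXPLAIN; judging from the id_2 key name in the output below, the statement is presumably something like:

```sql
-- Hypothetical reconstruction: MySQL auto-names this second index
-- on the id column id_2, matching possible_keys in the next EXPLAIN.
mysql> ALTER TABLE items ADD UNIQUE INDEX (id);
```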
mysql> EXPLAIN EXTENDED SELECT count(distinct items.id) FROM items
JOIN items_links ON (items.id=items_links.iid)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items_links
type: index
possible_keys: iid
key: iid
key_len: 5
ref: NULL
rows: 6
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: id_2,id
key: id_2
key_len: 4
ref: collaborate2011.items_links.iid
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): select count(distinct `collaborate2011`.`items`.`id`) AS
`count(distinct items.id)` from `collaborate2011`.`items` join
`collaborate2011`.`items_links` where (`collaborate2011`.`items`.`id` =
`collaborate2011`.`items_links`.`iid`)
Now that the faster-executing type eq_ref is used, we can drop the redundant simple index on items.id. This is especially important if you care about the speed of queries that modify data, because updating each index takes time. We will discuss when query tuning should be stopped in the following section.
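A sketch of dropping the redundant simple index (the index name id assumes MySQL's default naming for the first index added on that column; verify it with SHOW CREATE TABLE first):

```sql
mysql> ALTER TABLE items DROP INDEX id;
```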
You just learned how indexes affect query execution and when it makes sense to add them.
The previous sections discussed simple queries. Even there we found ways to improve the queries, sometimes with increasingly better results over several iterations of tuning. When you work with complicated queries with a lot of JOIN conditions, a lot of columns in WHERE clauses, and GROUP BY clauses, you have even more choices. It's possible to imagine that you could always find a way to make performance even faster and that these improvements could continue forever. So the question is when to consider a query properly optimized and put a stop to the research.
An in-depth knowledge of performance techniques could help you choose the proper solution. But there are still basic considerations that can help you to stop, even if you don’t consider yourself an expert.
First, you need to find out what the query does. For example, the following query:
SELECT * FROM contacts
always returns all columns and rows from the table, and no optimization can be applied to it.
But even if you select all columns, adding a JOIN can change the situation:
SELECT * FROM child JOIN parent ON (child.pid=parent.id)
This can be optimized because the ON condition limits the result set. The same analysis can be applied to queries with WHERE and GROUP BY conditions.
The second thing you need to look at is the join type in the EXPLAIN output. Although you will be trying to get the best possible join types, keep in mind the limitations of your data. For example, a condition on a nonunique column can never lead to the eq_ref type or better.
Your data is very important when you optimize queries. Differences in data can lead to completely different results for the same execution plan. The most trivial example is to compare results for a single row in a table to results for a table where more than 50% of the rows have the same value. In these cases, using indexes can decrease performance rather than increase it.
Here is another rule: do not rely only on the EXPLAIN output; make sure to measure the actual query execution time.
Another thing you should keep in mind is the effect of indexes on changes to the table. Although indexes usually improve the speed of SELECT queries, they slightly decrease the speed of queries that modify data, especially INSERT statements. Therefore, it can sometimes be sensible to live with slow SELECT queries if that speeds up the execution of inserts. Always keep in mind the overall performance of your application, not just a single query.
Suppose you have completely optimized your query and can’t find any ways to tune it better, but it’s still slow. Can you do anything to improve its performance? Yes. There are server options that allow you to tune factors affecting the query, such as the size of temporary tables in memory, buffers for sorting, and so on. Some options specific to a particular storage engine, such as InnoDB, can also be useful for query optimizing.
I will describe these configuration options in more detail in Chapter 3. Here I’ll give an overview of how to use them to improve performance.
Tuning server options is a somewhat global activity because a change can potentially affect every query on the server (or in the case of engine-specific options, every query that refers to a table using that storage engine). But some options are used for particular kinds of optimization, and if your query does not meet the right conditions, it remains unaffected.
The first options to check are buffer sizes. Each memory buffer is allocated for specific reasons. The general rule is that large buffers mean higher performance—but only if the query can use the larger size for the particular role played by that buffer.
And of course there are trade-offs when you increase buffer sizes. Here are some of the issues large buffers can cause. I don’t want to dissuade you from setting large buffers, because under the right circumstances it’s a great way to improve performance significantly. You just need to keep the following issues in mind and adjust sizes reasonably.
- Swapping
A large buffer may lead to swapping at the operating-system level and therefore slow performance, depending on the size of RAM on your system. In general, the MySQL server works fast if all the memory it needs sits in physical RAM. When it starts swapping, performance degrades dramatically.
Swapping can happen when you allocate more memory to buffers than your server has physically in RAM. Please note that some buffers are allocated for each user thread. To determine how much memory the server allocates for such buffers, use the formula max_connections * buffer_size. Calculate the sum of this product for all buffers, and make sure it is less than the amount of memory the mysqld server can use. This calculation is not decisive, because mysqld can actually allocate more memory than you explicitly specify.
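As a rough sketch of that calculation, the per-thread portion of the memory budget can be estimated directly from the server variables (the list of buffers here is illustrative, not exhaustive):

```sql
-- Worst-case memory if every allowed connection allocates its
-- per-thread buffers at full size, in megabytes:
mysql> SELECT @@max_connections *
    ->        (@@sort_buffer_size + @@join_buffer_size +
    ->         @@read_buffer_size + @@read_rnd_buffer_size)
    ->        / 1024 / 1024 AS per_thread_buffers_mb;
```

Add the global buffers (such as the InnoDB buffer pool) to this figure and compare the total against the RAM available to mysqld.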
- Startup time
The more memory mysqld needs to allocate, the longer it takes to start.
- Stale data
There are also scaling issues, mostly for caches shared between threads. Scaling the buffers that do the caching in these cases can lead to memory fragmentation. You will generally notice the fragmentation problem after hours of running the server, when old data needs to be removed from the buffer in order to make room for new data. This can cause a fast server to suddenly slow down. I show this in an example in Chapter 3.
After finishing with buffers, consider other options discussed in Chapter 3. Pay attention not only to options explicitly designated for performance tuning, such as optimizer options, but also to options that control high availability. The safer you make a transaction, the more checks are needed and the more slowly the query executes. But be careful with such options; tune them when and only when you can sacrifice safety for performance.
When you tune options, it is especially important to consider performance as a whole, because every option affects the whole server. For example, there is no sense in tuning engine-specific options if you don’t use that engine. This might appear too obvious, but I have seen way too many installations where MyISAM options were huge when only the InnoDB storage engine was used, or vice versa. This is especially important to remember if you take some generic configuration as a template.
The MySQL server allows you to change most of its configuration dynamically. This is valuable for performance testing. Change options and rerun queries to make sure you get good results before applying the changes in the configuration file. It is also always a good idea to apply changes step by step, so that you can attribute bad effects to the right option and easily go back. We will discuss this technique in detail in Chapter 6.
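For example, a sort-buffer experiment might look like the following sketch; the 4MB value is only an illustration, not a recommendation:

```sql
-- Try a larger sort buffer for the current session only:
mysql> SET SESSION sort_buffer_size = 4 * 1024 * 1024;
-- Rerun the slow query and compare timings. If the change helps,
-- apply it with SET GLOBAL and persist it in the configuration file.
```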
We have discussed effects on the performance of SELECT queries; in this section, we turn to tuning queries that modify data. UPDATE and DELETE queries can use the same conditions as SELECT to limit the number of rows affected. Therefore, the same tuning rules can be applied to these queries.
We saw in When the Problem May Have Been a Previous Update how to convert UPDATE and DELETE queries to SELECT queries and run EXPLAIN on them. You can use this technique to troubleshoot performance issues on versions prior to 5.6.3, which introduced EXPLAIN for INSERT, UPDATE, and DELETE queries, but remember that UPDATE and DELETE are sometimes executed slightly differently from the corresponding SELECT.
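As a sketch of that conversion, using this chapter's items table:

```sql
-- A data-modifying query such as
--   UPDATE items SET description = 'no description'
--   WHERE description IS NULL;
-- can be examined on pre-5.6.3 servers by EXPLAINing the
-- SELECT that touches the same rows:
mysql> EXPLAIN SELECT * FROM items WHERE description IS NULL\G
```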
Check whether indexes were used by querying the Handler_% status variables before and after query execution:
mysql> SHOW STATUS LIKE 'Handler_%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 0 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 19 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 17 |
+----------------------------+-------+
16 rows in set (0.00 sec)
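Because these counters are cumulative, a convenient pattern is to reset the session counters just before the query under study (FLUSH STATUS resets the session-level status variables):

```sql
mysql> FLUSH STATUS;
mysql> SELECT count(*) FROM items;  -- the query under study
mysql> SHOW SESSION STATUS LIKE 'Handler_read%';
```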
We’ll talk about particular variables in the preceding list as we continue. You should be aware, though, that these are cumulative values, so they increase as you issue each query. Now let’s tune our example query from When the Problem May Have Been a Previous Update so that it will update null-able columns:
mysql> UPDATE items SET description = 'no description', additional
= 'no additional comments' WHERE description IS NULL;
Query OK, 0 rows affected (6.95 sec)
Rows matched: 0 Changed: 0 Warnings: 0
This changed no rows because we corrupted the data in an earlier step: we now have 0 in each field instead of NULL. But the query runs very slowly. Let's look at the handler variables:
mysql> show status like 'Handler_%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 2 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 5140 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 17 |
+----------------------------+-------+
16 rows in set (0.01 sec)
What jumps out is the high value of Handler_read_rnd_next, which shows how often the next row in a datafile was read. A high value typically means that a table scan was used, which is not good for performance. Handler_read_key is a related variable showing the number of index read requests. It shouldn't be so low in relation to Handler_read_rnd_next, because that means a lot of rows were read instead of using an index. In addition, the values of Handler_commit and Handler_read_first have increased slightly. These refer respectively to the number of commits and the number of times the first entry in an index was read. The unobtrusive 1 in Handler_read_first shows that we asked the server to read the first entry in an index, which can be a symptom of a full index scan.
Hopefully, this run-through of a few Handler_% status variables has shown you how they can be used to check how queries are running. I'll leave the question of whether the speed of this query can be improved as homework for you.
I'll just spend a little space on INSERT queries. They have no conditions that limit the set of rows affected, so the presence of indexes in a table merely slows them down, because each insert has to update the indexes. The performance of inserts should instead be tuned through server options. Here especially, the options offered by the InnoDB storage engine can be helpful.
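One commonly tuned InnoDB option in this area is innodb_flush_log_at_trx_commit; this sketch only illustrates the trade-off, since relaxing the setting sacrifices durability on a crash:

```sql
-- The default value 1 flushes the redo log to disk at every commit.
-- The value 2 flushes roughly once per second, which can speed up
-- insert-heavy workloads at the cost of losing up to a second of
-- transactions if the operating system crashes.
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```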
One way to speed up inserts is to combine many of them in one statement, also called a "bulk insert":

INSERT INTO t1 (f1, f2, f3, ...) VALUES (v1, v2, v3, ...), (v1, v2, v3, ...), ...;
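For instance, a concrete sketch with a hypothetical three-column table:

```sql
-- One round trip and one index-maintenance pass instead of three
-- separate single-row INSERT statements:
INSERT INTO t1 (f1, f2, f3) VALUES
  (1, 'a', 10),
  (2, 'b', 20),
  (3, 'c', 30);
```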
But please note that inserts lock table rows or even whole tables, so other queries may be blocked while a large insert runs. So I'll end this section by repeating the general rule: always keep in mind the overall performance of your application, not just a single query.
We just learned how tuning server options can dramatically improve performance. We’ve also seen in this chapter how to tune a particular query so it runs really fast. Tuning a query and tuning a server are often alternative solutions to performance problems. Is there any general rule about which kind of tuning you should start with?
I am afraid not. Tuning server options looks so promising that many people think finding and changing the right option will make mysqld run like a rocket. If you believe that too, I have to disappoint you: badly written queries will still be a drain on server resources. And you may enjoy good performance for a few hours after restarting the server only to see it decrease again, because every query will need a lot of resources and your caches will fill up. At times, the server will be inundated with millions of queries that want more and more resources.
However, tuning every single query might not be an option. Some of them are called rarely, so there is no need to spend human resources on them. Others may query all the rows in a table, which defeats attempts to optimize them.
I usually advocate some kind of “mixed” mode. Tune server options first, paying particular attention to options specific to your storage engine, then tune queries. After the important queries are tuned, go back to the server options and consider what you can tune more, then back to the rest of the queries, and so on, until you are happy with performance.
You can also start with the slowest queries in your application or find ones that can obviously benefit from trivial optimizations, then turn to server options. Consult the status variables, as shown earlier. I will describe them in more detail in Chapter 6.
Last but not least: use numerous information sources on performance tuning to create your own strategy.