The MySQL server is not alone in the environment where it runs. Even if it works in a dedicated environment, you still have to consider the hardware resources and operating system (OS) limits. In shared environments, the MySQL server is also affected by other processes. Tuning operating systems for MySQL is a huge topic about which a separate book could be written. Here I won’t go deep, but show some starting points from a troubleshooting perspective. MySQL runs in various environments, which is a great advantage for MySQL, but also something that makes it hard to get specific in this chapter. So I decided to show you what you need to care about, and leave it up to you to consult the manual for your OS to determine how to handle tuning.
A common usage mistake is to have unrealistic expectations for performance. One can expect that the MySQL server needs to be tuned while forgetting the latency of hardware components. Therefore, it is important to understand what can cause the latency.
The following hardware resources affect the MySQL server:
Number of cores
Let’s discuss each of them in a bit of detail.
Memory is a very important resource for MySQL. The server works fast when it does not swap. Ideally, it should fit in RAM. Therefore, it is important to configure buffers in such a way that they stay within the limits of physical memory. I provided guidelines for this in Effects of Options and in Chapter 3, particularly in Calculating Safe Values for Options.
Here is an example of swapping on Linux. Important parts are in bold. For a server that is not swapping, all these values should be equal to zero:
---swap-------io---- -system-- ----cpu---- r b
swpdfree buff cache
si sobi bo in cs us sy id wa 1 0
1936296828 7524 5045340
4 11860 0 470 440 4 2 75 18 0 1
1936295928 7532 5046432
36 40860 768 471 455 3 3 75 19 0 1
1936294840 7532 5047564
4 12868 0 466 441 3 3 75 19 0 1
1936293752 7532 5048664
0 0848 0 461 434 5 2 75 18
In those sections, we discussed how configuration variables can affect memory usage. The basic rule is to calculate the maximum realistic amount of RAM that will be used by the MySQL server and to make sure you keep it less than the physical RAM you have. Having buffers larger than the actual memory size increases the risk of the MySQL server crashing with an “Out of memory” error.
The previous point can be stated the other way around: if you need larger buffers, buy more RAM. This is always a good practice for growing applications.
Use RAM modules that support extended error correction (EEC), so if a bit of memory is corrupted, the whole MySQL server does not crash.
A few other aspects of memory use that you should consider are listed in a chapter named “How MySQL Uses Memory” in the MySQL Reference Manual. I won’t repeat its contents here, because it doesn’t involve any new troubleshooting techniques.
One important point is that when you select a row containing
a BLOB column, an internal buffer grows to the point where it can store
this value, and the storage engine does not return the memory to RAM
after the query finishes. You need to run
TABLE to free the memory.
Another point concerns differences between 32-bit and 64-bit architectures. Although the 32-bit ones use a smaller pointer size and thus can save memory, these systems also contain inherent restrictions on the size of buffers due to addressing limits in the operating system. Theoretically, the maximum memory available in a 32-bit system is 4GB per process, and it’s actually less on many systems. Therefore, if the buffers you want to use exceed the size of your 32-bit system, consider switching to a 64-bit architecture.
MySQL’s performance does not scale linearly with increasing CPU speed. This does not mean you can’t make use of a fast CPU, but don’t expect performance will scale by increasing CPU speed in the same way it can increase by adding more RAM.
However, the number of cores is important when you set options that affect internal thread concurrency. There is no sense in increasing the values of such options if you don’t have enough cores. This can be easily demonstrated using the benchmark utility named sysbench. Table 4-1 shows the results of a small test on a machine with four cores. I used the OLTP sysbench test with 16 threads.
As you can see, the test runs faster up until I start eight threads, and stops improving at higher values.
Regarding disks, you should pay attention to disk read latency—how much time each read access takes—and fsync latency—how long each fsync takes.
Recent solid state disks (SSDs) work well, but don’t expect miracles from them yet, because most storage engines are optimized to do read and writes for hard disks.
The same issue applies to network storage. It is possible to store data and logfiles in network filesystems and storage, but these installations may be slower than local disks. You need to check is how fast and reliable your storage is. Otherwise, don’t be surprised if you experience data corruption because of network failure.
You can determine whether your disk I/O is overloaded using iostat on Linux/Unix. The average queue length of the requests issued to the device should not be high in normal use. On Windows you can use perfmon for the same purpose. Here is an example output of iostat:
iostat -x 5Linux 2.6.18-8.1.1.el5 (blade12) 11/11/2011 _x86_64_ avg-cpu: %user %nice %system %iowait %steal %idle 1.27 0.00 0.32 0.65 0.00 97.79 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await cciss/c0d0 0.02 7.25 0.50 6.34 14.35 108.73 17.98 0.48 69.58 dm-0 0.00 0.00 0.52 13.59 14.27 108.70 8.72 0.09 6.59 svctm %util 2.22 1.52 1.08 1.52 avg-cpu: %user %nice %system %iowait %steal %idle 38.69 0.00 6.43 47.84 0.00 8.64 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await cciss/c0d0 0.00 5362.40 0.20 713.80 1.60 51547.20 72.20 138.40 193.08 dm-0 0.00 0.00 0.00 6086.00 0.00 48688.00 8.00 1294.74 227.04 svctm %util 1.40 99.88 0.16 99.88 <skipped> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await cciss/c0d0 0.00 10781.80 0.00 570.20 0.00 84648.00 148.45 143.58 248.72 dm-0 0.00 0.00 0.00 11358.00 0.00 90864.00 8.00 3153.82 267.57 svctm %util 1.75 100.02 0.09 100.02 avg-cpu: %user %nice %system %iowait %steal %idle 8.90 0.00 11.30 75.10 0.00 5.60 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await cciss/c0d0 0.00 11722.40 0.00 461.60 0.00 98736.00 213.90 127.78 277.04 dm-0 0.00 0.00 0.00 12179.20 0.00 97433.60 8.00 3616.90 297.54 svctm %util 2.14 98.80 0.08 98.80 avg-cpu: %user %nice %system %iowait %steal %idle 23.55 0.00 23.95 46.19 0.00 7.11 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await cciss/c0d0 0.00 4836.80 1.00 713.60 8.00 49070.40 68.68 144.28 204.08 dm-0 0.00 0.00 1.00 5554.60 8.00 44436.80 8.00 1321.82 257.28 svctm %util 1.40 100.02 0.18 100.02
This output was taken when mysqld was idle and then started an active I/O
job. You can see how
growing. Although this is far from problematic, I decided to put this
example here to show how disk I/O activity changes while mysqld is doing its job.
And aside from speed, remember that the storage can lose data;
partial page writes are still possible. If you use InnoDB, use the
doublewrite buffer by setting
innodb_doublewrite to secure your data. Also,
is very important to plan battery backups for your disks because these
can prevent data loss in case of power failure.
In addition to network bandwidth, round-trip time (RTT) and the number of round-trips are important. RTT is a time needed for a client to send a network packet, then receive an answer from the server. The longer distance between the machines, the higher the RTT is.
Network bandwidth and RTT are the reasons why it is recommended to place the MySQL client and the server in the same local network when possible.
Local networks are recommended for replication as well. You can
connect to slaves over the Internet instead of a local intranet, but
expect delays and even errors due to corruption in the relay log data.
Such errors should be fixed automatically after bug #26489 is fixed
and if you use the
relay-log-recovery option starting
with version 5.5 and binary log checksums starting with version 5.6. But
the master and slave will still spend time resending packets due to
To finish this part of the chapter, I will show you small example
of how hardware latencies affect a trivial
UPDATE query. We will use an InnoDB table with
autocommit turned on. We will also
turn on the binary logfile.
UPDATE test_rrbs SET f1 = md5(id*2) WHERE id BETWEEN 200000 AND 300000;
This simple query could experience latencies when:
|The client sends a command to the server that takes a half of RTT.|
|mysqld does an |
|mysqld does an |
|mysqld does an |
|The client receives the result from the server, which is another aspect of RTT.|
Other limits on MySQL are set by operating systems. For
example, I saw a case when a server failed because the Linux
50. This option specifies the percentage of the system’s total
virtual memory that can be allocated via
malloc(); attempts to allocate more memory by a
process will fail. Fifty percent is the default value for many Linux
installations. So when
about 50% of existing RAM and then tries to allocate more, it fails. In a
multitasking setup, such an option may be useful because it protects other
critical processes from a greedy MySQL server, but it’s absolutely
ridiculous in a dedicated server.
Another important Unix option is
restricts various resources for users. When you set resources using the
command or another system utility, remember that the MySQL server runs as
a simple user and is subject to the same limits as everyone else.
Server-related options mentioned how OS restrictions affect the
open_files_limit variable, and I’ll show an
example here. Suppose the server’s
ulimit on open files (the
-n option) is set to 1024, the default
on many systems. If you try to start mysqld with
--open-files-limit=4096, it would not override
the operating system limit.
ulimit -n1024 $
./bin/mysqld --defaults-file=support-files/my-small.cnf --basedir=. --datadir=./data --socket=/tmp/mysql_ssmirnova.sock --port=33051 --log-error --open-files-limit=4096 & 31833 $
./bin/mysql -uroot -S /tmp/mysql_ssmirnova.sock -e "SELECT @@open_files_limit"+--------------------+ | @@open_files_limit | +--------------------+ | 1024 | +--------------------+
This option is very important if you have many tables, and a value that’s too small can lead to a slowdown because the MySQL server has to spend time opening and closing tables, or it may even reject connection attempts if the server can’t open new tables due to lack of resources.
I won’t describe how to tune OS limits in detail here, because this information is specific to each OS. I haven’t mentioned Windows, but rest assured it has limits too.
When you suspect that the OS is limiting your MySQL server, first check the resources described earlier: RAM, CPU, and network bandwidth. Check whether mysqld is using less than the hardware offers. Usually, performance problems occur when RAM or CPU are limited. Also check the number of open files allowed.
If you find that mysqld can and should use more resources, check various aspects of operating system tuning. For example, the OS’s kernel options or user defaults might set no limit on the amount of RAM available, but might specifically limit the user running mysqld. This is a common case when overall OS limits are huge or even not set, while defaults for user accounts are small.
If MySQL is using limited resources, you may sometimes wonder
whether it is in trouble because of lack of resources or just isn’t using
them because its load is low at the moment. When the server is in trouble, it
either prints messages to the error logfile or starts performing poorly.
Another telltale sign is when you try to increase an option, but it does
not increase. The earlier
open_files_limit example illustrated this
situation. In such cases, you will either find messages in the error
logfile if you set the option at startup or see a warning when you set the
option dynamically. It’s also a good idea to check the real value of the
Everything we’ve discussed so far is important when the MySQL server runs in any environment. Ideally, it should run in a dedicated environment and use all physical resources the machine has. But some sites use MySQL in shared environments. This includes shared hosting, where many instances of the mysqld server are running on behalf of different customers, and systems simultaneously running mysqld with client applications and other processes.
When you tune a MySQL installation in such configurations, you need to check two additional things: how many resources other processes use in their normal operations and how many resources they allocate at critical times.
Under normal loads, you can guess how many resources are left after other programs reserve theirs, and set MySQL’s options appropriately. Critical loads are usually surprising and can lead to sudden, mysterious MySQL errors or even crashes. There is no universal rule about what to do in these situations. Just remember that third-party software can affect a MySQL installation, analyze the potential load, and take action to compensate for the effects.
Thus, if you anticipate that some other application has a critical load at time X, measure its resource use under the load and adjust MySQL server options accordingly. In such environments, it makes sense to limit resources at the OS level or use virtualization. In contrast to previous sections, I am advising here to add restrictions rather than remove them.
The worst case occurs when an unusually heavy load cannot be predicted. You learn about it only when you hit a problem. So if MySQL starts failing when it should not, always remember the effects of concurrent processes, check OS logs, and see whether other applications can affect MySQL’s access to resources. It is also valuable to install software that monitors the activity of all processes running on the same box together with your MySQL installation.
A good method to ascertain whether the MySQL server was affected by another OS process is to run a problematic query in an isolated environment. This is the same method recommended in How Concurrency Affects Performance.
 Although the proper way to set operating system resource limits
is to use your specific platform tools, it is still worth mentioning
the built-in shell command
ulimit due to its ease
of use and availability for every user. It can either show all current
restrictions if run as
ulimit -a or set soft limits
for the current user.