O'Reilly logo

MySQL Troubleshooting by Sveta Smirnova

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 4. MySQL’s Environment

The MySQL server is not alone in the environment where it runs. Even if it works in a dedicated environment, you still have to consider the hardware resources and operating system (OS) limits. In shared environments, the MySQL server is also affected by other processes. Tuning operating systems for MySQL is a huge topic about which a separate book could be written. Here I won’t go deep, but show some starting points from a troubleshooting perspective. MySQL runs in various environments, which is a great advantage for MySQL, but also something that makes it hard to get specific in this chapter. So I decided to show you what you need to care about, and leave it up to you to consult the manual for your OS to determine how to handle tuning.

Physical Hardware Limits

A common usage mistake is to have unrealistic expectations for performance. One can expect that the MySQL server needs to be tuned while forgetting the latency of hardware components. Therefore, it is important to understand what can cause the latency.

The following hardware resources affect the MySQL server:

  • RAM

  • CPU

  • Number of cores

  • Disk I/O

  • Network bandwidth

Let’s discuss each of them in a bit of detail.

RAM

Memory is a very important resource for MySQL. The server works fast when it does not swap. Ideally, it should fit in RAM. Therefore, it is important to configure buffers in such a way that they stay within the limits of physical memory. I provided guidelines for this in Effects of Options and in Chapter 3, particularly in Calculating Safe Values for Options.

Note

You can check whether mysqld is not swapping by checking vmstat on Linux/Unix or the Windows Task Manager on Windows.

Here is an example of swapping on Linux. Important parts are in bold. For a server that is not swapping, all these values should be equal to zero:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1936 296828   7524 5045340   4   11   860     0  470  440  4  2 75 18
 0  1   1936 295928   7532 5046432  36   40   860   768  471  455  3  3 75 19
 0  1   1936 294840   7532 5047564   4   12   868     0  466  441  3  3 75 19
 0  1   1936 293752   7532 5048664   0    0   848     0  461  434  5  2 75 18

In those sections, we discussed how configuration variables can affect memory usage. The basic rule is to calculate the maximum realistic amount of RAM that will be used by the MySQL server and to make sure you keep it less than the physical RAM you have. Having buffers larger than the actual memory size increases the risk of the MySQL server crashing with an “Out of memory” error.

  • The previous point can be stated the other way around: if you need larger buffers, buy more RAM. This is always a good practice for growing applications.

  • Use RAM modules that support extended error correction (EEC), so if a bit of memory is corrupted, the whole MySQL server does not crash.

A few other aspects of memory use that you should consider are listed in a chapter named “How MySQL Uses Memory” in the MySQL Reference Manual. I won’t repeat its contents here, because it doesn’t involve any new troubleshooting techniques.

One important point is that when you select a row containing a BLOB column, an internal buffer grows to the point where it can store this value, and the storage engine does not return the memory to RAM after the query finishes. You need to run FLUSH TABLE to free the memory.

Another point concerns differences between 32-bit and 64-bit architectures. Although the 32-bit ones use a smaller pointer size and thus can save memory, these systems also contain inherent restrictions on the size of buffers due to addressing limits in the operating system. Theoretically, the maximum memory available in a 32-bit system is 4GB per process, and it’s actually less on many systems. Therefore, if the buffers you want to use exceed the size of your 32-bit system, consider switching to a 64-bit architecture.

Processors and Their Cores

MySQL’s performance does not scale linearly with increasing CPU speed. This does not mean you can’t make use of a fast CPU, but don’t expect performance will scale by increasing CPU speed in the same way it can increase by adding more RAM.

However, the number of cores is important when you set options that affect internal thread concurrency. There is no sense in increasing the values of such options if you don’t have enough cores. This can be easily demonstrated using the benchmark utility named sysbench.[14] Table 4-1 shows the results of a small test on a machine with four cores. I used the OLTP sysbench test with 16 threads.

Table 4-1. Time spent executing an event with different innodb_thread_concurrency values
innodb_thread_concurrencyExecution time
17.8164
24.3959
42.5889
82.6708
163.4669
323.4235

As you can see, the test runs faster up until I start eight threads, and stops improving at higher values.

Disk I/O

Fast disks are very important for MySQL performance. The faster the disk, the faster the I/O operations.

Regarding disks, you should pay attention to disk read latency—how much time each read access takes—and fsync latency—how long each fsync takes.

Recent solid state disks (SSDs) work well, but don’t expect miracles from them yet, because most storage engines are optimized to do read and writes for hard disks.

The same issue applies to network storage. It is possible to store data and logfiles in network filesystems and storage, but these installations may be slower than local disks. You need to check is how fast and reliable your storage is. Otherwise, don’t be surprised if you experience data corruption because of network failure.

You can determine whether your disk I/O is overloaded using iostat on Linux/Unix. The average queue length of the requests issued to the device should not be high in normal use. On Windows you can use perfmon for the same purpose. Here is an example output of iostat:

$iostat -x 5
Linux 2.6.18-8.1.1.el5 (blade12)        11/11/2011      _x86_64_

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.27    0.00    0.32    0.65    0.00   97.79

Device:    rrqm/s wrqm/s  r/s   w/s   rsec/s  wsec/s avgrq-sz avgqu-sz await  
cciss/c0d0   0.02   7.25  0.50  6.34  14.35   108.73   17.98    0.48   69.58   
dm-0         0.00   0.00  0.52 13.59  14.27   108.70    8.72    0.09    6.59   

svctm  %util
2.22   1.52
1.08   1.52

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.69    0.00    6.43   47.84    0.00    8.64

Device:    rrqm/s  wrqm/s  r/s     w/s rsec/s   wsec/s avgrq-sz avgqu-sz  await 
cciss/c0d0   0.00 5362.40 0.20  713.80   1.60 51547.20    72.20   138.40 193.08  
dm-0         0.00    0.00 0.00 6086.00   0.00 48688.00     8.00  1294.74 227.04  

svctm %util
1.40 99.88
0.16 99.88

<skipped>

Device:    rrqm/s   wrqm/s  r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await 
cciss/c0d0   0.00 10781.80 0.00   570.20  0.00  84648.00   148.45   143.58 248.72  
dm-0         0.00     0.00 0.00 11358.00  0.00  90864.00     8.00  3153.82 267.57  

svctm  %util
1.75 100.02
0.09 100.02

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.90    0.00   11.30   75.10    0.00    5.60

Device:    rrqm/s   wrqm/s  r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await 
cciss/c0d0   0.00 11722.40 0.00  461.60    0.00 98736.00   213.90   127.78 277.04  
dm-0         0.00     0.00 0.00 12179.20   0.00 97433.60     8.00  3616.90 297.54  

svctm %util
2.14 98.80
0.08 98.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.55    0.00   23.95   46.19    0.00    7.11

Device:    rrqm/s  wrqm/s  r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await 
cciss/c0d0   0.00 4836.80 1.00  713.60    8.00 49070.40    68.68   144.28 204.08  
dm-0         0.00    0.00 1.00 5554.60    8.00 44436.80     8.00  1321.82 257.28  

svctm  %util
1.40 100.02
0.18 100.02

This output was taken when mysqld was idle and then started an active I/O job. You can see how avgqu-sz is growing. Although this is far from problematic, I decided to put this example here to show how disk I/O activity changes while mysqld is doing its job.

And aside from speed, remember that the storage can lose data; partial page writes are still possible. If you use InnoDB, use the doublewrite buffer by setting innodb_doublewrite to secure your data. Also, is very important to plan battery backups for your disks because these can prevent data loss in case of power failure.

Network Bandwidth

Clients almost always connect to the MySQL server over a network, so it is important to run your MySQL server in a fast network.

In addition to network bandwidth, round-trip time (RTT) and the number of round-trips are important. RTT is a time needed for a client to send a network packet, then receive an answer from the server. The longer distance between the machines, the higher the RTT is.

Network bandwidth and RTT are the reasons why it is recommended to place the MySQL client and the server in the same local network when possible.

Local networks are recommended for replication as well. You can connect to slaves over the Internet instead of a local intranet, but expect delays and even errors due to corruption in the relay log data. Such errors should be fixed automatically after bug #26489 is fixed and if you use the relay-log-recovery option starting with version 5.5 and binary log checksums starting with version 5.6. But the master and slave will still spend time resending packets due to network failures.

Example of the Effect of Latencies

To finish this part of the chapter, I will show you small example of how hardware latencies affect a trivial UPDATE query. We will use an InnoDB table with autocommit turned on. We will also turn on the binary logfile.

UPDATE test_rrbs SET f1 = md5(id*2) WHERE id BETWEEN 200000 AND 300000;

This simple query could experience latencies when:

The client sends a command to the server that takes a half of RTT.
The WHERE clause of the UPDATE is executed, and mysqld reads the disk.
mysqld does an fsync call to prepare for the transaction because autocommit is on.
mysqld does an fsync call to write into a binary logfile.
mysqld does an fsync call to commit changes.
The client receives the result from the server, which is another aspect of RTT.

Operating System Limits

Other limits on MySQL are set by operating systems. For example, I saw a case when a server failed because the Linux host specified vm.overcommit_ratio = 50. This option specifies the percentage of the system’s total virtual memory that can be allocated via malloc(); attempts to allocate more memory by a process will fail. Fifty percent is the default value for many Linux installations. So when mysqld allocates about 50% of existing RAM and then tries to allocate more, it fails. In a multitasking setup, such an option may be useful because it protects other critical processes from a greedy MySQL server, but it’s absolutely ridiculous in a dedicated server.

Another important Unix option is ulimit, which restricts various resources for users.[15] When you set resources using the ulimit command or another system utility, remember that the MySQL server runs as a simple user and is subject to the same limits as everyone else.

Server-related options mentioned how OS restrictions affect the open_files_limit variable, and I’ll show an example here. Suppose the server’s ulimit on open files (the -n option) is set to 1024, the default on many systems. If you try to start mysqld with --open-files-limit=4096, it would not override the operating system limit.

$ulimit -n
1024

$./bin/mysqld --defaults-file=support-files/my-small.cnf --basedir=.
--datadir=./data --socket=/tmp/mysql_ssmirnova.sock --port=33051 --log-error
--open-files-limit=4096 &
[1] 31833

$./bin/mysql -uroot -S /tmp/mysql_ssmirnova.sock -e "SELECT
@@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
|               1024 |
+--------------------+

This option is very important if you have many tables, and a value that’s too small can lead to a slowdown because the MySQL server has to spend time opening and closing tables, or it may even reject connection attempts if the server can’t open new tables due to lack of resources.

I won’t describe how to tune OS limits in detail here, because this information is specific to each OS. I haven’t mentioned Windows, but rest assured it has limits too.

When you suspect that the OS is limiting your MySQL server, first check the resources described earlier: RAM, CPU, and network bandwidth. Check whether mysqld is using less than the hardware offers. Usually, performance problems occur when RAM or CPU are limited. Also check the number of open files allowed.

If you find that mysqld can and should use more resources, check various aspects of operating system tuning. For example, the OS’s kernel options or user defaults might set no limit on the amount of RAM available, but might specifically limit the user running mysqld. This is a common case when overall OS limits are huge or even not set, while defaults for user accounts are small.

If MySQL is using limited resources, you may sometimes wonder whether it is in trouble because of lack of resources or just isn’t using them because its load is low at the moment. When the server is in trouble, it either prints messages to the error logfile or starts performing poorly. Another telltale sign is when you try to increase an option, but it does not increase. The earlier open_files_limit example illustrated this situation. In such cases, you will either find messages in the error logfile if you set the option at startup or see a warning when you set the option dynamically. It’s also a good idea to check the real value of the suspected variable.

Effects of Other Software

Everything we’ve discussed so far is important when the MySQL server runs in any environment. Ideally, it should run in a dedicated environment and use all physical resources the machine has. But some sites use MySQL in shared environments. This includes shared hosting, where many instances of the mysqld server are running on behalf of different customers, and systems simultaneously running mysqld with client applications and other processes.

When you tune a MySQL installation in such configurations, you need to check two additional things: how many resources other processes use in their normal operations and how many resources they allocate at critical times.

Under normal loads, you can guess how many resources are left after other programs reserve theirs, and set MySQL’s options appropriately. Critical loads are usually surprising and can lead to sudden, mysterious MySQL errors or even crashes. There is no universal rule about what to do in these situations. Just remember that third-party software can affect a MySQL installation, analyze the potential load, and take action to compensate for the effects.

Thus, if you anticipate that some other application has a critical load at time X, measure its resource use under the load and adjust MySQL server options accordingly. In such environments, it makes sense to limit resources at the OS level or use virtualization. In contrast to previous sections, I am advising here to add restrictions rather than remove them.

The worst case occurs when an unusually heavy load cannot be predicted. You learn about it only when you hit a problem. So if MySQL starts failing when it should not, always remember the effects of concurrent processes, check OS logs, and see whether other applications can affect MySQL’s access to resources. It is also valuable to install software that monitors the activity of all processes running on the same box together with your MySQL installation.

  • A good method to ascertain whether the MySQL server was affected by another OS process is to run a problematic query in an isolated environment. This is the same method recommended in How Concurrency Affects Performance.



[14] I discuss sysbench in SysBench.

[15] Although the proper way to set operating system resource limits is to use your specific platform tools, it is still worth mentioning the built-in shell command ulimit due to its ease of use and availability for every user. It can either show all current restrictions if run as ulimit -a or set soft limits for the current user.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required