One could easily write an entire book on MySQL configuration, monitoring, and query optimization. Actually, many people have, and getting one of these books would be an excellent idea. We will cover some facets of configuration and optimization here (those that directly relate to Drupal). This is a very large subject, and one any Drupal developer should be well versed in. Any content management system is in essence a very advanced database frontend, and not knowing the underlying technology of your site and/or business is asking for trouble.
MySQL includes support for pluggable storage engines, delegating data storage and retrieval entirely to the engine. There is even a CSV storage engine that literally stores tables as comma-separated values (CSV) files. While the pluggable engine architecture is an interesting feature and makes MySQL surprisingly flexible, in practice InnoDB and MyISAM are the only engines you are likely to use.
MyISAM is an older-style engine with very limited data durability, simplistic caching, and very few advanced features of any kind. It excels in low-memory situations and on some very poorly optimized queries. These days it is considered dangerous to use MyISAM for production data, because it is not crash safe and lacks row-level locking. However, it is important to keep in mind that MyISAM is still used for temporary tables written to disk. So, while it shouldn’t be used for actually storing data, it will still be used during query execution.
InnoDB is the flagship MySQL storage engine. It has been around for a long time, but has recently seen a lot of advancement. It is very stable, and is now scaling across multicore CPUs considerably better than it has in the past. Some versions of InnoDB even support tuning parameters to make it use the full IO capacity of solid-state drives. As of Drupal 7, InnoDB is the default for new installations. If there is one thing to take away from this chapter, it is this: use InnoDB.
There are other engines that could be worth looking at that specialize in integrating other technologies or in data archiving. If you have a very specialized task, some of these can be useful; however, they are beyond the scope of this book and most Drupal deployments.
It has been an exciting few years for MySQL. While you could argue that development had been stagnating, the purchase of MySQL AB and the immediate forks thereafter have really accelerated development across the board. While many were concerned about the MySQL ecosystem around the time of these forks, they have turned out to be very positive for users. The one downside is that it’s now quite difficult to figure out which version of MySQL to use. To be clear, we will not be recommending one here. However, we will be covering the differences and pointing out that in most cases, whatever version “fits” best in your infrastructure (i.e., is easiest to install/update) is likely going to be fine for you.
Oracle’s MySQL is what everyone thinks of as “MySQL.” There was definitely a period where this fork was significantly lagging in scalability features, but that time is long past. If you have a recent version of Oracle’s MySQL and don’t need a specialized feature in another distribution, you are likely to be fine with this version.
MariaDB is a fork of MySQL by one of the project’s founders, Michael “Monty” Widenius. The original goal of the project was to create a better version of MyISAM, but it quickly grew into a fork of the entire server as more developers joined. MariaDB has gotten a lot of traction recently, including being integrated into Fedora and RHEL 7 in place of Oracle’s MySQL.
If you look at the feature comparison charts available online, you will see many differences between the two; however, concurrent development happens fast enough that many of these lists quickly become outdated or are somewhat debatable. We are fans of MariaDB in general, but it’s fairly rare to see a Drupal site that really needs any of its distinguishing features.
Percona Server is a fork of MySQL that is developed by one of the leading MySQL performance firms. In general, if you need one of the specific features in this version, you are going to be well aware of that fact. It is somewhat of a niche MySQL distribution for those with extensive scalability requirements. However, many of the features developed by Percona in this MySQL distribution make it into other versions—XtraDB is an example of this.
There are many books on MySQL configuration and many sources for template configuration files. Due to this, we are not going to cover every configuration option, but instead will focus on those that are most important for Drupal.
MySQL configuration can generally be split into global configuration, per-thread configuration, and storage engine configuration. Let’s take a look at some of the important settings for each.
When max_connections errors are seen on the frontend, the problem is often assumed to be that max_connections is set too low. Usually, however, the real problem is that there are a massive number of slow queries running. In many cases, increasing max_connections only allows a server to thrash itself to death more effectively. This limit should be seen as a safety maximum and should be adjusted based on the amount of memory available. You can calculate memory usage with a formula like the following to get something that is “close enough to accurate”:
(global buffers) + ((per-thread buffers + tmp_table_size) * max_connections)
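To make the formula concrete, here is a sketch of the arithmetic with illustrative values; the buffer sizes and connection limit are examples, not recommendations:

```shell
# Rough worst-case memory estimate (in MB) following the formula above.
# All values are example settings, not recommendations for your server.
GLOBAL_BUFFERS=$((4096 + 32))   # e.g., innodb_buffer_pool_size + key_buffer
PER_THREAD=$((2 + 2 + 1 + 1))   # e.g., sort/join/read/read_rnd buffers
TMP_TABLE_SIZE=32
MAX_CONNECTIONS=150

echo "$((GLOBAL_BUFFERS + (PER_THREAD + TMP_TABLE_SIZE) * MAX_CONNECTIONS)) MB"
```

With these example values, the worst case is well over twice the size of the global buffers alone, which is why max_connections must be treated as part of the memory budget rather than a free knob.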
Per-thread options generally relate either to temporary tables or to the various buffers used in the execution of a single query. Many of the quality configuration templates, such as those from the IUS RHEL/CentOS repositories or Percona Tools for MySQL, have good defaults for these values. You can spend a lot of time tweaking and testing the various buffers to very slightly increase performance for your specific query load, but the benefit is often negligible. In almost every case the defaults are fine, and that time is better spent on query optimization. The only per-thread settings we often recommend tweaking for a Drupal website are those affecting temporary tables:
When sizing these, multiply tmp_table_size by max_connections to ensure you have enough memory to support every connection creating a temporary table. This isn’t a perfect method, however, as many people intentionally run with temporary table sizes that are “too large” for their server. These people tend to have enough historical knowledge of their database server’s usage to feel comfortable running “overcommitted” from a memory perspective.
tmp_table_size and max_heap_table_size should be set to the same value; both need to be set, as the smaller of the two is what takes effect.
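A minimal my.cnf sketch of these two settings; the 32M value is illustrative, not a recommendation:

```ini
[mysqld]
# Keep these two equal: the effective in-memory temporary table limit
# is the smaller of the two values.
tmp_table_size      = 32M
max_heap_table_size = 32M
```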
Unlike most of the per-thread configuration, correctly configuring InnoDB is extremely important. In particular, the InnoDB buffer pool size and transaction flush method are possibly the most important tunables for a Drupal installation. MyISAM has few important tunables for a Drupal installation other than the generic key_buffer, so we will focus on InnoDB here:
When sizing the InnoDB buffer pool, remember your per-thread buffer and tmp_table requirements, as those can eat into your available memory.
The innodb_flush_log_at_trx_commit setting controls how the transaction log is flushed: a value of 2 writes the log at each COMMIT, but only flushes it to disk every second. InnoDB defaults to 1, which is fully durable and in theory will lead to no data loss even on full power loss (this is very dependent on the backend IO device). This setting is exceptionally slow for Drupal, though, as Drupal COMMITs quite a lot. Most people end up setting this to either 0 or 2, as the risk is only the possibility of losing up to one second of data. In most infrastructures, not only is that not a huge issue, but there are more likely ways to lose one second of data.
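A hedged my.cnf sketch of these two InnoDB settings; the buffer pool size is an example and should be sized to your server’s memory and dataset:

```ini
[mysqld]
# Size the buffer pool to hold the working set, leaving headroom for
# per-thread buffers and temporary tables (4G is only an example).
innodb_buffer_pool_size        = 4G
# Trade up to one second of durability for much faster COMMITs.
innodb_flush_log_at_trx_commit = 2
```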
While losing one second of data isn’t that scary, what is sometimes an issue is the InnoDB transactional state getting out of sync with the binary log. There are deployments where this is a significant problem, either because it’s difficult to resync the slave or because the slaves are very important to the correct operation of the site. In cases like this, you will have to run InnoDB with more data protection, and you likely should ensure that you have a new enough version of MySQL with group commit (this combines COMMIT operations, reducing disk flushes).
The InnoDB log file size is one of the few settings that not only requires a restart to take effect, but also requires some work to move the old log files away. When you change the log file size, InnoDB will see the old logs, throw an error because they are the wrong size, and simply refuse to start. Because of this, you must fully shut down MySQL and move the ib_logfile files out of the way before starting MySQL with the new setting. It is extremely important that InnoDB shuts down fully and flushes everything out of the log files before you move them. To ensure this, innodb_fast_shutdown must not be set to 2; a value of 1 is fine, but 2 will not perform a clean shutdown.
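The procedure might look like the following sketch; the data directory path, backup location, and service name are assumptions that vary by distribution, and these commands touch a live server, so adapt before use:

```shell
# Sketch of resizing the InnoDB log files. Paths and the service name
# ("mysqld") are assumptions that vary by distribution.

# 1. Make sure InnoDB will perform a clean shutdown (not mode 2).
mysql -e "SET GLOBAL innodb_fast_shutdown = 1;"

# 2. Stop MySQL completely.
service mysqld stop

# 3. Move the old log files out of the way. Keep them until MySQL
#    starts cleanly, in case you need to roll back the change.
mv /var/lib/mysql/ib_logfile* /root/ib_logfile_backup/

# 4. Update innodb_log_file_size in my.cnf, then start MySQL; InnoDB
#    creates new log files at the configured size on startup.
service mysqld start
```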
Almost every production deployment will need a replicated slave, at the very least for failover during crashes and/or updates. Again, there are many books and tutorials on setting up replication, so we will not cover the topic in detail. However, there are some Drupal-specific considerations:
There are quite a few exceptions to the rules for which queries can be perfectly replicated with statement replication and which cannot. In general, the issues center around repeatability. For example, UPDATE statements with a LIMIT clause tend to be the biggest offenders for statement replication.
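A hypothetical example illustrates the problem; the watchdog_archive table and its columns are invented for illustration:

```sql
-- Without an ORDER BY on a unique key, the 100 rows matched by LIMIT
-- can differ between master and slave, so statement replication may
-- apply this change to different rows on each server.
UPDATE watchdog_archive
SET archived = 1
WHERE archived = 0
LIMIT 100;

-- A deterministic rewrite orders on the primary key, so the same 100
-- rows are chosen everywhere.
UPDATE watchdog_archive
SET archived = 1
WHERE archived = 0
ORDER BY wid
LIMIT 100;
```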
Drupal core runs many potentially nonrepeatable DELETE queries to maintain its various logging and metrics tables, and contrib modules are also guilty of this. Most of these issues have been fixed in core and in many of the popular contrib modules, but if you ever use your MySQL slave for read queries, this is something you need to actively monitor. Your slave server falling several minutes behind during cron runs can lead to some very difficult-to-debug issues if you are not monitoring slave lag actively.
There are modules that automate the splitting of queries between the master and the slave, but they are not 100% accurate and we do not recommend them. It is usually safer to manage this yourself via the slave options in Drupal Views or the _slave versions of Drupal’s database query functions.
Configuring MySQL for virtualized deployments usually involves making absolutely sure your dataset fits in RAM and then configuring InnoDB to be a bit less durable (using the innodb_flush_log_at_trx_commit configuration option described earlier). This is because most cloud systems have well below average IO throughput, which is very important to most database systems and thus an issue that must be mitigated.
Some other options for increasing IO throughput are:
Moving the MySQL tmpdir onto a memory-backed filesystem such as tmpfs. This reduces the data durability of temp tables, and if your memory mount fills up, queries will start failing.
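A minimal my.cnf sketch, assuming a tmpfs mount already exists at the path shown; the path is an example:

```ini
[mysqld]
# Point on-disk temporary tables at a memory-backed mount. The path is
# illustrative; it must exist and be mounted as tmpfs. Temp tables lose
# durability, and queries fail if this mount fills up.
tmpdir = /var/lib/mysql-tmp
```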