Chapter 13. MySQL

One could easily write an entire book on MySQL configuration, monitoring, and query optimization. Actually, many people have, and getting one of these books would be an excellent idea. We will cover some facets of configuration and optimization here (those that directly relate to Drupal). This is a very large subject, and one any Drupal developer should be well versed in. Any content management system is in essence a very advanced database frontend, and not knowing the underlying technology of your site and/or business is asking for trouble.

Drupal and MySQL Engines

MySQL includes support for pluggable storage backends, putting data storage and retrieval entirely in their hands. There is even a CSV storage engine that literally stores tables as comma-separated values (CSV) files. While pluggable engines is an interesting feature and makes it possible for MySQL to be surprisingly flexible, in practice, InnoDB and MyISAM are the only engines you are likely to use.

MyISAM is an older-style engine with very limited data durability, simplistic caching, and very few advanced features of any kind. It excels in low-memory situations and on some very poorly optimized queries. These days it is considered dangerous to actually use MyISAM for production data, due to it not being crash safe and it lacking row-level locking. However, it is important to keep in mind that MyISAM is used for temporary tables written to disk. So, while it shouldn’t be used for actually storing data, it is still going to be used during query execution.

InnoDB is the flagship MySQL storage engine. It has been around for a long time, but has recently seen a lot of advancement. It is very stable, and is now scaling across multicore CPUs considerably better than it has in the past. Some versions of InnoDB even support tuning parameters to make it use the full IO capacity of solid-state drives. As of Drupal 7, InnoDB is the default for new installations. If there is one thing to take away from this chapter, it is this: use InnoDB.

There are other engines that could be worth looking at that specialize in integrating other technologies or in data archiving. If you have a very specialized task, some of these can be useful; however, they are beyond the scope of this book and most Drupal deployments.

Versions of MySQL

It has been an exciting few years for MySQL. While you could argue that development had been stagnating, the purchase of MySQL AB and the immediate forks thereafter have really accelerated development across the board. While many were concerned about the MySQL ecosystem around the time of these forks, they have turned out to be very positive for users. The one downside is that it’s now quite difficult to figure out which version of MySQL to use. To be clear, we will not be recommending one here. However, we will be covering the differences and pointing out that in most cases, whatever version “fits” best in your infrastructure (i.e., is easiest to install/update) is likely going to be fine for you.

Oracle MySQL

Oracle’s MySQL is what everyone thinks of as “MySQL.” There was definitely a period where this fork was significantly lagging in scalability features, but that time is long past. If you have a recent version of Oracle’s MySQL and don’t need a specialized feature in another distribution, you are likely to be fine with this version.

MariaDB

MariaDB is a fork of MySQL by one of the project’s founder, Monty Widenus. The original goal of this project was to create a better version of MyISAM, but the project quickly grew into a fork of the entire server as more developers joined the project. MariaDB has gotten a lot of traction recently, including being integrated into Fedora and RHEL7 instead of Oracle’s MySQL. Some of the highlights that you may notice are:

Aria storage engine
This is the result of the original goal of the project, creating a crash-safe version of MyISAM. While it’s still not something you should use for your main database, it definitely is more generally usable and has some decent advantages for temporary tables (e.g., better buffering of data).
XtraDB
XtraDB is a fork of InnoDB that includes some advanced instrumentation, hooks for online binary backups, and many scalability changes. This version of InnoDB is the default in MariaDB. You can optionally enable the mainline InnoDB, but most people using MariaDB enjoy the “Xtra” features of XtraDB.
“Enterprise” features
There are some features, such as threading changes, that Oracle’s MySQL doesn’t include in the community (free) version. MariaDB ships with these in all versions.

If you look at the feature comparison charts available online, you can see many other differences; however, concurrent development happens fast enough that many of these lists quickly become outdated or are somewhat debatable. We are fans of MariaDB in general, but it’s fairly rare that you see a Drupal site that really needs any of these features.

Percona Server

Percona Server is a fork of MySQL that is developed by one of the leading MySQL performance firms. In general, if you need one of the specific features in this version, you are going to be well aware of that fact. It is somewhat of a niche MySQL distribution for those with extensive scalability requirements. However, many of the features developed by Percona in this MySQL distribution make it into other versions—XtraDB is an example of this.

General Configuration

There are many books on MySQL configuration and many sources for template configuration files. Due to this, we are not going to cover every configuration option, but instead will focus on those that are most important for Drupal.

MySQL configuration can generally be split into global configuration, per-thread configuration, and storage engine configuration. Let’s take a look at some of the important settings for each.

Global Configuration

Drupal doesn’t require tuning of many nonstandard global options. Those you may want to consider include:

max_connections
This configuration option is pretty self-explanatory, but it’s often set far too high. This is because when max_connection errors are seen on the frontend, the problem is often assumed to be that max_connections is set too low. Usually, however, the problem actually is that there are a massive number of slow queries running. In many cases, increasing max_connections only allows a server to thrash itself to death more effectively. This limit should be seen as a safety maximum and should be adjusted based on the amount of memory available. You can calculate memory usage with a formula like the following to get something that is “close enough to accurate”:
(global buffers) + ((per-thread buffers + tmp_table_size) * max_connections)
key_buffer
This is a very important buffer for MyISAM, which you should keep in mind is still being used for temporary tables. Any sane default is likely fine (30–50 MB for a dedicated server is not a bad starting point), but if you use a statistics script such as mysqlreport and the buffer is constantly full, make sure you increase it.
query_cache_size
It is remarkably easy to talk about the MySQL query cache for a long, long time. It is both a really effective cache that can increase performance dramatically and a terrible global lock that can destroy performance completely. For most Drupal sites, it is definitely worth enabling, but it’s best to not set this above 100 MB, and we tend to set it at 60 MB as a default.

Per-Thread Configuration

Per-thread options are generally related to either temporary tables or the various buffers used in the execution of a single query. Many of the quality configuration templates, such as those from the IUS RHEL/CentOS repositories or Percona Tools for MySQL have good defaults for these values. You can spend a lot of time tweaking and testing the various buffers to very slightly increase performance for your specific query load. However, the benefit is often fairly negligible. In almost every case, the defaults are fine, and that time is better spent on query optimization. The only per-thread settings we often recommend tweaking for a Drupal website are those affecting temporary tables:

tmp_table_size/max_heap_table_size
These two variables control the maximum size of in-memory temporary tables. When a temp table exceeds this limit (or if a table contains TEXT/BLOB columns), it is copied to disk. In general you want to avoid this, while not increasing these settings to a size that is not sustainable for your server. Finding what is sustainable is the issue. In general, you could just take your maximum temporary table size and multiply it by max_connections to ensure you have enough memory to support every connection with a temporary table. However, this isn’t a perfect method, as many people do intentionally run with temporary table sizes that are “too large” for their server. These people tend to have historical knowledge of their database server’s usage to allow them to feel comfortable running “overcommitted” from a memory perspective.

Note

These variables should be set to the same value, and both need to be set or the smallest value will be used.

Storage Engine Configuration

Unlike most of the per-thread configuration, correctly configuring InnoDB is extremely important. In particular, the InnoDB buffer size and transaction flush method are possibly the most important tunables for a Drupal installation. MyISAM has few important tunables for a Drupal installation other than the generic key_buffer, so we will focus on InnoDB here:

innodb_buffer_size
This configuration option defines the size of the global InnoDB buffer. It’s difficult to overstate how important this buffer is. The InnoDB buffer holds data cached here for reading, the insert buffer, as well as any hash indexes the engine builds for optimization. Not only do you want a large enough buffer to hold your entire dataset, but you want it large enough to have free space for the insert buffer and hash indexes. The usual recommendation is to set this to two-thirds of your memory space, but we tend to start at around half and then monitor it from there. Drupal sites specifically tend to have some fairly high tmp_table requirements, and that can eat into your available memory.
innodb_flush_log_at_trx_commit
This option defines exactly how careful InnoDB is with your data. Possible settings are:
 0
Write out and flush the log to disk every second, no matter when a COMMIT is triggered.
 1
Write out and flush the log to disk at every COMMIT.
 2
Write out the log at every COMMIT, but only flush every second.

InnoDB defaults to 1, which is fully durable and in theory will lead to no data loss even on full power loss (this is very dependent on the backend IO device). This setting is exceptionally slow for Drupal, though, as it COMMITs quite a lot. Most people end up setting this to either 0 or 2, as the risk is the possibility of losing up to one second of data. In most infrastructures, not only is that not a huge issue, but there are more likely ways to lose one second of data.

Note

While losing one second of data isn’t that scary, what is sometimes an issue is the InnoDB transactional state getting out of sync with the binary log. There are deployments where this is a significant problem, either because it’s difficult to resync the slave or because the slaves are very important to the correct operation of the site. In cases like this, you will have to run InnoDB with more data protection and likely should ensure that you have a new enough version of MySQL with group commit (this combines COMMIT operations, reducing disk flushes).

innodb_log_file_size/innodb_log_files
These settings can be important if you have a very frequently updated site. If you have a lot of updates, it’s quite possible to fill up your log file space, which will force InnoDB to process its log files immediately and can cause both IO spikes and queries stacking up. The default log file size for InnoDB is actually quite low. Increasing this to something like 256 MB is usually not a bad idea, but be aware it will increase the time it takes InnoDB to recover from a crash (more logs to traverse).

Warning

The InnoDB log file size is one of the few settings that not only requires a restart to load in, but also requires some work to move the old log files away. When you change the log file size, InnoDB will see the old logs and then throw an error because they are the wrong size. It will simply refuse to start. Because of this, you must fully shut down MySQL and move the ib_logfile files out of the way before starting MySQL with the new setting. It is extremely important that InnoDB shuts down fully and purges out the log files before you move them out of the way. Because of this, you must ensure that innodb_fast_shutdown is not set to 2. Having it set to 1 is fine, but 2 will not perform a clean shutdown.

Replication

Most every production deployment will need a replicated slave, at the very least for failover during crashes and/or updates. Again, there are many books and tutorials on setting up replication, and thus we are not going to cover this topic in detail. However, there are some Drupal-specific considerations:

Replication type
There are three types of replication: statement, row, and mixed. Statement replication simply sends the SQL statements themselves to the slave, which then runs them. It is the oldest style of replication in MySQL. Row-level replication replicates the actual binary delta and applies it to the tables on the slave. Mixed replication is, not shockingly, a mix of these two modes. Where Drupal comes into this is for contrib modules, which can have some poorly written SQL queries that don’t always comply with the requirements for statement-level replication. When a query like this is run on a recent version of MySQL that has statement-level logging enabled, a warning will be put in the error log. You should watch for these warnings and consider row-level or mixed replication if it becomes a problem for you.

Note

There are quite a few exceptions to the rules for which queries can be perfectly replicated with statement replication and which cannot. In general, the issues center around repeatability. For example, UPDATE statements with a LIMIT clause tend to be the biggest offenders for statement replication.

Slave lag
MySQL replication is currently single threaded. This can be a problem, because Drupal has historically had a habit of running some longish DELETE queries to maintain its various logging and metrics tables. Contrib modules are also guilty of this. Most of these issues have been fixed in core and many of the popular contrib modules, but if you ever use your MySQL slave for read queries, this is something you need to actively monitor. Your slave server falling several minutes behind during cron runs can lead to some very difficult-to-debug issues if you are not monitoring slave lag actively.
Read-only queries
Pressflow 6, Drupal 7, and Drupal 8 have the ability to register a slave server and mark certain read-only queries to be sent to the slave. Pager queries (queries that show listings of information across several pages), for example, are often slave-safe.

Note

There are modules that automate the splitting of queries between the master and the slave, but they are not 100% accurate and we do not recommend them. It is usually safer to manage this yourself via the slave options to Drupal Views or the _slave versions of Drupal’s db_query routines.

Virtualized Deployments

Configuring MySQL for virtualized deployments usually involves making absolutely sure your dataset fits in RAM and then configuring InnoDB to be a bit less durable (using the flush_log_at_trx_commit configuration option described earlier). This is due to most cloud systems having really below average IO throughput, which is something that is very important to most database systems and thus is an issue that must be mitigated.

Some other options for increasing IO throughput are:

  • Configure a shared memory (SHM) mount/ramdisk and point the tmpdir configuration directive at it. This will write all temp tables to the memory mount and remove that IO from the disk backend.

Note

This reduces the data durability of temp tables, and if your memory mount fills up, queries will start failing.

  • Use striped volumes (RAID-0). This is mainly useful for Amazon EC2, and some templated server configuration packages, such as RightScale, even offer an automated way to stripe EBS volumes for use as MySQL volumes. These volume stripes tend to increase performance somewhat, but are more useful in evening out performance. EBS volumes tend to have very volatile IO throughput, and striping them together helps to alleviate this volatility.

Get High Performance Drupal now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.