The last few years have been tremendously exciting when it comes to third-party tools for MySQL. We have gone from a few little tools here and there to professionally maintained packages of tools, with many of the utilities in these packages solving issues that have been problematic for MySQL DBAs for a very long time. We will be covering a few of our favorite packages in this chapter. We will not be covering any commercial (for-pay) utilities. There are many great ones, of course, but we will limit this chapter to either open source or “free as in beer” utilities.
The Percona Toolkit actually used to be called Maatkit, before its adventures in rebranding. It was a must-have toolkit before the rename and has only gotten more so since. It contains over 30 tools, varying from hugely complicated and impactful utilities to glorified output reformatters. Some of the most important tools include:
- pt-query-digest
-
There once was a tool called mysqlsla, which basically everyone used to read the MySQL slow query log and output reports showing the top slow queries in the log. This was very useful, but mysqlsla didn’t really progress much. pt-query-digest is a replacement that allows for running
EXPLAIN
s automatically, monitoring process lists, and even generating ASCII graphs. Is it an overengineered tool for its purpose? Probably. Is it awesome enough that we don’t care about that? Absolutely. - pt-table-checksum
- We recommend this tool for every replication cluster that uses statement-based or mixed binary logging (It is somewhat less needed for, and more importantly not supported with, row-based replication). pt-table-checksum allows you to run a replicated checksum across your cluster to ensure that the two (or more) servers are actually consistent. MySQL itself doesn’t have checksums or validation built into the replication engine, and this tool fills a critical gap. (Note that some MySQL forks, such as MariaDB, are now integrating checksums into the binary log.)
- pt-online-schema-change
-
This tool is very dangerous to use, but quite cool to think about using. This script automates the somewhat common practice of
ALTER
ing a large table by creating a new table with a new table definition and then filling that table from the old table, while triggers are inserted to update the new table in real time while the fill is executed. At the end of all of this, you switch the two tables. It’s a complicated procedure that is fully automated by pt-online-schema-change. - pt-index-usage
- This tool can take a slow log (or general log), EXPLAIN the queries, and generate reports on index usage. It is a useful companion to pt-query-digest, more specific to index usage.
There are many more tools in this package, and we recommend taking a look at all of them. They generally make your life far easier when managing a DB server and examining query performance.
The openark kit gets less attention than Percona Toolkit, but it has some equally useful tools. While Percona Toolkit tends to be slightly more focused on performance optimization and replication, openark is more focused on operational tools. These are a few of the tools we have found most useful:
- oak-chunk-update
-
This tool takes a large
UPDATE
/DELETE
statement and automatically splits it into chunks to prevent it from locking a large segment of an often-used table. - oak-kill-slow-queries
- In a perfect world, this script wouldn’t need to exist. However, perfection is sadly lacking in most database clusters. For those times when perfection is escaping you, we have this script. It connects to your DB and kills queries that take longer than a configurable time period. This is useful when you have a site with a known “bad” query that hasn’t been fixed yet and could possibly harm your database server’s performance if left unchecked (for example, a query that only causes problem when run many times concurrently).
- oak-purge-master-logs
- This script can connect to your entire replication chain and then purge the master logs on a server (leaving a configurable number of logs). What is useful about this is that it will check every slave server and won’t purge a log that a slave still needs. Thus, it is a safe way to purge master logs.
- oak-security-audit
- This tool runs a quick audit of some standard security settings, accounts, and privileges.
- oak-show-limits
-
This script is really only useful for large installations, but it’s very useful for those—it shows the “space” you have left in
AUTO_INCREMENT
columns. This doesn’t sound critical, right up until it’s incredibly critical.
mysqlreport has been the standard in “general MySQL statistics” for as long as we can remember. It should be run on every MySQL server at least once. This utility will print out current usage information for the query cache, key buffer, and InnoDB buffer pool, as well as detailed InnoDB statistics and much more. If you only install one of these utilities, this should be the one. A guide for understanding its output is available here. We mentioned a few variables that are important to tune correctly for a Drupal/InnoDB MySQL instance in the previous chapter—this tool is how you would track those variables and validate your configuration. Let’s look at some example output, with comments describing each section.
The first section of mysqlreport’s output describes the MyISAM key buffer cache: for most deployments, you are mainly interested in whether it’s 100% used or not. If it is 100% used, it should likely be increased in size (note that even with an InnoDB server, the key buffer will be used for temporary and internal tables):
__ Key _________________________________________________________________ Buffer used 14.00k of 30.00M %Used: 0.05 Current 5.50M %Usage: 18.33 Write hit 0.00% Read hit 70.21%
The next section describes the breakdown by query type: in most cases, you will have far more SELECT
s than the volatile UPDATE
/INSERT
/DELETE
/REPLACE
options, but if that’s ever not true, this will allow you to discover that and track it. If you suddenly see a huge spike in volatile statements, you may need to investigate why that is (a cache table being refreshed repeatedly, a spam network registering new users, etc.) and take action:
__ Questions ___________________________________________________________ Total 1.81G 2.3k/s QC Hits 920.26M 1.2k/s %Total: 50.78 DMS 831.76M 1.0k/s 45.90 Com_ 39.21M 49.5/s 2.16 COM_QUIT 21.09M 26.6/s 1.16 -Unknown 91.11k 0.1/s 0.01 Slow 2 s 68.31k 0.1/s 0.00 %DMS: 0.01 Log: ON DMS 831.76M 1.0k/s 45.90 SELECT 809.20M 1.0k/s 44.65 97.29 INSERT 11.49M 14.5/s 0.63 1.38 UPDATE 9.07M 11.4/s 0.50 1.09 DELETE 1.24M 1.6/s 0.07 0.15 REPLACE 769.68k 1.0/s 0.04 0.09 Com_ 39.21M 49.5/s 2.16 set_option 29.66M 37.4/s 1.64 show_tables 9.49M 12.0/s 0.52 begin 29.75k 0.0/s 0.00
The following section gives a breakdown of the types of sorts, scans, and joins you’re performing. This section is usually a lot more useful when you have a huge amount of query control. Drupal deployments (especially multisite deployments), are not always a great fit for making this section useful. However, you should always monitor this for large changes and trends:
__ SELECT and Sort _____________________________________________________ Scan 15.80M 19.9/s %SELECT: 1.95 Range 234.83M 296.3/s 29.02 Full join 12.67k 0.0/s 0.00 Range check 0 0/s 0.00 Full rng join 201.66k 0.3/s 0.02 Sort scan 121.17M 152.9/s Sort range 26.33M 33.2/s Sort mrg pass 3.26M 4.1/s
The Query Cache section describes the query cache fill percentage and its usage: There are two important aspects to this part of the report. First, it allows you to monitor the hit to insert ratio and the prune ratio. The hit to insert ratio in particular is very important, as you’re really looking for a case where this ratio is quite hit-heavy. This represents a stable query cache that is very effective. However, if this ratio is close to 1:1, the cache is not being very effective and may not be worth the extra locking. Secondly, this section allows you to monitor how full the query cache is and whether there are excessive amounts of prunes. Assuming your cache is effective, it being full and being pruned often could mean it needs to be bigger:
__ Query Cache _________________________________________________________ Memory usage 26.57M of 50.00M %Used: 53.15 Block Fragmnt 16.39% Hits 920.26M 1.2k/s Inserts 807.94M 1.0k/s Insrt:Prune 1.24:1 196.1/s Hit:Insert 1.14:1
The next section is pretty simple: it just describes table locks. If you have a lot of waited table locks, you may have a MyISAM table that you need to convert to InnoDB:
__ Table Locks _________________________________________________________ Waited 0 0/s %Total: 0.00 Immediate 1.21G 1.5k/s
The Tables section describes the table cache. The major concern here is whether it is 100% full or not. In this example, the table cache should likely be increased as it is full:
__ Tables ______________________________________________________________ Open 1000 of 1000 %Cache: 100.00 Opened 1.55k 0.0/s
The next section allows you to monitor your peak connection count:
__ Connections _________________________________________________________ Max used 289 of 300 %Max: 96.33 Total 21.09M 26.6/s
The following section is largely useful for figuring out how many of your temporary tables are hitting disk and finding temporary table spikes. If you have a huge spike after a deployment, for example, there is likely an ill-behaved query in that code push and it should either be reverted or hot-fixed:
__ Created Temp ________________________________________________________ Disk table 4.59M 5.8/s Table 129.73M 163.7/s Size: 200.0M File 859.24k 1.1/s
The Threads section of the report shows what proportion of your threads are coming off the thread cache (ideally, most of them). As you can see in this example, this server isn’t doing well in this regard and should have the thread cache settings checked:
__ Threads _____________________________________________________________ Running 4 of 19 Cached 0 of 0 %Hit: 0 Created 21.09M 26.6/s Slow 0 0/s
The next section is often not a focus, but if you see large spikes in aborted clients and connections, it can indicate network issues:
__ Aborted _____________________________________________________________ Clients 50 0.0/s Connects 31.32k 0.0/s
The Bytes section gives a simple data transfer report:
__ Bytes _______________________________________________________________ Sent 1.65T 2.1M/s Received 240.05G 302.9k/s
The next section of the report covers the InnoDB buffer pool usage and fill rate; it is likely the most important section of the report for an InnoDB server. In this example, you can see that the buffer pool is 100% full and should be increased in size. However, it’s not the end of the world, as the read hit rate is still quite high, if not 100%. This part of the report also includes information on the pages in the buffer pool, the number of pages free, the number of data pages, etc., and it covers the IO InnoDB is performing (pages read and written and flushes to disk performed). In general, you are reading this section to ensure that your buffer pool has a 100% read hit rate and has free pages:
__ InnoDB Buffer Pool __________________________________________________ Usage 15.00G of 15.00G %Used: 100.00 Read hit 99.97% Pages Free 0 %Total: 0.00 Data 828.94k 84.32 %Drty: 3.37 Misc 154101 15.68 Latched 0.00 Reads 411.66G 519.5k/s From file 129.71M 163.7/s 0.03 Ahead Rnd 0 0/s Ahead Sql 0/s Writes 350.34M 442.1/s Flushes 18.50M 23.3/s Wait Free 222 0.0/s
The following section gives some introspection on InnoDB locking. The biggest things to watch for here are excessive waits (if it can actually compute a per-second value for this, it is a bad sign) and your average time for acquiring a lock. As you can see in this example, we actually have some slight locking issues on this server. At this point, we would want to review the slow log to look for queries with high lock times (note that only recent MySQL versions support InnoDB lock times in the slow log):
__ InnoDB Lock _________________________________________________________ Waits 16218 0.0/s Current 0 Time acquiring Total 4370595 ms Average 269 ms Max 19367 ms
Finally, the last section describes InnoDB data and IO usage in detail. Unless you are doing fairly advanced tuning and configuration related to IO throughput, this section is likely not a focus for you:
__ InnoDB Data, Pages, Rows ____________________________________________ Data Reads 130.65M 164.9/s Writes 15.47M 19.5/s fsync 2.09M 2.6/s Pending Reads 1 Writes 0 fsync 0 Pages Created 686.23k 0.9/s Read 140.26M 177.0/s Written 18.50M 23.3/s Rows Deleted 12.93M 16.3/s Inserted 14.48M 18.3/s Read 178.26G 224.9k/s Updated 8.96M 11.3/s
Percona Monitoring Plug-Ins is a set of extensions for Nagios/Icinga and Cacti that make them far more useful for MySQL administrators. They are a very useful addition to most Nagios or Cacti installations.
Get High Performance Drupal now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.