Chapter 5. MySQL Optimisation

Optimisation is a complicated task because it ultimately requires understanding of the whole system. While it may be possible to do some local optimisations with little knowledge of your system or application, the more optimal you want your system to become the more you will have to know about it.

This chapter will try to explain and give some examples of different ways to optimise MySQL. Remember, however, that there are always some (increasingly harder) additional ways to make the system even faster.

Optimisation Overview

The most important part in speeding up a system is, of course, the basic design. You also need to know what your system will be doing, and what your bottlenecks are.

The most common bottlenecks are:

  • Disk seeks. It takes time for the disk to find a piece of data. With modern disks in 1999, the mean time for this is usually lower than 10ms, so we can in theory do about 100 seeks a second. This time improves slowly with new disks and is very hard to optimise for a single table. The way to optimise this is to spread the data on more than one disk.

  • Disk reading/writing. When the disk is at the correct position we need to read the data. With modern disks in 1999, one disk delivers something like 10-20 MB/s. This is easier to optimise than seeks because you can read in parallel from multiple disks.

  • CPU cycles. When we have the data in main memory (or if it was already there) we need to process it to get our result. Tables that are small compared to the available memory are the most common case where CPU is the limiting factor; but then, with small tables speed is usually not the problem.

  • Memory bandwidth. When the CPU needs more data than can fit in the CPU cache the main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for most systems, but one should be aware of it.

MySQL Design Limitations/Tradeoffs

When using the MyISAM table handler, MySQL uses extremely fast table locking (multiple readers/single writer). The biggest problem with this table type arises when you have a steady stream of updates mixed with slow selects on the same table. If this is a problem for some tables, you can use another table type for them. See Chapter 7.
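For example, assuming a server with BDB support compiled in (available from Version 3.23), a table that suffers from this mix of traffic could be switched to a transaction-safe handler with a single statement (the table name here is made up for illustration):

    ALTER TABLE hit_log TYPE = BDB;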

MySQL can work with both transactional and non-transactional tables. To be able to work smoothly with non-transactional tables (which can’t roll back if something goes wrong), MySQL has the following rules:

  • All columns have default values.

  • If you insert a ‘wrong’ value in a column—for instance, a NULL in a NOT NULL column or a numerical value that is too big for a numerical column—MySQL will, instead of giving an error, set the column to the ‘best possible value’. For numerical values this is 0, the smallest possible value, or the largest possible value. For strings this is either the empty string or as much of the string as fits in the column. (See the example after this list.)

  • All calculated expressions return a value that can be used instead of signaling an error condition. For example, 1/0 returns NULL.
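The following sketch illustrates this behavior (the table and column names are made up; note that later MySQL versions can be configured to raise errors instead):

    CREATE TABLE limits (n TINYINT NOT NULL, s CHAR(3) NOT NULL);
    INSERT INTO limits VALUES (1000, 'abcdef');  # accepted without error
    SELECT * FROM limits;                        # returns n = 127, s = 'abc'
    SELECT 1/0;                                  # returns NULL, not an error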

The reason for these rules is that we can’t check these conditions before the query starts to execute. If we encounter a problem after updating a few rows, we can’t just roll back, as the table type may not support this. We can’t stop either, because then the update would be ‘half done’, which is probably the worst possible scenario. In this case it’s better to ‘do the best you can’ and then continue as if nothing happened.

This means that one should not use MySQL to check field content, but one should do this in the application.

Portability

Because all SQL servers implement different parts of SQL, it takes work to write portable SQL applications. For very simple selects/inserts it is very easy, but the more you need the harder it gets. If you want an application that is fast with many databases, it becomes even harder!

To make a complex application portable you need to choose a number of SQL servers that it should work with.

You can use the MySQL crash-me program/web page (http://www.mysql.com/information/crash-me.php) to find functions, types, and limits you can use with a selection of database servers. Crash-me does not yet test everything possible, but it is still comprehensive, with about 450 things tested.

For example, you shouldn’t have column names longer than 18 characters if you want to be able to use Informix or DB2.

Both the MySQL benchmarks and crash-me programs are very database-independent. By taking a look at how we have handled this, you can get a feeling for what you have to do to write your application so that it is database-independent. The benchmarks themselves can be found in the sql-bench directory in the MySQL source distribution. They are written in Perl with a DBI database interface (which solves the access part of the problem).

See http://www.mysql.com/information/benchmarks.html for the results from this benchmark.

As you can see in these results, all databases have some weak points. That is, they have different design compromises that lead to different behavior.

If you strive for database independence, you need to get a good feeling for each SQL server’s bottlenecks. MySQL is very fast at retrieving and updating things, but will have a problem mixing slow readers and writers on the same table. Oracle, on the other hand, has a big problem when you try to access rows that you have recently updated (until they are flushed to disk). Transactional databases in general are not very good at generating summary tables from log tables, as in this case row locking is almost useless.

To get your application really database-independent, you need to define an easily extendable interface through which you manipulate your data. As C++ is available on most systems, it makes sense to use a C++ class interface to the databases.

If you use some feature specific to one database (like the REPLACE command in MySQL), you should code a method for the other SQL servers that implements the same feature (but more slowly). With MySQL you can use the /*! */ syntax to add MySQL-specific keywords to a query. The code inside /* */ will be treated as a comment (and ignored) by most other SQL servers.
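For instance, a query can use MySQL’s INSERT DELAYED extension while remaining runnable on other servers (the table name here is made up):

    INSERT /*! DELAYED */ INTO log_table (msg) VALUES ('page view');

MySQL executes this as INSERT DELAYED; other servers see only a plain INSERT. Likewise, a portable REPLACE could be hidden behind a method, as in the following Perl/DBI sketch (the function, handle, and table names are illustrative, not from the manual):

    sub replace_row
    {
        my ($dbh, $is_mysql, $id, $value) = @_;
        if ($is_mysql)
        {
            # MySQL understands REPLACE natively.
            $dbh->do("REPLACE INTO counters (id, value) VALUES (?, ?)",
                     undef, $id, $value);
        }
        else
        {
            # Other servers: emulate REPLACE with two statements (slower).
            $dbh->do("DELETE FROM counters WHERE id = ?", undef, $id);
            $dbh->do("INSERT INTO counters (id, value) VALUES (?, ?)",
                     undef, $id, $value);
        }
    }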

If high performance is more important than exactness, as in some Web applications, it is possible to create an application layer that caches all results to give you even higher performance. By letting old results ‘expire’ after a while, you can keep the cache reasonably fresh. This also provides a method to handle high-load spikes: you can dynamically increase the cache and set the expire timeout higher until things get back to normal.

In this case the table creation information should include the initial size of the cache and how often the table should normally be refreshed.
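A minimal sketch of such a cache layer, written in Perl with DBI, might look like the following (all names and the timeout value are made up; a real implementation would also bound the cache size and share it between processes):

    use DBI;

    my %cache;                  # query string => [expire time, result]
    my $expire_timeout = 600;   # seconds; raise this during load spikes

    sub cached_query
    {
        my ($dbh, $query) = @_;
        my $entry = $cache{$query};
        if ($entry && $entry->[0] > time())
        {
            return $entry->[1];     # still fresh; skip the database entirely
        }
        my $result = $dbh->selectall_arrayref($query);
        $cache{$query} = [time() + $expire_timeout, $result];
        return $result;
    }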

What Have We Used MySQL for?

During MySQL’s initial development, the features of MySQL were made to fit our largest customer, who handles data warehousing for a couple of the biggest retailers in Sweden.

From all stores, we get weekly summaries of all bonus-card transactions, and we are expected to provide useful information that helps the store owners find out how their advertising campaigns are affecting their customers.

The amount of data is quite large (about 7 million summary transactions per month), and we have data for 4-10 years that we need to present to the users. We get weekly requests from the customers, who want ‘instant’ access to new reports from this data.

We solved this by storing all information per month in compressed ‘transaction’ tables. We have a set of simple macros (scripts) that generate summary tables grouped by different criteria (product group, customer id, store ...) from the transaction tables, as in the sketch below. The reports are web pages that are dynamically generated by a small Perl script that parses a web page, executes the SQL statements in it, and inserts the results. We would have used PHP or mod_perl instead, but they were not available at that time.
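Such a macro might expand to an ordinary CREATE TABLE ... SELECT with a GROUP BY, roughly like this (the table and column names are invented for illustration):

    CREATE TABLE summary_by_product_group
    SELECT product_group, store_id,
           COUNT(*) AS transactions, SUM(amount) AS total_amount
    FROM trans_1999_06
    GROUP BY product_group, store_id;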

For graphical data we wrote a simple tool in C that can produce GIFs based on the result of a SQL query (with some processing of the result). This is also dynamically executed from the Perl script that parses the HTML files.

In most cases a new report can be created simply by copying an existing script and modifying the SQL query in it. In some cases we need to add more fields to an existing summary table or generate a new one, but this is also quite simple, as we keep all transaction tables on disk. (Currently we have at least 50G of transaction tables and 200G of other customer data.)

We also let our customers access the summary tables directly with ODBC so that the advanced users can themselves experiment with the data.

We haven’t had any problems handling this with a quite modest Sun Ultra SPARCstation (2 x 200 MHz). We recently upgraded one of our servers to a 2-CPU 400 MHz UltraSPARC, and we are now planning to start handling transactions on the product level, which would mean a tenfold increase in data. We think we can keep up with this by just adding more disks to our systems.

We are also experimenting with Intel-Linux to be able to get more CPU power cheaper. Now that we have the binary portable database format (new in Version 3.23), we will start to use this for some parts of the application.

Our initial feeling is that Linux will perform much better on low to medium loads and Solaris will perform better when you start to get a high load because of extreme disk I/O, but we don’t yet have anything conclusive about this. After some discussion with a Linux kernel developer, this might be a side effect of Linux giving so many resources to the batch job that the interactive performance gets very low. This makes the machine feel very slow and unresponsive while large batch jobs are running. Hopefully this will be better handled in future Linux kernels.

The MySQL Benchmark Suite

This section should contain a technical description of the MySQL benchmark suite (and crash-me), but that description has not been written yet. Currently, you can get a good idea of the benchmarks by looking at the code and results in the sql-bench directory in any MySQL source distribution.

This benchmark suite is meant to tell any user which operations a given SQL implementation performs well or poorly.

Note that this benchmark is single-threaded, so it measures the minimum time for the operations. In the future we plan to add a lot of multi-threaded tests to the benchmark suite.

For example (run on the same NT 4.0 machine):

Reading 2000000 rows by index:

                    Seconds    Seconds
    mysql               367        249
    mysql_odbc          464
    db2_odbc           1206
    informix_odbc    121126
    ms-sql_odbc        1634
    oracle_odbc       20800
    solid_odbc          877
    sybase_odbc       17614

Inserting (350768) rows:

                    Seconds    Seconds
    mysql               381        206
    mysql_odbc          619
    db2_odbc           3460
    informix_odbc      2692
    ms-sql_odbc        4012
    oracle_odbc       11291
    solid_odbc         1801
    sybase_odbc        4802

In this test MySQL was run with an 8M index cache.

We have gathered some more benchmark results at http://www.mysql.com/information/benchmarks.html.

Note that Oracle is not included because they asked to be removed. All Oracle benchmarks have to be approved by Oracle! We believe that makes Oracle benchmarks very biased, because the preceding benchmarks are supposed to show what a standard installation can do for a single client.

To run the benchmark suite, you have to download a MySQL source distribution, install the Perl DBD driver for the database you want to test, and then do:

cd sql-bench
perl run-all-tests --server=#

where # is one of the supported servers. You can get a list of all options and supported servers by doing run-all-tests --help.

crash-me tries to determine what features a database supports and what its capabilities and limitations are by actually running queries. For example, it determines:

  • What column types are supported

  • How many indexes are supported

  • What functions are supported

  • How big a query can be

  • How big a VARCHAR column can be

You can find crash-me results for many different databases at http://www.mysql.com/information/crash-me.php.

Using Your Own Benchmarks

You should definitely benchmark your application and database to find out where the bottlenecks are. After fixing one bottleneck (or replacing it with a ‘dummy’ module), you can then easily identify the next bottleneck (and so on). Even if the overall performance of your application is sufficient, you should at least make a plan for each bottleneck and decide how to solve it if someday you really need the extra performance.

For an example of portable benchmark programs, look at the MySQL benchmark suite. See Section 5.1.4. You can take any program from this suite and modify it to suit your needs. By doing this, you can try different solutions to your problem and test which one is really fastest for you.
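As a starting point, even a few lines of Perl using the DBI and Benchmark modules will show where the time goes (a sketch; the connection parameters, table, and query are made up):

    use DBI;
    use Benchmark;

    my $dbh = DBI->connect("DBI:mysql:test", "user", "password")
        or die $DBI::errstr;

    # Time 1000 executions of the suspected bottleneck query.
    timethis(1000, sub {
        $dbh->selectall_arrayref(
            "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id");
    });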

It is very common for some problems to occur only when the system is very heavily loaded. We have had many customers who contact us when they have a (tested) system in production and encounter load problems. In every one of these cases so far, the problems have been with basic design (table scans are not good at high load) or OS/library issues. Most of them would be a lot easier to fix if the systems were not already in production.

To avoid problems like this, you should put some effort into benchmarking your whole application under the worst possible load! You can use Super Smack for this; it is available at http://www.mysql.com/Downloads/super-smack/super-smack-1.0.tar.gz. As the name suggests, it can bring your system to its knees if you ask it to, so make sure to use it only on your development systems.
