Chapter 4. The Binary Log
The previous chapter included a very brief introduction to the binary log. In this chapter, we will fill in more details and give a more thorough description of the binary log structure, the replication event format, and how to use the mysqlbinlog tool to investigate and work with the contents of binary logs.
The binary log records changes made to the database. It is usually used for replication, and the binary log then allows the same changes to be made on the slaves as well. Because the binary log normally keeps a record of all changes, you can also use it for auditing purposes to see what happened in the database, and for point-in-time recovery (PITR) by playing back the binary log to a server, repeating changes that were recorded in the binary log. (This is what we did in Reporting, where we played back all changes done between 23:55:00 and 23:59:59.)
The binary log contains information that could change the database. Note that statements that could potentially change the database are also logged, even if they don’t actually change the database. The most notable cases are statements that optionally make a change, such as DROP TABLE IF EXISTS or CREATE TABLE IF NOT EXISTS, along with statements such as DELETE and UPDATE that have WHERE conditions that don’t happen to match any rows on the master.
The binary log records each transaction in the order that the commit took place on the master. Although transactions may be interleaved on the master, each appears as an uninterrupted sequence in the binary log, the order determined by the time of the commit.
Structure of the Binary Log
Conceptually, the binary log is a sequence of binary log events (also called binlog events or even just events when there is no risk of confusion). As you saw in Chapter 3, the binary log actually consists of several files, as shown in Figure 4-1, that together form the binary log.
The actual events are stored in a series of files called binlog files
with names in the form
host-bin.000001, accompanied by a binlog index file, usually named
host-bin.index, that keeps track of the existing
binlog files. The binlog file that is currently being written to by the
server is called the active binlog file. If no slaves
are lagging, this is also the file that is being read by the slaves. The
names of the binlog files and the binlog index file can be controlled using the
log-bin and log-bin-index options, which you are familiar with from
Configuring the Master. The options are covered in more
detail later in this chapter.
The index file keeps track of all the binlog files used by the
server so that the server can correctly create new binlog files when
necessary, even after server restarts. Each line in the index file
contains the name of a binlog file that is part of the binary log.
Depending on the MySQL version, it can either be the full name or a name
relative to the data directory. Commands that affect the binlog files, such as
PURGE BINARY LOGS,
RESET MASTER, and
FLUSH LOGS, also affect the index file by adding
or removing lines to match the files that were added or removed by the command.
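Since each line of the index file is simply the name of one binlog file, the format is easy to inspect. The following Python sketch illustrates this (the function name and handling of blank lines are our own, not part of the server):

```python
# Sketch: parsing a binlog index file. Each line holds the name of one
# binlog file, either absolute or relative to the data directory,
# depending on the MySQL version.
def read_binlog_index(path):
    with open(path) as index_file:
        # Strip trailing newlines and skip any blank lines.
        return [line.strip() for line in index_file if line.strip()]
```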
As shown in Figure 4-1, each
binlog file is made up of binlog events, with the
Format_description event serving as the file’s header and the
Rotate event as its footer. Note that a binlog file might not end with a
rotate event if the server was interrupted or crashed.
The Format_description event contains information about the server that wrote the binlog file as well
as some critical information about the file’s status. If the server is
stopped and restarted, it creates a new binlog file and writes a new
Format_description event to it. This is
necessary because changes can potentially occur between bringing a server
down and bringing it up again. For example, the server could be upgraded,
in which case a new Format_description event would have to be written.
When the server has finished writing a binlog file, a
Rotate event is added to end the file. The event
points to the next binlog file in sequence by giving the name of the file
as well as the position to start reading from.
Both the Format_description event and the
Rotate event will be described in
detail in the next section.
With the exception of the control events (e.g.,
Format_description, Rotate, and Incident), events of a binlog file are grouped
into units called groups, as seen in
Figure 4-2. In transactional
storage engines, each group is roughly equivalent to a transaction, but
for nontransactional storage engines or statements that cannot be part of
a transaction, such as
ALTER TABLE, each statement is a group by itself.
In short, each group of events in the binlog file contains either a single statement not in a transaction or a transaction consisting of several statements.
Each group is executed entirely or not at all (with the exception of a few well-defined cases). If, for some reason, the slave stops in the middle of a group, replication will start from the beginning of the group and not from the last statement executed. Chapter 8 describes in detail how the slave executes events.
Binlog Event Structure
MySQL 5.0 introduced a new binlog format: binlog format 4. The preceding formats were not easy to extend with additional fields if the need were to arise, so binlog format 4 was designed specifically to be extensible. This is still the event format used in every server version since 5.0, even though each version of the server has extended the binlog format with new events and some events with new fields. Binlog format 4 is the event format described in this chapter.
Each binlog event consists of four parts:
- Common header
The common header is—as the name suggests—common to all events in the binlog file.
The common header contains basic information about the event, the most important fields being the event type and the size of the event.
- Post header
The post header is specific to each event type; in other words, each event type stores different information in this field. But the size of this header, just as with the common header, is the same throughout a given binlog file. The size of the post header for each event type is given by the Format_description event.
- Event body
After the headers comes the event body, which is the variable-sized part of the event. The size and the end position are listed in the common header for the event. The event body stores the main data of the event, which is different for different event types. For the
Query event, for instance, the body stores the query, and for the
User_var event, the body stores the name and value of a user variable that was just set by a statement.
Starting with MySQL 5.6, there is a checksum at the end of the event, if the server is configured to generate one. The checksum is a 32-bit integer that is used to check that the event has not been corrupted since it was written.
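The common header described above can be illustrated with a short Python sketch. It decodes the fixed 19-byte common header of binlog format 4, whose fields are, in order: timestamp (4 bytes), event type (1 byte), server ID (4 bytes), event size (4 bytes), end position of the event in the file (4 bytes), and flags (2 bytes), all little-endian. Treat this as a reading aid, not a complete parser:

```python
import struct

# Sketch: decoding the 19-byte common header of binlog format 4.
# All integer fields are little-endian.
def parse_common_header(data):
    timestamp, event_type, server_id, event_size, end_log_pos, flags = \
        struct.unpack("<IBIIIH", data[:19])
    return {"timestamp": timestamp, "type": event_type,
            "server_id": server_id, "size": event_size,
            "end_log_pos": end_log_pos, "flags": flags}
```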
A complete listing of the formats of all events is beyond the
scope of this book, but because the Format_description and
Rotate events are critical to how the other events are interpreted, we will
briefly cover them here. If you are interested in the details of the
events, you can find them in the MySQL Internals Manual.
As already noted, the
Format_description event starts every binlog
file and contains common information about the events in the file. The
result is that the
Format_description event can be different
between different files; this typically occurs when a server is upgraded
and restarted. The
Format_description event contains the following fields:
- Binlog file format version
This is the version of the binlog file, which should not be confused with the version of the server. MySQL versions 3.23, 4.0, and 4.1 use version 3 of the binary log, while MySQL versions 5.0 and later use version 4 of the binary log.
The binlog file format version changes when developers make significant changes in the overall structure of the file or the events. In MySQL version 5.0, the start event for a binlog file was changed to use a different format and the common headers for all events were also changed, which prompted the change in the binlog file format version.
- Server version
This is a version string denoting the server that created the file. This includes the version of the server as well as additional information if special builds are made. The format is normally the three-position version number, followed by a hyphen and any additional build options. For example, “5.5.10-debug-log” means debug build version 5.5.10 of the server.
- Common header length
This field stores the length of the common header. Because it’s here in the
Format_description, this length can be different for different binlog files. This holds for all events except the
Format_description and Rotate events, which cannot vary.
- Post-header lengths
The post-header length for each event is fixed within a binlog file, and this field stores an array of the post-header length for each event that can occur in the binlog file. Because the number of event types can vary between servers, the number of different event types that the server can produce is stored before this field.
The Format_description and Rotate log events have a fixed
length because a reader needs to interpret them before it knows the size of the
common header. When a slave connects to the master, the master first sends a
Format_description event. Because
the length of the common header is stored in the
Format_description event itself, there is no way
for the reader to know what the size of the common header is for the
Format_description and Rotate events unless it has a fixed
size. So for these two events, the size of the common header is fixed
and will never change between server versions, even if the size of the
common header changes for other events.
Because both the size of the common header and the size of the
post header for each event type are given in the
Format_description event, extending the format
with new events or even increasing the size of the post headers by
adding new fields is supported by this format and will therefore not
require a change in the binlog file format.
With each extension, particular care is taken to ensure that the extension does not affect interpretation of events that were already in earlier versions. For example, the common header can be extended with an additional field to indicate that the event is compressed and the type of compression used, but if this field is missing—which would be the case if a slave is reading events from an old master—the server should still be able to fall back on its default behavior.
Because hardware can fail and software can contain bugs, it is necessary to have some way to ensure that data corrupted by such events is not applied on the slave. Random failures can occur anywhere, and if they occur inside a statement, they often lead to a syntax error causing the slave to stop. However, relying on this to prevent corrupt events from being replicated is a poor way to ensure integrity of events in the binary log. This policy would not catch many types of corruptions, such as in timestamps, nor would it work for row-based events where the data is encoded in binary form and random corruptions are more likely to lead to incorrect data.
To ensure the integrity of each event, MySQL 5.6 introduced replication event checksums. When events are written, a checksum is added, and when the events are read, the checksum is computed for the event and compared against the checksum written with the event. If the checksums do not match, execution can be aborted before any attempt is made to apply the event on the slave. The computation of checksums can potentially impact performance, but benchmarking has demonstrated no noticeable performance degradation from the addition and checking of checksums, so they are enabled by default in MySQL 5.6. They can, however, be turned off if necessary.
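Conceptually, the check works like the following Python sketch. It uses zlib's CRC-32, which implements the same ISO-3309 polynomial, and assumes for illustration that the checksum is stored as a little-endian 32-bit trailer at the end of the event:

```python
import struct
import zlib

# Sketch: replication event checksums. The checksum is an ISO-3309
# CRC-32 computed over the event bytes and stored in the last 4 bytes
# of the event (assumed little-endian here).
def append_checksum(body):
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)

def verify_event_checksum(event):
    body, stored = event[:-4], struct.unpack("<I", event[-4:])[0]
    return zlib.crc32(body) & 0xFFFFFFFF == stored
```

A reader that finds a mismatch can stop before applying the event, which is exactly what the slave does when verification is enabled.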
In MySQL 5.6, checksums can be generated when changes are written either to the binary log or to the relay log, and verified when reading events back from one of these logs.
Replication event checksums are controlled using three options:
- binlog-checksum
This option enables checksums and tells the server what checksum computation to use. Currently there are only two choices:
CRC32 uses ISO-3309 CRC-32 checksums, whereas
NONE turns off checksumming. The default is
CRC32, meaning that checksums are generated.
- master-verify-checksum
This option controls whether the master verifies the checksum when reading it from the binary log. This means that the event checksum is verified when it is read from the binary log by the dump thread (see Replication Architecture Basics), but before it is sent out, and also when using
SHOW BINLOG EVENTS. If any of the events shown is corrupt, the command will throw an error. This option is off by default.
If you get a corrupt binary log or relay log, mysqlbinlog
can be used to find the bad checksum using the
--verify-binlog-checksum option. This option causes
mysqlbinlog to verify the checksum of
each event read and stop when a corrupt event is found, which the
following example demonstrates:
client/mysqlbinlog --verify-binlog-checksum master-bin.000001
. . .
# at 261
#110406  8:35:28 server id 1  end_log_pos 333 CRC32 0xed927ef2...
SET TIMESTAMP=1302071728/*!*/;
BEGIN
/*!*/;
# at 333
#110406  8:35:28 server id 1  end_log_pos 365 CRC32 0x01ed254d  Intvar
SET INSERT_ID=1/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Event crc check failed!...
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Starting with MySQL 5.1, row-based replication is also available, which will be covered in Row-Based Replication.
In statement-based replication, the actual executed statement is written to the binary log together with some execution information, and the statement is re-executed on the slave. Because not all events can be logged as statements, there are some exceptions that you should be aware of. This section will describe the process of logging individual statements as well as the important caveats.
Because the binary log is a common resource—all threads write to
it—it is critical to prevent two threads from updating the binary log at
the same time. To handle this, a lock for the binary log—called the
LOCK_log mutex—is acquired just before each group is
written to the binary log and released just after the group has been
written. Because all session threads for the server can potentially log
transactions to the binary log, it is quite common for several session
threads to block on this lock.
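The serialization performed by the LOCK_log mutex can be mimicked in Python with a single lock guarding the append of each group (a simplified model, not the server's actual code):

```python
import threading

# Sketch: serializing group writes with a single lock, analogous to the
# LOCK_log mutex. Every session thread must take the lock before
# appending its group, so concurrent commits are written one at a time
# and each group stays contiguous in the log.
binlog_lock = threading.Lock()
binlog = []

def write_group(events):
    with binlog_lock:  # only one thread appends at a time
        binlog.extend(events)
```

Because every session thread must pass through the same lock, threads that are ready to commit at the same time queue up here, which is why this lock is a common point of contention.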
Logging Data Manipulation Language Statements
Data manipulation language (DML) statements are usually
DELETE, INSERT, and UPDATE statements. To support logging changes
in a consistent manner, MySQL writes the binary log while
transaction-level locks are held, and releases them after the binary log
has been written.
To ensure that the binary log is updated consistently with the
tables that the statement modifies, each statement is logged to the
binary log during statement commit, just before the table locks are
released. If the logging were not made as part of the statement, another
statement could be “injected” between the changes that the statement
introduces to the database and the logging of the statement to the
binary log. This would mean that the statements would be logged in a
different order than the order in which they took effect in the
database, which could lead to inconsistencies between master and slave.
For instance, an UPDATE statement with a
WHERE clause could update
different rows on the slave because the values in those rows might be
different if the statement order changed.
Logging Data Definition Language Statements
Data definition language (DDL) statements affect a schema, such as
ALTER TABLE statements. These
create or change objects in the filesystem—for example, table definitions are stored in
.frm files and databases are
represented as filesystem directories—so the server keeps information
about these available in internal data structures. To protect the update
of the internal data structure, it is necessary to acquire an internal
lock (called LOCK_open) before altering the table definition.
Because a single lock is used to protect these data structures, the creation, alteration, and destruction of database objects can be a considerable source of performance problems. This includes the creation and destruction of temporary tables, which is quite common as a technique to create an intermediate result set to perform computations on.
If you are creating and destroying a lot of temporary tables, it is often possible to boost performance by reducing the creation (and subsequent destruction) of temporary tables.
For statement-based replication, the most common binlog event is the
Query event, which is used to write a
statement executed on the master to the binary log. In addition to the
actual statement executed, the event contains some additional
information necessary to execute the statement.
Recall that the binary log can be used for many purposes and contains statements in a potentially different order from that in which they were executed on the master. In some cases, part of the binary log may be played back to a server to perform PITR, and in some cases, replication may start in the middle of a sequence of events because a backup has been restored on a slave before starting replication.
In all these cases, the events are executing in different contexts (i.e., there is information that is implicit when the server executes the statement but that has to be known to execute the statement correctly). Examples include:
- Current database
- Value of user-defined variable
- Seed for the RAND function
The RAND function is based on a pseudorandom number generator, meaning that it can generate a sequence of numbers that are reproducible but appear random in the sense that they are evenly distributed. The function is not really random, but starts from a seed number and applies a pseudorandom function to generate a deterministic sequence of numbers. This means that given the same seed, the
RAND function will always return the same number. However, this makes the seed implicit for the statement.
- The current time
Obviously, the time the statement started executing is implicit. Having a correct time is important when calling functions that are dependent on the current time—such as
UNIX_TIMESTAMP—because otherwise they will return different results if there is a delay between the statement execution on the master and on the slave.
- Value used when inserting into an AUTO_INCREMENT column
If a statement inserts a row into a table with a column defined with the
AUTO_INCREMENT attribute, the value used for that row is implicit for the statement because it depends on the rows inserted before it.
- Value returned by a call to LAST_INSERT_ID
- Thread ID
Because the context for executing the statements cannot be known when they’re replayed—either on a slave or on the master after a crash and restart—it is necessary to make the implicit information explicit by adding it to the binary log. This is done in slightly different ways for different kinds of information.
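The reproducibility that makes logging a seed sufficient holds for any pseudorandom generator. The following Python illustration (using Python's own generator, not MySQL's RAND implementation) shows that the same seed always yields the same sequence:

```python
import random

# Illustration: a pseudorandom generator started from the same seed
# always produces the same sequence. This is why recording the seeds
# in the binary log is enough to reproduce RAND() values on the slave.
def pseudorandom_sequence(seed, count):
    rng = random.Random(seed)  # deterministic given the seed
    return [rng.random() for _ in range(count)]
```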
In addition to the previous list, some information is implicit to the execution of triggers and stored routines, but we will cover that separately in Triggers, Events, and Stored Routines.
Let’s consider each of the cases of implicit information individually, demonstrate the problem with each one, and examine how the server handles it.
The log records the current database by adding it to a special
field of the
Query event. This
field also exists for the events used to handle the
LOAD DATA INFILE
statement, discussed in LOAD DATA INFILE Statements, so the description here
applies to that statement as well. The current database also plays an
important role in filtering on the database, as described later in this chapter.
Five functions use the current time to compute their values:
NOW, CURDATE, CURTIME, UNIX_TIMESTAMP, and
SYSDATE. The first four functions return a
value based on the time when the statement
started to execute. In contrast,
SYSDATE returns the value when the function
is executed. The difference can best be demonstrated by comparing the
results of NOW and SYSDATE with an intermediate sleep:
SELECT SYSDATE(), NOW(), SLEEP(2), SYSDATE(), NOW()\G
*************************** 1. row ***************************
SYSDATE(): 2013-06-08 23:24:08
     NOW(): 2013-06-08 23:24:08
  SLEEP(2): 0
SYSDATE(): 2013-06-08 23:24:10
     NOW(): 2013-06-08 23:24:08
1 row in set (2.00 sec)
Both functions are evaluated when they are encountered, but
NOW returns the time that the
statement started executing, whereas
SYSDATE returns the time when the function is executed.
To handle these time functions correctly, the timestamp indicating when the event started executing is stored in the event. This value is then copied from the event to the slave execution thread and used as if it were the time the event started executing when computing the value of the time functions.
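The two behaviors can be mimicked in Python: a statement context holds the timestamp copied from the event, which NOW-style functions return, while SYSDATE-style functions read the clock on every call. The class and method names below are ours, for illustration only:

```python
import time

# Sketch: how the slave replays time functions. NOW() uses the
# statement start time stored in the binlog event, whereas SYSDATE()
# reads the operating system clock each time it is called.
class StatementContext:
    def __init__(self, event_timestamp):
        self.start_time = event_timestamp  # copied from the binlog event

    def now(self):
        return self.start_time  # fixed for the whole statement

    def sysdate(self):
        return time.time()  # re-read on every call: not replication-safe
```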
Because SYSDATE gets the time
from the operating system directly, it is not safe for statement-based
replication and will return different values on the master and slave
when executed. So unless you really want to have the actual time
inserted into your tables, it is prudent to stay away from this function.
Some implicit information is associated with statements that meet certain conditions:
- If the statement contains a reference to a user-defined variable (as in Example 4-1), it is necessary to add the value of the user-defined variable to the binary log.
In each of these cases, one or more context
events are added to the binary log before the event
containing the query is written. Because there can be several context
events preceding a
Query event, the
binary log can handle multiple user-defined variables together with
uses of the RAND function, or (almost) any
combination of the previously listed conditions. The binary log stores
the necessary context information through the following events:
- Intvar
If the statement contains a call to
LAST_INSERT_ID, this event records the value that this function returned in the statement.
Example 4-2 shows some
statements that generate all of the context events and how the events
appear when displayed using
SHOW BINLOG EVENTS. Note that there can be several context events before a Query event.
master> SET @foo = 12;
Query OK, 0 rows affected (0.00 sec)

master> SET @bar = 'Smoothnoodlemaps';
Query OK, 0 rows affected (0.00 sec)

master> INSERT INTO t1(b,c) VALUES
    ->    (@foo,@bar), (RAND(), 'random');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

master> INSERT INTO t1(b) VALUES (LAST_INSERT_ID());
Query OK, 1 row affected (0.00 sec)

master>
SHOW BINLOG EVENTS FROM 238\G
*************************** 1. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 238
 Event_type: Query
  Server_id: 1
End_log_pos: 306
       Info: BEGIN
*************************** 2. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 306
 Event_type: Intvar
  Server_id: 1
End_log_pos: 334
       Info: INSERT_ID=1
*************************** 3. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 334
 Event_type: RAND
  Server_id: 1
End_log_pos: 369
       Info: rand_seed1=952494611,rand_seed2=949641547
*************************** 4. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 369
 Event_type: User var
  Server_id: 1
End_log_pos: 413
       Info: @`foo`=12
*************************** 5. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 413
 Event_type: User var
  Server_id: 1
End_log_pos: 465
       Info: @`bar`=_utf8 0x536D6F6F74686E6F6F6...
*************************** 6. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 465
 Event_type: Query
  Server_id: 1
End_log_pos: 586
       Info: use `test`; INSERT INTO t1(b,c) VALUES (@foo,@bar)...
*************************** 7. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 586
 Event_type: Xid
  Server_id: 1
End_log_pos: 613
       Info: COMMIT /* xid=44 */
*************************** 8. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 613
 Event_type: Query
  Server_id: 1
End_log_pos: 681
       Info: BEGIN
*************************** 9. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 681
 Event_type: Intvar
  Server_id: 1
End_log_pos: 709
       Info: LAST_INSERT_ID=1
*************************** 10. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 709
 Event_type: Intvar
  Server_id: 1
End_log_pos: 737
       Info: INSERT_ID=3
*************************** 11. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 737
 Event_type: Query
  Server_id: 1
End_log_pos: 843
       Info: use `test`; INSERT INTO t1(b) VALUES (LAST_INSERT_ID())
*************************** 12. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 843
 Event_type: Xid
  Server_id: 1
End_log_pos: 870
       Info: COMMIT /* xid=45 */
12 rows in set (0.00 sec)
The last implicit piece of information that the binary log
sometimes needs is the thread ID of the MySQL session handling the
statement. The thread ID is necessary when a function is dependent on
the thread ID—such as when it refers to
CONNECTION_ID—but most importantly for handling temporary tables.
Temporary tables are specific to each thread, meaning that two temporary tables with the same name are allowed to coexist, provided they are defined in different sessions. Temporary tables can provide an effective means to improve the performance of certain operations, but they require special handling to work with the binary log.
Internally in the server, temporary tables are handled by creating obscure names for storing the table definitions. The names are based on the process ID of the server, the thread ID that creates the table, and a thread-specific counter to distinguish between different instances of the table from the same thread. This naming scheme allows tables from different threads to be distinguished from each other, but each statement can access its proper table only if the thread ID is stored in the binary log.
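The naming idea can be sketched in Python. The exact format the server uses differs from this, so the `#sql_` prefix and the layout below are only illustrative:

```python
import itertools

# Sketch of the naming scheme described above: internal temporary
# table names combine the server process ID, the session thread ID,
# and a per-thread counter, so identically named temporary tables
# from different sessions never collide. The exact format used by
# the server differs; this is only an illustration.
def temp_table_namer(process_id, thread_id):
    counter = itertools.count(1)
    return lambda: "#sql_%d_%d_%d" % (process_id, thread_id, next(counter))
```

Because the thread ID is part of the name, a slave replaying the binlog can only reconstruct the right name if the thread ID was stored with the event.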
Similar to how the current database is handled in the binary
log, the thread ID is stored as a separate field in every
Query event and can therefore be used to compute thread-specific data
and handle temporary tables correctly.
When writing the Query event,
the thread ID to store in the event is read from the
pseudo_thread_id server variable. This means that it can be set before executing
a statement, but only if you have
SUPER privileges. This server variable is
intended to be used by mysqlbinlog
to emit statements correctly and should not normally be used.
For a statement that contains a call to the
CONNECTION_ID function or that uses or
creates a temporary table, the
Query event is marked as thread-specific in
the binary log. Because the thread ID is always present in the
Query event, this flag is not
necessary but is mainly used to allow mysqlbinlog to avoid printing unnecessary
assignments to the pseudo_thread_id variable.
LOAD DATA INFILE Statements
The LOAD DATA INFILE statement
makes it easy to fill tables quickly from a file. Unfortunately,
it is dependent on a certain kind of context that cannot be covered by
the context events we have discussed: files that need to be read from
the filesystem of the master. To handle LOAD DATA INFILE, the
MySQL server uses a special set of events to handle the transfer of the
file using the binary log. In addition to solving the problem for
LOAD DATA INFILE, this makes the
statement a very convenient tool for transferring large amounts of data
from the master to the slave, as you will see soon. To correctly
transfer and execute a LOAD DATA
INFILE statement, several new events are introduced into the binary log.
Even though the statement contained in this event contains the name of the file that was used on the master, this file will not be sought by the slave. Instead, the contents provided by the preceding
Append_block events will be used.
For each LOAD DATA INFILE
statement executed on the master, the file to read is mapped to an
internal file-backed buffer, which is used in the following processing.
In addition, a unique file ID is assigned to the execution of the statement and
is used to refer to the file read by the statement.
While the statement is executing, the file contents are written to
the binary log as a sequence of events starting with a
Begin_load_query event—which indicates the
beginning of a new file—followed by zero or more
Append_block events. Each event written to the
binary log is no larger than the maximum allowed packet size, as
specified by the max-allowed-packet option.
After the entire file is read and applied to the table, the
execution of the statement terminates by writing the
Execute_load_query event to the binary log.
This event contains the statement executed together with the file ID
assigned to the execution of the statement. Note that the statement is
not the original statement as the user wrote it, but rather a recreated version of the statement.
If you are reading an old binary log, you might instead find a
different set of events. These were the events used to
handle LOAD DATA INFILE prior to
MySQL version 5.0.3 and were replaced by the implementation just described.
Example 4-3 shows
the events written to the binary log by a successful execution of a
LOAD DATA INFILE statement. In the
Info field, you can see the assigned
file ID—1, in this case—and see that it is used for all the events that
are part of the execution of the statement. You can also see that the
file foo.dat used by the statement
contains more than the maximum allowed packet size of 16384, so it is
split into three events.
SHOW BINLOG EVENTS IN 'master-bin.000042' FROM 269\G
*************************** 1. row ***************************
   Log_name: master-bin.000042
        Pos: 269
 Event_type: Begin_load_query
  Server_id: 1
End_log_pos: 16676
       Info: ;file_id=1;block_len=16384
*************************** 2. row ***************************
   Log_name: master-bin.000042
        Pos: 16676
 Event_type: Append_block
  Server_id: 1
End_log_pos: 33083
       Info: ;file_id=1;block_len=16384
*************************** 3. row ***************************
   Log_name: master-bin.000042
        Pos: 33083
 Event_type: Append_block
  Server_id: 1
End_log_pos: 33633
       Info: ;file_id=1;block_len=527
*************************** 4. row ***************************
   Log_name: master-bin.000042
        Pos: 33633
 Event_type: Execute_load_query
  Server_id: 1
End_log_pos: 33756
       Info: use `test`; LOAD DATA INFILE 'foo.dat' INTO...;file_id=1
4 rows in set (0.00 sec)
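The splitting shown above amounts to cutting the file's contents into blocks of at most the maximum packet size, with the first block becoming a Begin_load_query event and the remaining blocks Append_block events. A Python sketch of the idea (function and event names used as labels only):

```python
# Sketch: splitting a file's contents into binlog events, none larger
# than the maximum allowed packet size. The first block corresponds to
# a Begin_load_query event and subsequent blocks to Append_block events.
def split_into_blocks(data, max_packet=16384):
    blocks = [data[i:i + max_packet] for i in range(0, len(data), max_packet)]
    events = [("Begin_load_query", blocks[0])]
    events += [("Append_block", block) for block in blocks[1:]]
    return events
```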
Binary Log Filters
It is possible to filter out statements from the binary log using two
options: binlog-do-db and binlog-ignore-db (which we will call
binlog-*-db, collectively). The
binlog-do-db option is used when you want to filter
only statements belonging to a certain database, and
binlog-ignore-db is used when you want to ignore a
certain database but replicate all other databases.
These options can be given multiple times, so to filter out both
the database one_db and the database
two_db, you must give both options in
the my.cnf file. For example:
[mysqld]
binlog-ignore-db=one_db
binlog-ignore-db=two_db
The way MySQL filters events can be quite a surprise to unfamiliar users, so we’ll explain how filtering works and make some recommendations on how to avoid some of the major headaches.
Figure 4-3 shows
how MySQL determines whether the statement is filtered. The filtering is
done on a statement level—either the entire statement is filtered out or
the entire statement is written to the binary log—and the
binlog-*-db options use the current database to decide
whether the statement should be filtered, not the database of the tables
affected by the statement.
To help you understand the behavior, consider the statements in
Example 4-4. Each line uses
bad as the current database and changes tables
in different databases.
Now, given these statements, consider what happens if the
bad database is filtered using
binlog-ignore-db=bad. None of the three statements in
Example 4-4 will be written to the
binary log, even though the second and third statements change tables
in the ugly database and make no reference to the
bad database. This might seem strange
at first—why not filter the statement based on the database of the table
changed? But consider what would happen with the third statement if the
ugly database were filtered instead of the
bad database. Now one database in the
UPDATE is filtered out and the
other isn’t. This puts the server in a catch-22 situation, so the
problem is solved by just filtering on the current database, and this
rule is used for all statements (with a few exceptions).
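The decision procedure just described can be summarized in a few lines of code. The following Python sketch is our own illustration of the rule (not MySQL source code): filtering is per statement, uses only the current database, and any binlog-do-db rule causes all binlog-ignore-db rules to be disregarded.

```python
def should_log(current_db, do_dbs=(), ignore_dbs=()):
    """Model of the binlog-*-db decision for statement-based logging.

    The decision looks only at the current database, never at the
    databases of the tables the statement actually touches. If any
    binlog-do-db rule exists, the binlog-ignore-db rules are ignored.
    """
    if do_dbs:
        return current_db in do_dbs
    return current_db not in ignore_dbs
```

With binlog-ignore-db=bad, a statement issued while bad is the current database is filtered out even if it only changes tables in ugly; with both binlog-do-db=good and binlog-ignore-db=bad configured, a statement issued under ugly is filtered because only the do-db rule is consulted.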
To avoid mistakes when executing statements that can potentially
be filtered out, make it a habit not to
qualify table, function, or procedure names with the database name.
Instead, whenever you want to access a table in a different
database, issue a USE
statement to make that database the current database. In other words,
instead of writing:

INSERT INTO other.book VALUES ('MySQL', 'Paul DuBois');

write:

USE other;
INSERT INTO book VALUES ('MySQL', 'Paul DuBois');

Using this practice, it is easy to see by inspection that the statement does not update multiple databases, simply because no tables are qualified with a database name.
This behavior does not apply when row-based replication is used. The filtering employed in row-based replication will be discussed in Filtering in Row-Based Replication, but since row-based replication can work with each individual row change, it is able to filter on the actual table that the row is targeted for and does not use the current database.
So, what happens when both
binlog-do-db and binlog-ignore-db are used at the same time? For
example, consider a configuration file containing the following two
options:

[mysqld]
binlog-do-db=good
binlog-ignore-db=bad

In this case, will the following statement be filtered out or not?

USE ugly;
INSERT INTO t1 VALUES (1);
Following the diagram in Figure 4-3, you can see that if
there is at least one
binlog-do-db rule, all
binlog-ignore-db rules are ignored completely, and
since only the
good database is
included, the previous statement will be filtered out.
Because of the way that the rules
are evaluated, it is pointless to have both types of
rules at the same time. Since the binary log can be used for recovery
as well as replication, the recommendation is not
to use the
binlog-*-db options and instead filter out
the events on the slave by using the
replicate-* options (these are described in
Filtering and skipping events). Using the
binlog-*-db options would filter out statements from
the binary log, and you would not be able to restore the database from
the binary log in the event of a crash.
Triggers, Events, and Stored Routines
A few other constructions that are treated specially when logged are stored programs—that is, triggers, events, and stored routines (the last is a collective name for stored procedures and stored functions). Their treatment with respect to the binary log contains some elements in common, so they will be covered together in this section. The explanation distinguishes statements of two types: statements that define, destroy, or alter stored programs and statements that invoke them.
Statements that define or destroy stored programs
The following discussion shows triggers in the examples, but the same principles apply to definition of events and stored routines. To understand why the server needs to handle these features specially when writing them to the binary log, consider the code in Example 4-5.
In the example, a table named
employee keeps
information about all employees of an imagined system and a table named
log keeps a log of interesting information.
Note that the
log table has a
timestamp column that notes the time of a change and that the
name column in the
employee table is the primary key for the table.
There is also a
status column to
tell whether the addition succeeded or failed.
To track information about employee information changes—for example, for auditing purposes—three triggers are created so that whenever an employee is added, removed, or changed, a log entry of the change is added to a log table.
Notice that the triggers are after triggers, which means that entries are added only if the executed statement is successful. Failed statements will not be logged. We will later extend the example so that unsuccessful attempts are also logged.
CREATE TABLE employee (
   name CHAR(64) NOT NULL,
   email CHAR(64),
   password CHAR(64),
   PRIMARY KEY (name)
);

CREATE TABLE log (
   id INT AUTO_INCREMENT,
   email CHAR(64),
   status CHAR(10),
   message TEXT,
   ts TIMESTAMP,
   PRIMARY KEY (id)
);

CREATE TRIGGER tr_employee_insert_after AFTER INSERT ON employee
FOR EACH ROW
   INSERT INTO log(email, status, message)
      VALUES (NEW.email, 'OK', CONCAT('Adding employee ', NEW.name));

CREATE TRIGGER tr_employee_delete_after AFTER DELETE ON employee
FOR EACH ROW
   INSERT INTO log(email, status, message)
      VALUES (OLD.email, 'OK', 'Removing employee');

delimiter $$
CREATE TRIGGER tr_employee_update_after AFTER UPDATE ON employee
FOR EACH ROW
BEGIN
   IF OLD.name != NEW.name THEN
      INSERT INTO log(email, status, message)
         VALUES (OLD.email, 'OK',
                 CONCAT('Name change from ', OLD.name, ' to ', NEW.name));
   END IF;
   IF OLD.password != NEW.password THEN
      INSERT INTO log(email, status, message)
         VALUES (OLD.email, 'OK', 'Password change');
   END IF;
   IF OLD.email != NEW.email THEN
      INSERT INTO log(email, status, message)
         VALUES (OLD.email, 'OK', CONCAT('E-mail change to ', NEW.email));
   END IF;
END $$
delimiter ;
With these trigger definitions, it is now possible to add and
remove employees as shown in Example 4-6. Here an
employee is added, modified, and removed, and as you can see, each of
the operations is logged to the log table.
The operations of adding, removing, and modifying employees may
be done by a user who has access to the employee
table, but what about access to the
log table? In
this case, a user who can manipulate the employee
table should not be able to make changes to the
log table. There are many reasons for this, but
they all boil down to trusting the contents of the
log table for purposes of maintenance, auditing,
disclosure to legal authorities, and so on. So the DBA may choose to
make access to the
employee table available to many
users while keeping access to the
log table very restricted.
master> SET @pass = PASSWORD('xyzzy');
Query OK, 0 rows affected (0.00 sec)

master> INSERT INTO employee VALUES ('mats', 'firstname.lastname@example.org', @pass);
Query OK, 1 row affected (0.00 sec)

master> UPDATE employee SET name = 'matz'
     ->  WHERE email = 'email@example.com';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

master> SET @pass = PASSWORD('foobar');
Query OK, 0 rows affected (0.00 sec)

master> UPDATE employee SET password = @pass
     ->  WHERE email = 'firstname.lastname@example.org';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

master> DELETE FROM employee WHERE email = 'email@example.com';
Query OK, 1 row affected (0.00 sec)

master> SELECT * FROM log;
+----+--------------------------------+-------------------------------+---------------------+
| id | email                          | message                       | ts                  |
+----+--------------------------------+-------------------------------+---------------------+
|  1 | firstname.lastname@example.org | Adding employee mats          | 2012-11-14 18:56:08 |
|  2 | email@example.com              | Name change from mats to matz | 2012-11-14 18:56:11 |
|  3 | firstname.lastname@example.org | Password change               | 2012-11-14 18:56:41 |
|  4 | email@example.com              | Removing employee             | 2012-11-14 18:57:11 |
+----+--------------------------------+-------------------------------+---------------------+
4 rows in set (0.00 sec)
The DELETE in the example can generate a warning that it is unsafe to
log when statement mode is used. This is because it invokes a
trigger that inserts into an autoincrement column. In general, such
warnings should be investigated.
To make sure the triggers can execute successfully against a
highly protected table, they are executed as the user who defined the
trigger, not as the user who changed the contents of the
employee table. So the CREATE
TRIGGER statements in Example 4-5 are executed by
the DBA, who has privileges to make additions to the
log table, whereas the statements altering employee
information in Example 4-6 are executed
through a user management account that only has privileges to change
the employee table.
When the statements in Example 4-6 are executed,
the employee management account is used for updating entries in the
employee table, but the DBA privileges are used to
make additions to the
log table. The employee
management account cannot be used to add or remove entries from the
log table.
As an aside, Example 4-6 assigns passwords to a user variable before using them in the statement. This is done to avoid sending sensitive data in plain text to another server.
If you recall the earlier discussion about implicit information, you may already have noticed that both the user executing a line of code and the user who defines a trigger are implicit. As you will see in Chapter 8, neither the definer nor the invoker of the trigger is critical to executing the trigger on the slave, and the user information is effectively ignored when the slave executes the statement. However, the information is important when the binary log is played back to a server—for instance, when doing a PITR.
To play back a binary log to the server without problems in
handling privileges on all the various tables, it is necessary to
execute all the statements as a user with SUPER
privileges. But the triggers may not have been defined using
SUPER privileges, so it is important to
recreate the triggers with the correct user as the trigger's definer.
If a trigger is defined with SUPER
privileges instead of by the user who defined the trigger originally,
it might cause a privilege escalation.
To permit a DBA to specify the user under which to execute a
trigger, the CREATE TRIGGER syntax includes an optional
DEFINER clause. If a
DEFINER is not given
in the statement—as is the case in Example 4-5—the statement
will be rewritten for the binary log to add a
DEFINER clause and use the current user as
the definer. This means that the definition of the insert trigger
appears in the binary log as shown in Example 4-7. It lists the
account that created the trigger (
root@localhost) as the definer, which is
what we want in this case.
master> SHOW BINLOG EVENTS FROM 92236 LIMIT 1\G
*************************** 1. row ***************************
   Log_name: master-bin.000038
        Pos: 92236
 Event_type: Query
  Server_id: 1
End_log_pos: 92491
       Info: use `test`; CREATE DEFINER=`root`@`localhost` TRIGGER ...
1 row in set (0.00 sec)
Statements that invoke triggers and stored routines
Moving over from definitions to invocations, we can ask how the master’s triggers are handled during replication. Well, actually they’re not handled at all.
The statement that invokes the trigger is logged to the binary log, but it is not linked to the particular trigger. Instead, when the slave executes the statement, it automatically executes any triggers associated with the tables affected by the statement. This means that there can be different triggers on the master and the slave, and the triggers on the master will be invoked on the master while the triggers on the slave will be invoked on the slave. For example, if the trigger to add entries to the log table is not necessary on the slave, performance can be improved by eliminating the trigger from the slave.
Still, any context events necessary for replicating correctly
will be written to the binary log before the statement that invokes
the trigger, even if it is just the statements in the trigger that
require the context events. Thus, Example 4-8 shows the binary
log after executing the INSERT
statement in Example 4-5. Note that the
first event writes the INSERT_ID value for the
log table's primary key. This reflects the
use of the
log table in the trigger, but it might
appear to be redundant because the slave will not use the
value if the trigger is absent on the slave.
You should, however, note that using different triggers on the
master and slave—or no trigger at all on either the master or slave—is
the exception and that the
INSERT_ID is necessary for replicating the
INSERT statement correctly when the trigger
is on both the master and slave.
master> SHOW BINLOG EVENTS FROM 93340\G
*************************** 1. row ***************************
   Log_name: master-bin.000038
        Pos: 93340
 Event_type: Intvar
  Server_id: 1
End_log_pos: 93368
       Info: INSERT_ID=1
*************************** 2. row ***************************
   Log_name: master-bin.000038
        Pos: 93368
 Event_type: User var
  Server_id: 1
End_log_pos: 93396
       Info: @`pass`=_utf8 0x2A3942353030333433424335324... utf8_general_ci
*************************** 3. row ***************************
   Log_name: master-bin.000038
        Pos: 93396
 Event_type: Query
  Server_id: 1
End_log_pos: 93537
       Info: use `test`; INSERT INTO employee VALUES ...
3 rows in set (0.00 sec)
Stored functions and stored procedures are known by the common name stored routines. Since the server treats stored procedures and stored functions very differently, stored procedures will be covered in this section and stored functions in the next section.
The situation for stored routines is similar to triggers in some
aspects, but very different in others. Like triggers, stored routines
have a DEFINER clause,
and it must be explicitly added to the binary log whether or not the
statement includes it. But the invocation of stored routines is handled
differently from triggers.
To begin, let's extend Example 4-5, which defines
tables for employees and logs, with some utility routines to work with
the employees. Even though this can be handled with standard
INSERT, UPDATE, and DELETE statements, we'll use stored procedures
to demonstrate some issues involved in writing them to the binary log.
For these purposes, let's extend the example with the procedures in Example 4-9 for adding and
removing employees and for changing passwords.
delimiter $$

CREATE PROCEDURE employee_add(p_name CHAR(64), p_email CHAR(64),
                              p_password CHAR(64))
   MODIFIES SQL DATA
BEGIN
   DECLARE l_pass CHAR(64);
   SET l_pass = PASSWORD(p_password);
   INSERT INTO employee(name, email, password)
      VALUES (p_name, p_email, l_pass);
END $$

CREATE PROCEDURE employee_passwd(p_email CHAR(64), p_password CHAR(64))
   MODIFIES SQL DATA
BEGIN
   DECLARE l_pass CHAR(64);
   SET l_pass = PASSWORD(p_password);
   UPDATE employee SET password = l_pass WHERE email = p_email;
END $$

CREATE PROCEDURE employee_del(p_name CHAR(64))
   MODIFIES SQL DATA
BEGIN
   DELETE FROM employee WHERE name = p_name;
END $$

delimiter ;
In the employee_add and
employee_passwd procedures, we have
extracted the encrypted password into a separate variable for the
reasons already explained, but the
employee_del procedure just contains a
DELETE statement, since nothing else
is needed. A binlog entry corresponding to one procedure definition is:
master> SHOW BINLOG EVENTS FROM 97911 LIMIT 1\G
*************************** 1. row ***************************
   Log_name: master-bin.000038
        Pos: 97911
 Event_type: Query
  Server_id: 1
End_log_pos: 98275
       Info: use `test`; CREATE DEFINER=`root`@`localhost` PROCEDURE ...
1 row in set (0.00 sec)
As expected, the definition of this procedure is extended with the
DEFINER clause before writing the definition
to the binary log, but apart from that, the body of the procedure is
left intact. Notice that the
CREATE PROCEDURE statement is
replicated as a
Query event, as are
all DDL statements.
In this regard, stored routines are similar to triggers in the way they are treated by the binary log. But invocation differs significantly from triggers. Example 4-10 calls the procedure that adds an employee and shows the resulting contents of the binary log.
master> CALL employee_add('chuck', 'firstname.lastname@example.org', 'abrakadabra');
Query OK, 1 row affected (0.00 sec)

master> SHOW BINLOG EVENTS FROM 104033\G
*************************** 1. row ***************************
   Log_name: master-bin.000038
        Pos: 104033
 Event_type: Intvar
  Server_id: 1
End_log_pos: 104061
       Info: INSERT_ID=1
*************************** 2. row ***************************
   Log_name: master-bin.000038
        Pos: 104061
 Event_type: Query
  Server_id: 1
End_log_pos: 104416
       Info: use `test`; INSERT INTO employee(name, email, password) VALUES (
             NAME_CONST('p_name',_utf8'chuck' COLLATE ...),
             NAME_CONST('p_email',_utf8'email@example.com' COLLATE ...),
             NAME_CONST('l_pass',_utf8'*FEB349C4FDAA307A...' COLLATE ...))
2 rows in set (0.00 sec)
In Example 4-10, there are four things that you should note:

- The CALL statement is not written to the binary log. Instead, the statements executed as a result of the call are written to the binary log. In other words, the body of the stored procedure is unrolled into the binary log.
- The statement is rewritten to not contain any references to the parameters of the stored procedure—that is, p_name, p_email, and p_password. Instead, the NAME_CONST function is used for each parameter to create a result set with a single value.
- The locally declared variable l_pass is also replaced with a NAME_CONST expression, where the second parameter contains the encrypted password.
- Just as when a statement that invokes a trigger is written to the binary log, the statement that calls the stored procedure is preceded by an Intvar event holding the insert ID used when adding the entry to the log table.
Since neither the parameter names nor the locally declared names
are available outside the stored routine,
NAME_CONST is used to associate the name of
the parameter or local variable with the constant value used when
executing the function. This guarantees that the value can be used in
the same way as the parameter or local variable. However, this change is
not significant; currently it offers no advantages over using the
constant values directly.
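To see what NAME_CONST does in isolation, you can invoke it directly; it produces a single value whose result column is named after the first argument. This illustrative query is our own addition:

```sql
SELECT NAME_CONST('p_name', 'chuck');
```

The result is a one-row, one-column result set with the column named p_name and the value 'chuck', which is how the rewritten statement preserves the parameter names from the procedure body.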
Stored functions share many similarities with stored procedures and some
similarities with triggers. Similar to both stored procedures and
triggers, stored functions have a DEFINER clause
that is normally (but not always) used when the
CREATE FUNCTION
statement is written to the binary log.
In contrast to stored procedures, stored functions can return scalar values and you can therefore embed them in various places in SQL statements. For example, consider the definition of a stored routine in Example 4-11, which extracts the email address of an employee given the employee’s name. The function is a little contrived—it is significantly more efficient to just execute statements directly—but it suits our purposes well.
delimiter $$
CREATE FUNCTION employee_email(p_name CHAR(64))
   RETURNS CHAR(64)
   DETERMINISTIC
BEGIN
   DECLARE l_email CHAR(64);
   SELECT email INTO l_email FROM employee WHERE name = p_name;
   RETURN l_email;
END $$
delimiter ;
This stored function can be used conveniently in other statements,
as shown in Example 4-12. In
contrast to stored procedures, stored functions have to specify a characteristic—such as
DETERMINISTIC,
NO SQL, or
READS SQL
DATA—if they are to be written to the binary log.
master> CREATE TABLE collected (
     ->    name CHAR(64),
     ->    email CHAR(64)
     -> );
Query OK, 0 rows affected (0.09 sec)

master> INSERT INTO collected(name, email)
     ->    VALUES ('chuck', employee_email('chuck'));
Query OK, 1 row affected (0.01 sec)

master> SELECT employee_email('chuck');
+--------------------------------+
| employee_email('chuck')        |
+--------------------------------+
| firstname.lastname@example.org |
+--------------------------------+
1 row in set (0.00 sec)
When it comes to calls, stored functions are replicated in the
same manner as triggers: as part of the statement that executes the
function. For instance, the binary log doesn't need any events preceding
the INSERT statement in Example 4-12, but it will contain
the context events necessary to replicate the stored function inside the
INSERT. Normally,
SELECT statements are not written to the binary log since they don't change
any data, but a
SELECT containing a
stored function, as in Example 4-13, is an exception.
CREATE TABLE log(log_id INT AUTO_INCREMENT PRIMARY KEY, msg TEXT);

delimiter $$
CREATE FUNCTION log_message(msg TEXT)
   RETURNS INT
   DETERMINISTIC
BEGIN
   INSERT INTO log(msg) VALUES(msg);
   RETURN LAST_INSERT_ID();
END $$
delimiter ;

SELECT log_message('Just a test');
When executing the stored function, the server notices that it
adds a row to the
log table and marks the statement
as an "updating" statement, which means that it will be written to the
binary log. So, for the slightly artificial example in Example 4-13, the binary log will
contain the following events:
*************************** 7. row ***************************
   Log_name: mysql-bin.000001
        Pos: 845
 Event_type: Query
  Server_id: 1
End_log_pos: 913
       Info: BEGIN
*************************** 8. row ***************************
   Log_name: mysql-bin.000001
        Pos: 913
 Event_type: Intvar
  Server_id: 1
End_log_pos: 941
       Info: LAST_INSERT_ID=1
*************************** 9. row ***************************
   Log_name: mysql-bin.000001
        Pos: 941
 Event_type: Intvar
  Server_id: 1
End_log_pos: 969
       Info: INSERT_ID=1
*************************** 10. row ***************************
   Log_name: mysql-bin.000001
        Pos: 969
 Event_type: Query
  Server_id: 1
End_log_pos: 1109
       Info: use `test`; SELECT `test`.`log_message`(_utf8'Just a test' COLLATE...
*************************** 11. row ***************************
   Log_name: mysql-bin.000001
        Pos: 1109
 Event_type: Xid
  Server_id: 1
End_log_pos: 1132
       Info: COMMIT /* xid=237 */
The events feature is a MySQL extension, not part of standard SQL. Events, which should not be confused with binlog events, are handled by a stored program that is executed regularly by a special event scheduler.
Similar to all other stored programs, definitions of events are
also logged with a DEFINER clause.
Since events are invoked by the event scheduler, they are always
executed as the definer and do not pose a security risk in the way that
stored functions do.
When events are executed, the statements are written to the binary log directly.
Since the events will be executed on the master, they are automatically disabled on the slave and will therefore not be executed there. If the events were not disabled, they would be executed twice on the slave: once by the master executing the event and replicating the changes to the slave, and once by the slave executing the event directly.
Because the events are disabled on the slave, it is necessary to enable these events if the slave, for some reason, should lose the master.
So, for example, when promoting a slave as described in Chapter 5, don’t forget to enable the events that were replicated from the master. This is easiest to do using the following statement:
UPDATE mysql.event
   SET Status = 'ENABLED'
 WHERE Status = 'SLAVESIDE_DISABLED';
The purpose of the check is to enable only the events that were disabled when being replicated from the master. There might be events that are disabled for other reasons.
Even though statement-based replication is normally straightforward, some special constructions have to be handled with care. Recall that for the statement to be executed correctly on the slave, the context has to be correct for the statement. Even though the context events discussed earlier handle part of the context, some constructions have additional context that is not transferred as part of the replication process.
The LOAD_FILE function
allows you to fetch a file and use it as part of an
expression. Although quite convenient at times, the file has to exist
on the slave server to replicate correctly, since the file is
not transferred during replication the way the file of a
LOAD DATA INFILE statement is. With some
ingenuity, you can rewrite a statement involving the
LOAD_FILE function either to use the
LOAD DATA INFILE statement
or to define a user-defined variable to hold the
contents of the file. For example, take the following statement that
inserts a document into a table:
master> INSERT INTO document(author, body)
     ->    VALUES ('Mats Kindahl', LOAD_FILE('go_intro.xml'));
You can rewrite this statement to use
LOAD DATA INFILE instead. In this case, you
have to take care to specify character strings that cannot exist in
the document as field and line delimiters, since you are going to read
the entire file contents as a single column.
master> LOAD DATA INFILE 'go_intro.xml' INTO TABLE document
     ->    FIELDS TERMINATED BY '@*@' LINES TERMINATED BY '&%&'
     ->    (author, body) SET author = 'Mats Kindahl';
An alternative is to store the file contents in a user-defined variable and then use it in the statement.
master> SET @document = LOAD_FILE('go_intro.xml');
master> INSERT INTO document(author, body) VALUES
     ->    ('Mats Kindahl', @document);
Nontransactional Changes and Error Handling
So far we have considered only transactional changes and have not looked at error handling at all. For transactional changes, error handling is pretty uncomplicated: a statement that tries to change transactional tables and fails will not have any effect at all on the table. That’s the entire point of having a transactional system—so the changes that the statement attempts to introduce can be safely ignored. The same applies to transactions that are rolled back: they have no effect on the tables and can therefore simply be discarded without risking inconsistencies between the master and the slave.
A specialty of MySQL is the provisioning of nontransactional storage engines. This can offer some speed advantages, because the storage engine does not have to administer the transactional log that the transactional engines use, and it allows some optimizations on disk access. From a replication perspective, however, nontransactional engines require special considerations.
The most important aspect to note is that replication cannot handle arbitrary nontransactional engines, but has to make some assumptions about how they behave. Some of those limitations are lifted with the introduction of row-based replication in version 5.1—a subject that will be covered in Row-Based Replication—but even in that case, it cannot handle arbitrary storage engines.
One of the features that complicates the issue further, from a replication perspective, is that it is possible to mix transactional and nontransactional engines in the same transaction, and even in the same statement.
To continue with the example used earlier, consider Example 4-14, where the
log table from Example 4-5 is given a
nontransactional storage engine while the employee
table is given a transactional one. We use the nontransactional MyISAM storage engine for the
log table to improve its speed, while keeping the
transactional behavior for the employee table.
We can further extend the example to track unsuccessful attempts
to add employees by creating a pair of insert triggers: a before trigger
and an after trigger. If an administrator sees an entry in the log with
a status field of
FAIL, it means the before trigger ran, but the
after trigger did not, and therefore an attempt to add an employee
failed.
CREATE TABLE employee (
   name CHAR(64) NOT NULL,
   email CHAR(64),
   password CHAR(64),
   PRIMARY KEY (email)
) ENGINE = InnoDB;

CREATE TABLE log (
   id INT AUTO_INCREMENT,
   email CHAR(64),
   message TEXT,
   status ENUM('FAIL', 'OK') DEFAULT 'FAIL',
   ts TIMESTAMP,
   PRIMARY KEY (id)
) ENGINE = MyISAM;

delimiter $$
CREATE TRIGGER tr_employee_insert_before BEFORE INSERT ON employee
FOR EACH ROW
BEGIN
   INSERT INTO log(email, message)
      VALUES (NEW.email, CONCAT('Adding employee ', NEW.name));
   SET @LAST_INSERT_ID = LAST_INSERT_ID();
END $$
delimiter ;

CREATE TRIGGER tr_employee_insert_after AFTER INSERT ON employee
FOR EACH ROW
   UPDATE log SET status = 'OK' WHERE id = @LAST_INSERT_ID;
What are the effects of this change on the binary log?
To begin, let's consider the INSERT statement
from Example 4-6.
Assuming the statement is not inside a transaction and
AUTOCOMMIT is 1, the statement will be a
transaction by itself. If the statement executes without errors,
everything will proceed as planned and the statement will be written to
the binary log as a Query event.
Now, consider what happens if the
INSERT is repeated with the same employee.
Let’s have a look…
master> SET @pass = PASSWORD('xyzzy');
Query OK, 0 rows affected (0.00 sec)

master> INSERT INTO employee(name,email,password)
     ->    VALUES ('chuck','email@example.com',@pass);
ERROR 1062 (23000): Duplicate entry 'firstname.lastname@example.org' for key 'PRIMARY'

master> SELECT * FROM employee;
+-------+-------------------+-------------------------------------------+
| name  | email             | password                                  |
+-------+-------------------+-------------------------------------------+
| chuck | email@example.com | *151AF6B8C3A6AA09CFCCBD34601F2D309ED54888 |
+-------+-------------------+-------------------------------------------+
1 row in set (0.00 sec)

master> SHOW BINLOG EVENTS FROM 38493\G
*************************** 1. row ***************************
   Log_name: master-bin.000038
        Pos: 38493
 Event_type: User var
  Server_id: 1
End_log_pos: 38571
       Info: @`pass`=_utf8 0x2A31353141463642384333413641413...
*************************** 2. row ***************************
   Log_name: master-bin.000038
        Pos: 38571
 Event_type: Query
  Server_id: 1
End_log_pos: 38689
       Info: use `test`; INSERT INTO employee(name,email,password)...
2 rows in set (0.00 sec)
As you can see, the statement is written to the binary log even
though the employee table is transactional and the
statement failed. Looking at the contents of the table using the
SELECT statement
reveals that there is still a single employee, proving the statement was
rolled back—so why is the statement written to the binary log?
Looking into the
log table will reveal the reason:
master> SELECT * FROM log;
+----+--------------------------------+-------------------------------+--------+---------------------+
| id | email                          | message                       | status | ts                  |
+----+--------------------------------+-------------------------------+--------+---------------------+
|  1 | firstname.lastname@example.org | Adding employee mats          | OK     | 2010-01-13 15:50:45 |
|  2 | email@example.com              | Name change from ...          | OK     | 2010-01-13 15:50:48 |
|  3 | firstname.lastname@example.org | Password change               | OK     | 2010-01-13 15:50:50 |
|  4 | email@example.com              | Removing employee             | OK     | 2010-01-13 15:50:52 |
|  5 | firstname.lastname@example.org | Adding employee mats          | OK     | 2010-01-13 16:11:45 |
|  6 | email@example.com              | Adding employee mats          | FAIL   | 2010-01-13 16:12:00 |
+----+--------------------------------+-------------------------------+--------+---------------------+
6 rows in set (0.00 sec)
Look at the last line, where the status is
FAIL. This line was added to the table by the
before trigger
tr_employee_insert_before. For the binary log
to faithfully represent the changes made to the database on the master,
it is necessary to write the statement to the binary log if there are
any nontransactional changes present in the statement or in triggers
that are executed as a result of executing the statement. Since the
statement failed, the after trigger
tr_employee_insert_after was not executed, and
therefore the status is still FAIL
from the execution of the before trigger.
Since the statement failed on the master, information about the
failure needs to be written to the binary log as well. The MySQL server
handles this by using an error code field in the
Query event to register the exact error code
that caused the statement to fail. This field is then written to the
binary log together with the event.
You have now seen how individual statements are written to the binary log, along with context information, but we have not yet covered how transactions are logged. That is the topic of this section.
A transaction can start under a few different circumstances:

- The user issues a START TRANSACTION (or BEGIN) statement.
- AUTOCOMMIT=0 and a statement accessing a transactional table starts to execute.
- AUTOCOMMIT=1 and a statement accessing a transactional table starts to execute, in which case the statement is a transaction by itself. Note that a statement that writes only to nontransactional tables—for example, only to MyISAM tables—does not start a transaction.
Not every statement that is executed after the transaction has started is part of that transaction. The exceptions require special care from the binary log.
- Nontransactional statements

Nontransactional statements are by their very definition not part of the transaction. When they are executed, they take effect immediately and do not wait for the transaction to commit. This also means that it is not possible to roll them back. They don't affect an open transaction: any transactional statement executed after the nontransactional statement is still added to the currently open transaction.
- Statements that write files
Most DDL statements (CREATE, ALTER, etc.), with some exceptions, do an implicit commit of any outstanding transaction before starting to execute and an implicit commit after they have finished. These statements modify files in the filesystem and are for that reason not transactional.
- Statements that modify tables in the mysql database
All statements that create, drop, or modify user accounts or privileges for users do an implicit commit and cannot be part of a transaction. Internally, these statements modify tables in the mysql database, which are all nontransactional.
In MySQL versions earlier than 5.1.3, these statements did not cause an implicit commit, but because they were writing to nontransactional tables, they were treated as nontransactional statements. As you will soon see, this caused some inconsistencies, so implicit commits were added for these statements over the course of several versions.
- Statements that require implicit commits for pragmatic reasons
Statements that lock tables, statements that are used for administrative purposes, and LOAD DATA INFILE cause implicit commits in various situations because the implementation requires this to make them work correctly.
Statements that cause an implicit commit are clearly not part of any transaction, because any open transaction is committed before execution starts. You can find a complete list of statements that cause an implicit commit in the online MySQL Reference Manual.
The binary log can have statements in a different order from their actual execution, because it combines all the statements in each transaction to keep them together. Multiple sessions can execute simultaneous transactions on a server, and the transactional storage engines maintain their own transactional logs to make sure each transaction executes correctly. These logs are not visible to the user. In contrast, the binary log shows all transactions from all sessions in the order in which they were committed as if each executed sequentially.
To ensure each transaction is written as a unit to the binary log, the server has to separate statements that are executing in different threads. When committing a transaction, the server writes all the statements that are part of the transaction to the binary log as a single unit. For this purpose, the server keeps a transaction cache for each thread, as illustrated in Figure 4-4. Each statement executed for a transaction is placed in the transaction cache, and the contents of the transaction cache are then copied to the binary log and emptied when the transaction commits.
Statements that contain nontransactional changes require special attention. Recall from our previous discussion that nontransactional statements do not cause the current transaction to terminate, so the changes introduced by the execution of a nontransactional statement have to be recorded somewhere without closing the currently open transaction. The situation is further complicated by statements that simultaneously affect transactional and nontransactional tables. These statements are considered transactional but include changes that are not part of the transaction.
Statement-based replication cannot handle this correctly in all situations, so a best-effort approach has been taken. We’ll describe the measures taken by the server, followed by the remaining issues you have to be aware of to avoid replication problems.
How nontransactional statements are logged
When no transaction is open, nontransactional statements are written to the binary log at the end of execution of the statement and do not “transit” in the transaction cache before ending up in the binary log. If, however, a transaction is open, the rules for how to handle the statement are as follows:
Rule 1: If the statement is marked as transactional, it is written to the transaction cache.
Rule 2: If the statement is not marked as transactional and there are no statements in the transaction cache, the statement is written directly to the binary log.
Rule 3: If the statement is not marked as transactional, but there are statements in the transaction cache, the statement is written to the transaction cache.
The third rule might seem strange, but you can understand the reasoning if you look at Example 4-15, where a modification of a transactional table comes before a modification of a nontransactional table in the transaction.
1 START TRANSACTION;
2 SET @pass = PASSWORD('xyzzy');
3 INSERT INTO employee(name, email, password)
  VALUES ('mats', 'firstname.lastname@example.org', @pass);
4 INSERT INTO log(email, message)
  VALUES ('email@example.com', 'This employee was bad');
5 COMMIT;
Following rule 3, the statement on line 4 is written to the transaction cache even though the table is nontransactional. If the statement were written directly to the binary log, it would end up before the statement in line 3 because the statement in line 3 would not end up in the binary log until a successful commit in line 5. In short, the slave’s log would end up containing the comment added by the DBA in line 4 before the actual change to the employee in line 3, which is clearly inconsistent with the master. Rule 3 avoids such situations. The left side of Figure 4-5 shows the undesired effects if rule 3 did not apply, whereas the right side shows what actually happens thanks to rule 3.
Rule 3 involves a trade-off. Because the nontransactional statement is cached while the transaction executes, there is a risk that two transactions will update a nontransactional table on the master in a different order from that in which they are written to the binary log.
This situation can arise when there is a dependency between the first transactional and the second nontransactional statement of the transaction, but this cannot generally be handled by the server because it would require parsing each statement completely, including code in all triggers invoked, and performing a dependency analysis. Although technically possible, this would add extra processing to all statements during an open transaction and would therefore affect performance, perhaps significantly. Because the problem can almost always be avoided by designing transactions properly and ensuring that there are no dependencies of this kind in the transaction, the overhead was not added to MySQL.
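The three logging rules above can be sketched in a few lines. This is a toy model, not the server’s implementation: the binary log and the per-thread transaction cache are plain Python lists, and all names are ours.

```python
# Toy model of the three rules for logging statements while a
# transaction may be open. "cache" stands in for the per-thread
# transaction cache; "binlog" for the binary log itself.

def log_statement(stmt, is_transactional, cache, binlog, in_transaction):
    """Decide where a statement goes according to rules 1-3."""
    if not in_transaction and not is_transactional:
        # Outside a transaction, nontransactional statements are written
        # to the binary log directly at the end of their execution.
        binlog.append(stmt)
    elif is_transactional:
        # Rule 1: transactional statements always go to the cache.
        cache.append(stmt)
    elif not cache:
        # Rule 2: nontransactional statement, empty cache: direct write.
        binlog.append(stmt)
    else:
        # Rule 3: nontransactional statement, nonempty cache: cache it,
        # preserving its order relative to earlier cached statements.
        cache.append(stmt)

def commit(cache, binlog):
    # On COMMIT, the whole cache is copied to the binary log as one unit.
    binlog.extend(cache)
    cache.clear()
```

Running the statements of Example 4-15 through this model shows why rule 3 matters: the nontransactional INSERT into the log table stays behind the transactional INSERT in the cache, so both reach the binary log in execution order.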
How to avoid replication problems with nontransactional statements
The best strategy for avoiding problems is not to use nontransactional tables. However, if they are required by the application, a strategy for avoiding the dependencies discussed in the previous section is to ensure that statements affecting nontransactional tables are written first in the transaction. In this case, the statements will be written directly to the binary log, because the transaction cache is empty (refer to rule 2 in the preceding section). Because they come first, these statements cannot depend on the outcome of any transactional statement in the transaction.
If you need any values from these statements later in the transaction, you can assign them to temporary tables or variables. After that, the real contents of the transaction can be executed, referencing the temporary tables or variables.
Distributed Transaction Processing Using XA
MySQL version 5.0 lets you coordinate transactions involving different resources by using the X/Open Distributed Transaction Processing model, XA. Although currently not very widely used, XA offers attractive opportunities for coordinating all kinds of resources with transactions.
In version 5.0, the server uses XA internally to coordinate the binary log and the storage engines.
A set of commands allows the client to take advantage of XA synchronization as well. XA allows different statements entered by different users to be treated as a single transaction. On the other hand, it imposes some overhead, so some administrators turn it off globally.
Instructions for working with the XA protocol are beyond the scope of this book, but we will give a brief introduction to XA here before describing how it affects the binary log.
XA includes a transaction manager that coordinates a set of resource managers so that they commit a global transaction as an atomic unit. Each transaction is assigned a unique XID, which is used by the transaction manager and the resource managers. When used internally in the MySQL server, the transaction manager is usually the binary log and the resource managers are the storage engines. The process of committing an XA transaction is shown in Figure 4-8 and consists of two phases.
In phase 1, each storage engine is asked to prepare for a commit. When preparing, the storage engine writes any information it needs to commit correctly to safe storage and then returns an OK message. If any storage engine replies negatively—meaning that it cannot commit the transaction—the commit is aborted and all engines are instructed to roll back the transaction.
After all storage engines have reported that they have prepared without error, and before phase 2 begins, the transaction cache is written to the binary log. In contrast to normal transactions, which are terminated with a normal Query event containing COMMIT, an XA transaction is terminated with an Xid event containing the XID.
In phase 2, all the storage engines that were prepared in phase 1 are asked to commit the transaction. When committing, each storage engine will report that it has committed the transaction in stable storage. It is important to understand that the commit cannot fail: once phase 1 has passed, the storage engine has guaranteed that the transaction can be committed and therefore is not allowed to report failure in phase 2. A hardware failure can, of course, cause a crash, but since the storage engines have stored the information in durable storage, they will be able to recover properly when the server restarts. The restart procedure is discussed in the section The Binary Log and Crash Safety.
After phase 2, the transaction manager is given a chance to discard any shared resources, should it choose to. The binary log does not need to do any such cleanup actions, so it does not do anything special with regard to XA at this step.
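The two phases can be sketched as follows. This is a simplified model under our own names: the Engine class and its methods are illustrative stand-ins, not the server’s storage engine API, and the binary log is a plain list.

```python
# Sketch of the internal XA commit between the binary log (acting as
# transaction manager) and the storage engines (resource managers).

class Engine:
    """Illustrative stand-in for a storage engine; not a real API."""
    def __init__(self, name):
        self.name = name
        self.prepared = set()
        self.committed = set()

    def prepare(self, xid):
        # Phase 1: persist whatever is needed to commit, then answer OK.
        self.prepared.add(xid)
        return True               # an engine may also answer negatively

    def commit(self, xid):
        # Phase 2: once prepared, the commit is not allowed to fail.
        self.committed.add(xid)

    def rollback(self, xid):
        self.prepared.discard(xid)

def xa_commit(xid, engines, binlog):
    # Phase 1: ask every engine to prepare; any refusal aborts the commit.
    if not all(e.prepare(xid) for e in engines):
        for e in engines:
            e.rollback(xid)
        return False
    # Between the phases, the transaction cache is written to the binary
    # log, terminated by an Xid event carrying the XID.
    binlog.append(("Xid", xid))
    # Phase 2: every prepared engine commits.
    for e in engines:
        e.commit(xid)
    return True
```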
In the event that a crash occurs while committing an XA transaction, the recovery procedure in Figure 4-9 will take place when the server is restarted. At startup, the server will open the last binary log and check whether the binlog-in-use flag (described in Binlog File Rotation) is set. If it is, this indicates that the server crashed and XA recovery has to be performed.
The server starts by walking through the binary log that was just opened and finding the XIDs of all transactions in the binary log by reading the Xid events. Each storage engine loaded into the server will then be asked to commit the transactions in this list. For each XID in the list, the storage engine will determine whether a transaction with that XID is prepared but not committed, and commit it if that is the case. If the storage engine has prepared a transaction with an XID that is not in this list, the XID obviously did not make it to the binary log before the server crashed, so the transaction should be rolled back.
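The recovery decision reduces to a set comparison, sketched below. The function name and inputs are ours; the real server iterates over events and engines rather than working on sets.

```python
# Sketch of the XA recovery decision at restart: XIDs found in the last
# binlog file are committed in the engines; prepared transactions whose
# XID never reached the binary log are rolled back.

def xa_recover(binlog_xids, engine_prepared):
    """binlog_xids: XIDs read from Xid events in the last binlog file.
    engine_prepared: XIDs a storage engine reports as prepared."""
    to_commit = engine_prepared & binlog_xids     # made it to the binlog
    to_rollback = engine_prepared - binlog_xids   # never reached the binlog
    return to_commit, to_rollback
```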
Binary Log Group Commit
Writing data to disk takes time, quite a lot actually. Disk access is orders of magnitude slower than main memory access: disk access times are counted in milliseconds, while main memory access times are counted in nanoseconds. For that reason, operating systems have elaborate memory management systems for keeping parts of files in memory and not performing more writes to disk than necessary. Since database systems have to be safe from crashes, however, it is necessary to force the data to be written to disk when a transaction is committed.
To avoid the performance impact resulting if each transaction has to be written to disk, multiple independent transactions can be grouped together and written to disk as a single unit, which is called group commit. Because the disk write time is mainly a consequence of the time it takes to move the disk head to the correct position on the disk, and not the amount of data written, this improves performance significantly.
It is, however, not sufficient to commit the transaction data in the storage engine efficiently; the binary log also has to be written efficiently. For that reason, binary log group commit was added in MySQL 5.6. With it, multiple independent transactions can be committed as a group to the binary log, improving performance significantly.
In order for online backup tools like MySQL Enterprise Backup to work correctly, it is important that transactions are written to the binary log in the same order as they are prepared in the storage engine.
The implementation is built around a set of stages that each transaction has to pass through before being fully committed; you can see an illustration in Figure 4-10. Each stage is protected by a mutex, so there can never be more than one thread in each stage.
Each stage handles one part of the commit procedure. The first stage flushes the transaction caches of the threads to the file pages, the second stage executes a sync to get the file pages to disk, and the last stage commits all transactions.
To move sessions between stages in an ordered manner, each stage has an associated queue where sessions can queue up for processing. Each stage queue is protected by a mutex that is held briefly while manipulating the queue. You can see the names of the mutexes for each stage in Table 4-1, in case you want to use the performance schema to monitor performance of the commit phase.
Table 4-1. Stage queue mutexes
Sessions normally run in separate threads, so any session thread that wants to commit a transaction enqueues the session in the flush stage queue. A session thread then proceeds through each stage in the following way:
If the session thread is enqueued to a non-empty stage queue, it is a follower and will simply wait for its transaction to be committed by some other session thread.
If the session thread is enqueued to an empty stage queue, it is a (stage) leader and will bring all the sessions registered for the stage through the stage.
The leader empties all the sessions of the queue in a single step. The order of the sessions will be maintained, but after this step, new sessions can enqueue to the stage queue.
The stage processing is done as follows:
For the flush stage, the transactions of the sessions are flushed to the binary log in the same order as the sessions enqueued to the flush queue.
For the commit stage, the transactions are committed in the storage engines in the same order as they registered.
The sessions are enqueued to the queue for the next stage in the same order as they were registered for this stage.
Note that the queue might not be empty, which means that a session thread has “caught up” with a preceding leader thread. In this case, the session thread that was the leader and did the processing will become a follower for the next stage and let the leader already queued up handle the processing for the combined sets of sessions. Note that a leader can become a follower, but a follower can never become a leader.
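The leader election described above can be modeled with a simple queue, as in the sketch below. This is a single-threaded toy (no mutexes); the class and method names are ours.

```python
# Toy model of the leader/follower protocol: the first session to
# enqueue into an empty stage queue becomes the leader and later
# processes everyone who queued up behind it.

from collections import deque

class Stage:
    def __init__(self):
        self.queue = deque()

    def enqueue(self, session):
        was_empty = not self.queue
        self.queue.append(session)
        return was_empty          # True -> this session is the leader

    def drain(self):
        # The leader grabs the whole queue in one step; the order of the
        # sessions is kept, and new sessions can enqueue behind it.
        batch = list(self.queue)
        self.queue.clear()
        return batch
```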
As explained, new leader threads merge their session queues with
older threads that are in the middle of commit procedures. This lets the
system adapt dynamically to changing situations. Normally, the sync
stage is the most expensive stage, so many batches of sessions will pass
through the flush stage and “pile up” in the sync stage queue, after
which a single leader thread will bring all the sessions through that
stage. However, if battery-backed caches are used,
fsync calls are very cheap, so in that case,
sessions might not pile up at this stage.
Transactions have to be committed in order so that online backup methods (such as XtraBackup or MySQL Enterprise Backup) work correctly. For this reason, the commit stage commits the transactions in the same order as they are written to the binary log.
However, it is also possible to commit transactions in parallel in the commit stage. This means that the commits are done in an arbitrary order. This does not affect correctness in normal operations, but it means that you should not take an online backup while doing parallel commits.
Benchmarks did not show any measurable improvement in performance when committing transactions in parallel, so the default is to always commit in order. This is also the recommended setting, unless you have special needs (or just want to test it).
You can control whether transactions should be ordered using the binlog_order_commits option. If this is OFF, transactions are committed in parallel and the last stage is skipped (the threads commit themselves in an arbitrary order instead of waiting for the leader).
The flush stage includes an optimization when reading sessions from the queue. Instead of entering the flush stage and taking the entire queue for processing, the leader “skims” one session at a time from the queue, flushes it, and then repeats the procedure. The idea behind this optimization is that as long as sessions keep enqueueing, there is no point in advancing to the sync stage, and since flushing a transaction to the binary log takes time, flushing one session at a time provides an opportunity for more sessions that need committing to enqueue themselves. This optimization showed significant improvements in throughput. But because it can affect latency, you can control how long the leader thread will keep “skimming” sessions from the flush queue using the binlog_max_flush_queue_time option, which takes the number of microseconds that the leader should skim the queue. Once the timer expires, the entire queue is grabbed and the leader reverts to the procedure described in the preceding list for the stage. This means that the time given is not the maximum latency of a transaction: the leader still has to flush all the transactions that are registered for the queue before moving to the sync stage.
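The skimming loop can be sketched as follows. The clock is injected so the sketch stays deterministic; the function and parameter names are ours, loosely modeled on binlog_max_flush_queue_time.

```python
# Sketch of the flush-stage "skimming" optimization: the leader flushes
# one session at a time until the queue is empty or a time budget runs
# out, then grabs whatever remains as a single batch.

from collections import deque

def flush_stage(queue, budget_us, clock, flush):
    elapsed = 0
    while queue and elapsed < budget_us:
        flush(queue.popleft())    # skim one session at a time
        elapsed += clock()        # microseconds spent on this flush
    # Budget exhausted (or queue drained): take the rest in one step.
    batch = list(queue)
    queue.clear()
    for session in batch:
        flush(session)
```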
Row-Based Replication
The primary goal of replication is to keep the master and the slave synchronized so they have the same data. As you saw earlier, replication offers a number of special features to ensure the results are as close as possible to being identical on master and slave: context events, session-specific IDs, etc.
Despite this, there are still some situations that statement-based replication can’t currently handle correctly:
If a statement updates two tables with autoincrement columns, it will not replicate correctly: only a single last insert ID can be replicated, and it will be used for both tables on the slave, whereas on the master the insert ID for each table is used individually.
In these cases, it is better to replicate the actual data being inserted into the tables, which is what row-based replication does.
Instead of replicating the statement that performs the changes, row-based replication replicates each row being inserted, deleted, or updated separately, with the values that were used for the operation. Since the row that is sent to the slave is the same row that is sent to the storage engine, it contains the actual data being inserted into the table. Hence there are no UDFs to consider, no autoincrement counters to keep track of, and no partial execution of statements to take into consideration—just data, plain and simple.
Row-based replication opens up an entirely new set of scenarios that you just cannot accomplish with statement-based replication. However, you must also be aware of some differences in behavior.
When choosing between statement-based and row-based replication, consider the following:
Do you have statements that update a lot of rows, or do the statements usually only change or insert a few rows?
If the statement changes a lot of rows, statement-based replication will have more compact statements and may execute faster. But since the statement is executed on the slave as well, this is not always true. If the statement has a complex optimization and execution plan, it might be faster to use row-based replication, because the logic for finding rows is much faster.
If the statement changes or inserts only a few rows, row-based replication is potentially faster because there is no parsing involved and all processing goes directly to the storage engine.
Do you need to see which statements are executed? The events for handling row-based replication are hard to decode, to say the least. In statement-based replication, the statements are written into the binary log and hence can be read directly.
Statement-based replication has a simple replication model: just execute the same statement on the slave. This has existed for a long time and is familiar to many DBAs. Row-based replication, on the other hand, is less familiar to many DBAs and can therefore potentially be harder to fix when replication fails.
If data is different on master and slave, executing statements can yield different results on master and slave. Sometimes this is intentional—in this case, statement-based replication can and should be used—but sometimes this not intentional and can be prevented through row-based replication.
Enabling Row-based Replication
You can control the format to use when writing to the binary log using the binlog-format option. This option can take the values STATEMENT, MIXED, or ROW (see Options for Row-Based Replication for more information) and comes both as a global variable and as a session variable. This makes it possible for a session to temporarily switch replication format (but you need SUPER privileges to change this variable, even the session version). However, to ensure that row-based replication is used when starting the server, you need to stop the master and update the configuration file. Example 4-16 shows the addition needed to enable row-based replication.
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
log-bin = master-bin
log-bin-index = master-bin.index
server-id = 1
binlog-format = ROW
Using Mixed Mode
Mixed-mode replication is recommended for MySQL version 5.1 and later, but the default value for the binlog-format option is STATEMENT. This might seem odd, but that decision was made to avoid problems for users who upgrade from versions 5.0 or earlier. Because those versions had no row-based replication and users had to use statement-based replication, the MySQL developers did not want servers to make a sudden switch. If the servers suddenly started sending out row-based replication events when they were upgraded, the deployment would likely be a mess. To reduce the number of factors that an upgrading DBA has to consider, the default for this option was kept as STATEMENT.
However, if you use one of the template files distributed with MySQL version 5.1, you will notice that the binlog-format option has the value MIXED, per the recommendation.
The principles behind mixed-mode replication are simple: use statement-based replication normally and switch to row-based replication for unsafe statements. We have already examined the kinds of statements that can lead to problems and why; to summarize, mixed mode switches to row-based replication whenever it encounters a statement that cannot be replicated safely in statement format.
Binary Log Management
The events mentioned thus far represent some real change of data that occurred on the master. However, things can happen that do not represent any change of data on the master but can affect replication. For example, if the server is stopped, it can potentially affect replication. No events can, of course, be written to the binary log when the server is not running, but this means that if anything is changed in the datafile, it will not be represented in the binary log. A typical example of this is restoring a backup, or otherwise manipulating the datafiles. Such changes are not replicated simply because the server is not running. However, the fact that the server stopped is sometimes represented in the binary log using a binary log event precisely to be able to recognize that there can be a “gap” in the binary log where things could have happened.
Events are needed for other purposes as well. Because the binary logs consist of multiple files, it is necessary to split the groups at convenient places to form the sequence of binlog files. To handle this safely, special events (rotate events) are added to the binary log.
The Binary Log and Crash Safety
As you have seen, changes to the binary log do not correspond to changes to the master databases on a one-to-one basis. It is important to keep the databases and the binary log mutually consistent in case of a crash. In other words, there should be no changes committed to the storage engine that are not written to the binary log, and vice versa.
Nontransactional engines introduce problems right away. For example, it is not possible to guarantee consistency between the binary log and a MyISAM table because MyISAM is nontransactional and the storage engine will carry through any requested change long before any attempt to log the statement.
But for transactional storage engines, MySQL includes measures to make sure that a crash does not cause the binary log to lose too much information.
As we described in Logging Statements, events are written to the binary log before releasing the locks on the table, but after all the changes have been given to the storage engine. So if there is a crash before the storage engine releases the locks, the server has to ensure that any changes recorded to the binary log are actually in the table on the disk before allowing the statement (or transaction) to commit. This requires coordination with standard filesystem synchronization.
Because disk accesses are very expensive compared to memory accesses, operating systems are designed to cache parts of the file in a dedicated part of the main memory—usually called the page cache—and wait to write file data to disk until necessary. Writing to disk becomes necessary when another page must be loaded from disk and the page cache is full, but it can also be requested by an application by doing an explicit call to write the pages of a file to disk.
Recall from the earlier description of XA that when the first phase is complete, all data has to be written to durable storage—that is, to disk—for the protocol to handle crashes correctly. This means that every time a transaction is committed, the page cache has to be written to disk. This can be very expensive and, depending on the application, not always necessary. To control how often the data is written to disk, you can set the sync-binlog option. This option takes an integer specifying how often to write the binary log to disk. If the option is set to 5, for instance, the binary log will be written to disk on every fifth commit group of statements or transactions. The default value is 0, which means that the binary log is not explicitly written to disk by the server; writing then happens at the discretion of the operating system.
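The counting behavior of sync-binlog can be sketched as a simple counter over commit groups. This is a model of the option’s documented behavior, not server code; the fsync callback is injected and all names are ours.

```python
# Sketch of sync-binlog = N: force the binary log to disk on every Nth
# commit group, and never explicitly when N is 0.

def make_group_committer(sync_binlog, fsync):
    groups = 0
    def commit_group():
        nonlocal groups
        groups += 1
        if sync_binlog > 0 and groups % sync_binlog == 0:
            fsync()               # force the page cache to disk
    return commit_group
```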
Note that since the introduction of binary log group commit in MySQL version 5.6, there is a sync with each commit group and not with each transaction or statement. This means that with sync-binlog=1, several transactions will be written to disk in a batch. You can read more about binary log group commit in Binary Log Group Commit.
For storage engines that support XA, such as InnoDB, setting the sync-binlog option to 1 means that you will not lose any transactions under normal crashes. For engines that do not support XA, you might lose at most one transaction.
If, however, every group is written to disk, performance suffers, usually a lot. Disk accesses are notoriously slow, and caches are used precisely to improve performance by removing the need to always write data to disk. If you are prepared to risk losing a few transactions or statements—either because you can handle the work it takes to recover this manually or because it is not important for the application—you can set sync-binlog to a higher value or leave it at the default.
Binlog File Rotation
MySQL starts a new file to hold binary log events at regular intervals. For practical and administrative reasons, it wouldn’t work to keep writing to a single file—operating systems have limits on file sizes. As mentioned earlier, the file to which the server is currently writing is called the active binlog file.
Switching to a new file is called binary log rotation or binlog file rotation, depending on the context.
Four main activities cause a rotation:
- The server stops
Each time the server starts, it begins a new binary log. We’ll discuss why shortly.
- The binlog file reaches a maximum size
- The binary log is explicitly flushed
The FLUSH LOGS command writes all logs to disk and creates a new file to continue writing the binary log. This can be useful when administering recovery images for PITR. Reading from an open binlog file can have unexpected results, so it is advisable to force an explicit flush before trying to use binlog files for recovery.
- An incident occurred on the server
In addition to stopping altogether, the server can encounter incidents that cause the binary log to be rotated. These incidents sometimes require special manual intervention from the administrator, because they can leave a “gap” in the replication stream. It is easier for the DBA to handle the incident if the server starts on a fresh binlog file after an incident.
Three items are of particular interest here:
- Binlog-in-use flag
Because a crash can occur while the server is writing to a binlog file, it is critical to indicate when a file was closed properly. Otherwise, a DBA could replay a corrupted file on the master or slave and cause more problems. To provide assurance about the file’s integrity, the binlog-in-use flag is set when the file is created and cleared after the final event (Rotate) has been written to the file. Thus, any program can see whether the binlog file was properly closed.
- Binlog file format version
Over the course of MySQL development, the format for the binary log has changed several times, and it will certainly change again. Developers increment the version number for the format when significant changes—notably changes to the common headers—render new files unreadable to previous versions of the server. (The current format, starting with MySQL version 5.0, is version 4.) The binlog file format version field lists its version number; if a different server cannot handle a file with that version, it simply refuses to read the file.
- Server version
This is a string denoting the version of the server that wrote the file. The server version used to run the examples in this chapter was “5.5.31-0ubuntu0.12.04.1,” for instance. As you can see, the string is guaranteed to include the MySQL server version, but it also contains additional information related to the specific build. In some situations, this information can help you or the developers figure out and resolve subtle bugs that can occur when replicating between different versions of the server.
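The binlog-in-use flag can be checked from outside the server, as the sketch below shows. It assumes the version 4 binlog layout: a 4-byte magic number, followed by the first event’s 19-byte common header with a 2-byte flags field at offset 17, where bit 0x1 is the in-use flag. These offsets and the constant name are our assumptions; verify them against your server version before relying on this.

```python
# Sketch: check whether a binlog file was closed properly by reading
# the in-use flag from the first event's common header. Offsets assume
# the v4 binlog format; verify for your server version.

import struct

BINLOG_MAGIC = b"\xfebin"
LOG_EVENT_BINLOG_IN_USE_F = 0x1   # assumed flag bit

def binlog_in_use(path):
    with open(path, "rb") as f:
        if f.read(4) != BINLOG_MAGIC:
            raise ValueError("not a binlog file")
        header = f.read(19)                      # first event's header
        flags = struct.unpack_from("<H", header, 17)[0]
        return bool(flags & LOG_EVENT_BINLOG_IN_USE_F)
```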
To rotate the binary log safely even in the presence of crashes, the server uses a write-ahead strategy and records its intention in a temporary file called the purge index file (this name was chosen because the file is used while purging binlog files as well, as you will see). Its name is based on that of the index file, so for instance if the name of the index file is master-bin.index, the name of the purge index file is master-bin.~rec~. After creating the new binlog file and updating the index file to point to it, the server removes the purge index file.
In versions of MySQL earlier than 5.1.43, rotation or binlog file purging could leave orphaned files; that is, the files might exist in the filesystem without being mentioned in the index file. Because of this, old files might not be purged correctly, leaving them around and requiring manual cleaning of the files from the directory.
The orphaned files do not cause a problem for replication, but can be considered an annoyance. The procedure shown in this section ensures that no files are orphaned in the event of a crash.
In the event of a crash, if a purge index file is present on the server, the server can compare the purge index file and the index file when it restarts and see what was actually accomplished compared to what was intended.
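The write-ahead rotation procedure can be sketched as three steps. The file names (master-bin.index, master-bin.~rec~) follow the text above; everything else, including the function itself, is a simplification of what the server does.

```python
# Sketch of write-ahead binlog rotation: record the intention in a
# temporary purge index file, carry it out, then remove the record.

import os

def rotate(index_file, new_binlog):
    rec_file = index_file.rsplit(".", 1)[0] + ".~rec~"
    # 1. Record the intention in the purge index file.
    with open(rec_file, "w") as f:
        f.write(new_binlog + "\n")
    # 2. Create the new binlog file and add it to the index file.
    open(new_binlog, "ab").close()
    with open(index_file, "a") as f:
        f.write(new_binlog + "\n")
    # 3. Intention carried out: remove the purge index file. If the
    #    server crashes before this point, recovery can compare the
    #    purge index file against the index file at restart.
    os.remove(rec_file)
```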
Before MySQL version 5.6, it was possible for the binary log to be left partially written, if only part of the transaction cache had been written to the binary log before the server crashed. With MySQL version 5.6, the binary log is trimmed and the partially written transaction removed from the binary log. This is safe because the transaction is not committed in the storage engine until it has been written completely to the binary log.
Incidents
The term “incidents” refers to events that don’t change data on a server but must be written to the binary log because they have the potential to affect replication. Most incidents don’t require special intervention from the DBA—for instance, servers can stop and restart without changes to database files—but there will inevitably be some incidents that call for special action.
Currently, there are two incident events that you might discover in a binary log:
- Stop
Indicates that the server was stopped through normal means. If the server crashed, no stop event will be written, even when the server is brought up again. This event is written in the old binlog file (restarting the server rotates to a new file) and contains only a common header; no other information is provided in the event.
When the binary log is replayed on the slave, it ignores any Stop events. Normally, the fact that the server stopped does not require special attention and replication can proceed as usual. If the server was switched to a new version while it was stopped, this will be indicated in the next binlog file, and the server reading the binlog file will then stop if it cannot handle the new version of the binlog format. In this sense, the Stop event does not represent a “gap” in the replication stream. However, the event is worth recording because someone might manually restore a backup or make other changes to files before restarting replication, and the DBA replaying the file could use this event to start or stop the replay at the right time.
- Incident
An event type introduced in version 5.1 as a generic incident event. In contrast with the Stop event, this event contains an identifier to specify what kind of incident occurred. It is used to indicate that the server was forced to perform actions that almost guarantee that changes are missing from the binary log.
For example, incident events in version 5.1 are written if the database was reloaded or if a nontransactional event was too big to fit in the binlog file. MySQL Cluster generates this event when one of the nodes had to reload the database and could therefore be out of sync.
When the binary log is replayed on the slave, it stops with an error if it encounters an Incident event. In the case of the MySQL Cluster reload event, it indicates a need to resynchronize the cluster and probably to search for events that are missing from the binary log.
Purging the Binlog File
Over time, the server will accumulate binlog files unless old ones are purged from the filesystem. The server can automatically purge old binary logs from the filesystem, or you can explicitly tell the server to purge the files.
To make the server automatically purge old binlog files, set the expire-logs-days option—which is available as a server variable as well—to the number of days that you want to keep binlog files. Remember that, as with all server variables, a value set at runtime is not preserved between restarts of the server. So if you want the automatic purging to keep going across restarts, you have to add the setting to the my.cnf file for the server.
To purge the binlog files explicitly, use the PURGE BINARY LOGS command, which comes in two forms:
- PURGE BINARY LOGS BEFORE datetime
This form of the command will purge all files that are before the given date. If datetime is in the middle of a logfile (which it usually is), all files before the one holding datetime will be purged.
- PURGE BINARY LOGS TO 'filename'
This form of the command will purge all files that precede the given file. In other words, all files before filename in the output from SHOW MASTER LOGS will be removed, leaving filename as the first binlog file.
Binlog files are purged when the server starts or when a binary
log rotation is done. If the server discovers files that require
purging, either because a file is older than
expire-logs-days or because a
PURGE BINARY LOGS command was executed, it
will start by writing the files that the server has decided are ripe for
purging to the purge index file (for example, master-bin.~rec~). After that, the files are
removed from the filesystem, and finally the purge index file is removed.
In the event of a crash, the server can continue removing files by comparing the contents of the purge index file and the index file and removing all files that were not removed because of a crash. As you saw earlier, the purge index file is used when rotating as well, so if a crash occurs before the index file can be properly updated, the new binlog file will be removed and then re-created when the rotate is repeated.
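The crash-recovery rule described above can be sketched as a small routine: any file still listed in the purge index file after a restart is removed if it still exists, and the purge index file itself is deleted last. The function name and the one-filename-per-line format are assumptions for illustration; the server's actual on-disk format may differ.

```python
import os

def finish_interrupted_purge(purge_index_path):
    """Finish a purge that was interrupted by a crash: remove every file
    still listed in the purge index file, then remove the purge index
    file itself (a sketch of the recovery rule, not the server's code)."""
    if not os.path.exists(purge_index_path):
        return  # no purge was in progress
    with open(purge_index_path) as f:
        for name in f.read().split():
            if os.path.exists(name):
                os.remove(name)  # this file survived the crash; remove it now
    os.remove(purge_index_path)  # deleting this last makes the purge restartable
```

Because the purge index file is removed only after all listed files are gone, running the routine again after another crash is harmless.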
The mysqlbinlog Utility
One of the more useful tools available to an administrator is the client program mysqlbinlog. This is a small program that can investigate the contents of binlog files as well as relay logfiles (we will cover the relay logs in Chapter 8). In addition to reading binlog files locally, mysqlbinlog can also fetch binlog files remotely from other servers.
mysqlbinlog is a very useful tool when investigating problems with replication, but you can also use it to implement PITR, as demonstrated in Chapter 3.
The mysqlbinlog tool normally outputs the contents of the binary log in a form that can be executed by sending them to a running server. When statement-based replication is employed, the statements executed are emitted as SQL statements. For row-based replication, which will be introduced in Chapter 8, mysqlbinlog generates some additional data necessary to handle row-based replication. This chapter focuses entirely on statement-based replication, so we will use the command with options to suppress output needed to handle row-based replication.
Some options to mysqlbinlog will be explained in this section, but for a complete list, consult the online MySQL Reference Manual.
Let’s start with a simple example where we create a binlog file and then look at it using mysqlbinlog. We will start up a client connected to the master and execute the following commands to see how they end up in the binary log:
mysqld1> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)

mysqld1> CREATE TABLE employee (
      ->    id INT AUTO_INCREMENT,
      ->    name CHAR(64) NOT NULL,
      ->    email CHAR(64),
      ->    password CHAR(64),
      ->    PRIMARY KEY (id)
      -> );
Query OK, 0 rows affected (0.00 sec)

mysqld1> SET @password = PASSWORD('xyzzy');
Query OK, 0 rows affected (0.00 sec)

mysqld1> INSERT INTO employee(name,email,password)
      ->    VALUES ('mats','firstname.lastname@example.org',@password);
Query OK, 1 row affected (0.01 sec)

mysqld1> SHOW BINARY LOGS;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mysqld1-bin.000038 |       670 |
+--------------------+-----------+
1 row in set (0.00 sec)
Let’s now use mysqlbinlog to dump the contents of the binlog file mysqld1-bin.000038, which is where all the commands ended up. The output shown in Example 4-17 has been edited slightly to fit the page.
$ sudo mysqlbinlog --short-form --force-if-open \
>    --base64-output=never \
>    /var/lib/mysql1/mysqld1-bin.000038
 1   /*!40019 SET @@session.max_insert_delayed_threads=0*/;
 2   /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
 3   DELIMITER /*!*/;
 4   ROLLBACK/*!*/;
 5   use test/*!*/;
 6   SET TIMESTAMP=1264227693/*!*/;
 7   SET @@session.pseudo_thread_id=999999999/*!*/;
 8   SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1,
     @@session.unique_checks=1, @@session.autocommit=1/*!*/;
 9   SET @@session.sql_mode=0/*!*/;
10   SET @@session.auto_increment_increment=1,
     @@session.auto_increment_offset=1/*!*/;
11   /*!\C utf8 *//*!*/;
12   SET @@session.character_set_client=8,@@session.collation_connection=8,
     @@session.collation_server=8/*!*/;
13   SET @@session.lc_time_names=0/*!*/;
14   SET @@session.collation_database=DEFAULT/*!*/;
15   CREATE TABLE employee (
16      id INT AUTO_INCREMENT,
17      name CHAR(64) NOT NULL,
18      email CHAR(64),
19      password CHAR(64),
20      PRIMARY KEY (id)
21   ) ENGINE=InnoDB
22   /*!*/;
23   SET TIMESTAMP=1264227693/*!*/;
24   BEGIN
25   /*!*/;
26   SET INSERT_ID=1/*!*/;
27   SET @`password`:=_utf8 0x2A31353141463... COLLATE `utf8_general_ci`/*!*/;
28   SET TIMESTAMP=1264227693/*!*/;
29   INSERT INTO employee(name,email,password)
30      VALUES ('mats','email@example.com',@password)
31   /*!*/;
32   COMMIT/*!*/;
33   DELIMITER ;
34   # End of log file
35   ROLLBACK /* added by mysqlbinlog */;
36   /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
To get this output, we use three options:
- short-form
With this option, mysqlbinlog prints only information about the SQL statements issued, and leaves out comments with information about the events in the binary log. This option is useful when mysqlbinlog is used only to play back the events to a server. If you want to investigate the binary log for problems, you will need these comments and should not use this option.
- force-if-open
If the binlog file is not closed properly, either because the binlog file is still being written to or because the server crashed, mysqlbinlog will print a warning that this binlog file was not closed properly. This option prevents the printing of that warning.
- base64-output=never
This prevents mysqlbinlog from printing base64-encoded events. If mysqlbinlog has to print base64-encoded events, it will also print the Format_description event of the binary log to show the encoding used. For statement-based replication, this is not necessary, so this option is used to suppress that event.
In Example 4-17, lines 1–4 contain the preamble printed in every output. Line 3 sets a delimiter that is unlikely to occur elsewhere in the file. The delimiter is also designed to appear as a comment in processing languages that do not recognize the setting of the delimiter.
The rollback on line 4 is issued to ensure the output is not accidentally put inside a transaction because a transaction was started on the client before the output was fed into the client.
We can skip momentarily to the end of the output—lines 33–35—to see the counterpart to lines 1–4. They restore the values set in the preamble and roll back any open transaction. This is necessary in case the binlog file was truncated in the middle of a transaction, to prevent any SQL code following this output from being included in a transaction.
The USE statement on line 5 is printed whenever the database is changed. Even though the binary log specifies the current database before each SQL statement, mysqlbinlog shows only the changes to the current database. When a USE statement appears, it is the first line of a new event.
The first line that is guaranteed to be in the output for each event is SET TIMESTAMP, as shown on lines 6 and 23. This statement gives the timestamp when the event started executing, in seconds since the epoch.
Lines 7–14 contain general settings, but like
USE on line 5, they are printed only for the
first event and whenever their values change.
Because the INSERT statement on lines 29–30 inserts into a table with an autoincrement column using a user-defined variable, the INSERT_ID variable on line 26 and the user-defined variable on line 27 are set before the statement. This is the result of the Intvar and User_var events in the binary log.
If you omit the short-form option, each event in the output will be preceded by some comments about the event that generated the lines. You can see these comments, which start with hash marks (#), in Example 4-18.
$ sudo mysqlbinlog --force-if-open --base64-output=never \
>    /var/lib/mysql1/mysqld1-bin.000038
        .
        .
        .
 1   # at 386
 2   #100123  7:21:33 server id 1  end_log_pos 414   Intvar
 3   SET INSERT_ID=1/*!*/;
 4   # at 414
 5   #100123  7:21:33 server id 1  end_log_pos 496   User_var
 6   SET @`password`:=_utf8 0x2A313531...838 COLLATE `utf8_general_ci`/*!*/;
 7   # at 496
 8   #100123  7:21:33 server id 1  end_log_pos 643   Query   thread_id=6
         exec_time=0   error_code=0
 9   SET TIMESTAMP=1264227693/*!*/;
10   INSERT INTO employee(name,email,password)
11      VALUES ('mats','firstname.lastname@example.org',@password)
12   /*!*/;
13   # at 643
14   #100123  7:21:33 server id 1  end_log_pos 670   Xid = 218
15   COMMIT/*!*/;
16   DELIMITER ;
17   # End of log file
18   ROLLBACK /* added by mysqlbinlog */;
19   /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
The first line of the comment gives the byte position of the
event, and the second line contains other information about the event.
Consider, for example, the comments for the Query event on lines 7 and 8:

# at 496
#100123  7:21:33 server id 1  end_log_pos 643   Query   thread_id=6   exec_time=0   error_code=0
The various parts of the comments have the following meanings:
- at
The byte position where the event starts (i.e., the first byte of the event).
- timestamp
The timestamp of the event as a datetime (date plus time). This is the time when the query started executing or when the event was written to the binary log.
- server id
The server ID of the server that generated the event. This server ID is used to set the pseudo_thread_id session variable, and a line setting this variable is printed if the event is thread-specific and the server ID is different from the previously printed ID.
- end_log_pos
The byte position of the event that follows this event. By taking the difference between this value and the position where the event starts, you can get the length of the event.
- event type
The type of event. In Example 4-18, you can see several different types of events, such as Intvar, User_var, Query, and Xid.
The fields after these are event-specific, and hence different for each event. For the Query event, we can see two additional fields:
- thread_id
The ID of the thread that executed the event. This is used to handle thread-specific queries, such as queries that access temporary tables.
- exec_time
The execution time of the query in seconds.
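The end_log_pos arithmetic is easy to check against the Query event in Example 4-18, which starts at byte 496 and reports end_log_pos 643:

```python
# The length of an event is the difference between the position of the
# following event (end_log_pos) and the position where the event starts.
start_pos = 496      # from the "# at 496" comment line
end_log_pos = 643    # from the second comment line
event_length = end_log_pos - start_pos
print(event_length)  # 147
```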
The files are printed in the order you request them, and there is no checking that the Rotate event ending each file refers to the next file in the sequence. The responsibility for ensuring that these binlog files make up part of a real binary log lies on the user.
Thanks to the way the binlog files are named, submitting multiple files to mysqlbinlog—such as by using * as a file-globbing wildcard—is usually not a problem, since the files will then be processed in the order they were created. However, let’s look at what happens when the binlog file counter, which is used as an extension to the filename, goes from 999999 to 1000000:
$ ls mysqld1-bin.[0-9]*
mysqld1-bin.000007  mysqld1-bin.000011  mysqld1-bin.000039
mysqld1-bin.000008  mysqld1-bin.000035  mysqld1-bin.1000000
mysqld1-bin.000009  mysqld1-bin.000037  mysqld1-bin.999998
mysqld1-bin.000010  mysqld1-bin.000038  mysqld1-bin.999999
As you can see, the last binlog file to be created is listed before the two binlog files that are earlier in binary log order. So it is worth checking the names of the files before you use wildcards.
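One way to guard against this is to sort the filenames by the numeric value of the extension, rather than lexically, before passing them to mysqlbinlog; a small sketch:

```python
# Sort binlog filenames by the numeric value of their extension, so that
# mysqld1-bin.1000000 correctly sorts after mysqld1-bin.999999.
def binlog_order(names):
    return sorted(names, key=lambda name: int(name.rsplit(".", 1)[1]))

files = ["mysqld1-bin.1000000", "mysqld1-bin.000038", "mysqld1-bin.999999"]
print(binlog_order(files))
# ['mysqld1-bin.000038', 'mysqld1-bin.999999', 'mysqld1-bin.1000000']
```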
Since your binlog files are usually pretty large, you won’t want to print the entire contents of the binlog files and browse them. Instead, there are a few options you can use to limit the output so that only a range of the events is printed.
- start-position=bytepos
The byte position of the first event to print. If an event does not start at the position given, mysqlbinlog will still try to interpret the bytes starting at that position as an event, which usually leads to garbage output.
- stop-position=bytepos
The byte position of the last event to print. If no event ends at that position, the last event printed will be the event with a position that precedes bytepos. If multiple binlog files are given, the position refers to the last file in the sequence.
- start-datetime=datetime
Prints only events that have a timestamp at or after datetime. This will work correctly when multiple files are given—if all events of a file are before the datetime, all its events will be skipped—but there is no checking that the events are printed in order according to their timestamps.
- stop-datetime=datetime
Prints only events that have a timestamp before datetime. This is an exclusive range, meaning that if an event is marked 2010-01-24 07:58:32 and that exact datetime is given, the event will not be printed.
Note that since the timestamp of an event is the start time of the statement but events are ordered in the binary log by commit time, it is possible for an event to have a timestamp earlier than that of the preceding event. Since mysqlbinlog stops at the first event with a timestamp outside the range, there might be events within the range that aren’t displayed because they appear after such an out-of-order event.
Reading remote files
In addition to reading files on a local filesystem, the mysqlbinlog utility can read binlog files from a remote server. It does this by using the same mechanism that the slaves use to connect to a master and ask for events. This can be practical in some cases, since it does not require a shell account on the machine to read the binlog files, just a user on the server with REPLICATION SLAVE privileges.
To handle remote reading of binlog files, include the --read-from-remote-server option along with a host and user for connecting to the server, and optionally a port (if different from the default) and a password.
When reading from a remote server, give just the name of the binlog file, not the full path.
So to read the binlog file from Example 4-18 remotely, the command would look something like the following (the server prompts for a password, which is not echoed as you enter it):

$ mysqlbinlog --read-from-remote-server \
>    --host=master.example.com --user=repl_user \
>    mysqld1-bin.000038
Enter password:
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 294
#130608 22:09:19 server id 1  end_log_pos 0   Start: binlog v 4,
    server v 5.5.31-0ubuntu0.12.04.1-log created 130608 22:09:19
# at 294
#130608 22:13:08 server id 1  end_log_pos 362   Query   thread_id=53
    exec_time=0   error_code=0
SET TIMESTAMP=1370722388/*!*/;
SET @@session.pseudo_thread_id=53/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0...
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset...
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33...
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 362
#130608 22:13:08 server id 1  end_log_pos 390   Intvar
SET INSERT_ID=1/*!*/;
# at 390
#130608 22:13:08 server id 1  end_log_pos 472   User_var
SET @`password`:=_utf8 0x2A31353141463642384333413641413039434643434244333...
# at 472
#130608 22:13:08 server id 1  end_log_pos 627   Query   thread_id=53   exec_time=0
use `test`/*!*/;
SET TIMESTAMP=1370722388/*!*/;
INSERT INTO employee(name, email, password)
   VALUES ('mats', 'email@example.com', @password)
/*!*/;
# at 627
#130608 22:13:08 server id 1  end_log_pos 654   Xid = 175
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Reading raw binary logfiles
The mysqlbinlog utility is very useful for investigating binary logs, but it can also be used for taking backups of binlog files. This is very useful when you do not have normal shell access to the machine but do have REPLICATION SLAVE privileges. In this case, you can read the binary logfiles from the server, but rather than parsing them, you just save them to files.
The normal way to use mysqlbinlog for taking a backup of binlog files remotely is:
$ mysqlbinlog --raw --read-from-remote-server \
>    --host=master.example.com --user=repl_user \
>    master-bin.000012 master-bin.000013 ...
In this example, the binlog files master-bin.000012 and master-bin.000013 will be read and saved in the current directory. Note that you have to use --raw together with --read-from-remote-server; using --raw without --read-from-remote-server is pointless, since that would be the same as using a plain file copy. The most interesting options to control the behavior of mysqlbinlog are described below. You can find a detailed description of how to use mysqlbinlog for backups in the MySQL Reference Manual.
- result-file=prefix
This option gives the prefix to use when constructing the files to write to. The prefix can be a directory name (with trailing slash), or any other prefix. It defaults to the empty string, so if this option is not used, files will be written using the same name as they have on the master.
- to-last-log
Normally, only the files given on the command line are transferred, but if this option is provided, only the starting binary log file has to be given. After that, mysqlbinlog will transfer all files after the first one.
- stop-never
Do not stop reading after reaching the end of the last log; keep waiting for more input. This option is useful when taking a backup that is to be used for point-in-time recovery. See Backup and MySQL Replication for a detailed treatment of backing up for point-in-time recovery.
Sometimes, the standard information printed by mysqlbinlog is not sufficient for spotting a problem, so it is necessary to go into the details of the event and investigate its content. To handle such situations, you can pass the --hexdump option to tell mysqlbinlog to write the actual bytes of the events.
Before going into the details of the events, here are some general rules about the format of the data in the binary log:
- Integer data
Integer fields in the binary log are printed in little-endian order, so you have to read integer fields backward. This means that, for example, the 32-bit block 03 01 00 00 represents the hexadecimal number 0x103 (decimal 259).
- String data
String fields are stored in different ways depending on the event: some are stored with an explicit length, some are null-terminated, and a string that is the last field of an event extends to the end of the event.
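The little-endian rule is easy to verify with Python's struct module:

```python
import struct

# The 32-bit block 03 01 00 00 read backward is 00 00 01 03, i.e. 0x103.
block = bytes([0x03, 0x01, 0x00, 0x00])
value = struct.unpack("<I", block)[0]  # "<I" = little-endian unsigned 32-bit
print(hex(value))  # 0x103
```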
This section will cover the most common events, but an exhaustive reference concerning the format of all the events is beyond the scope of this book. Check the Binary Log section in the MySQL Internals Manual for an exhaustive list of all the events available and their fields. The most common of all the events is the Query event, so let’s concentrate on it first. Example 4-19 shows the output for such an event.
$ sudo mysqlbinlog --force-if-open --hexdump \
>    /var/lib/mysql1/mysqld1-bin.000038
        .
        .
        .
 1   # at 496
 2   #100123  7:21:33 server id 1  end_log_pos 643
 3   # Position  Timestamp   Type   Master ID        Size      Master Pos    Flags
 4   #      1f0 6d 95 5a 4b  02   01 00 00 00   93 00 00 00   83 02 00 00   10 00
 5   #      203 06 00 00 00 00 00 00 00 04 00 00 1a 00 00 00 40 |................|
 6   #      213 00 00 01 00 00 00 00 00 00 00 00 06 03 73 74 64 |.............std|
 7   #      223 04 08 00 08 00 08 00 74 65 73 74 00 49 4e 53 45 |.......test.INSE|
 8   #      233 52 54 20 49 4e 54 4f 20 75 73 65 72 28 6e 61 6d |RT.INTO.employee|
 9   #      243 65 2c 65 6d 61 69 6c 2c 70 61 73 73 77 6f 72 64 |.name.email.pass|
10   #      253 29 0a 20 20 56 41 4c 55 45 53 20 28 27 6d 61 74 |word....VALUES..|
11   #      263 73 27 2c 27 6d 61 74 73 40 65 78 61 6d 70 6c 65 |.mats...mats.exa|
12   #      273 2e 63 6f 6d 27 2c 40 70 61 73 73 77 6f 72 64 29 |mple.com...passw|
13   #      283 6f 72 64 29                                     |ord.|
14   #       Query   thread_id=6   exec_time=0   error_code=0
     SET TIMESTAMP=1264227693/*!*/;
     INSERT INTO employee(name,email,password)
        VALUES ('mats','firstname.lastname@example.org',@password)
     /*!*/;
The first two lines and line 14 are comments listing basic information that we discussed earlier. Notice that when you use the --hexdump option, the general information and the event-specific information are split into two lines, whereas they are merged in the normal output.
Lines 3 and 4 list the common header:
- Timestamp
The timestamp of the event as an integer, stored in little-endian format.
- Type
A single byte representing the type of the event. Some event types are given in the MySQL Internals Manual, but to get the values for your specific server you need to look in the source code (in the file sql/log_event.h).
- Master ID
The server ID of the server that wrote the event, written as an integer. For the event shown in Example 4-19, the server ID is 1.
- Size
The size of the event in bytes, written as an integer.
- Master Pos
The same as end_log_pos (i.e., the start of the event following this event).
- Flags
This field has 16 bits reserved for general flags concerning the event. The field is mostly unused, but it stores the binlog-in-use flag. As you can see in Example 4-19, the binlog-in-use flag is set, meaning that the binary log was not closed properly (in this case, because we didn’t flush the logs before calling mysqlbinlog).
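The 19 header bytes on line 4 of Example 4-19 can be decoded directly from these field descriptions; the hex string below is copied from that line:

```python
import struct

# Common header of the Query event in Example 4-19, line 4:
# timestamp (4), type (1), master ID (4), size (4), master pos (4), flags (2).
header = bytes.fromhex("6d955a4b" "02" "01000000" "93000000" "83020000" "1000")
timestamp, ev_type, server_id, size, next_pos, flags = \
    struct.unpack("<IBIIIH", header)  # all integers are little-endian

print(timestamp)  # 1264227693, matching SET TIMESTAMP in the output
print(server_id)  # 1
print(size)       # 147 == 643 - 496, the length of the event
print(next_pos)   # 643, the end_log_pos of the event
```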
After the common header come the post header and body for the event. As already mentioned, an exhaustive coverage of all the events is beyond the scope of this book, but we will cover the most important and commonly used events: the Query and Format_description events.
Query event post header and body
The Query event is by far the most used and also the most complicated event issued by the server. Part of the reason is that it has to carry a lot of information about the context of the statement when it was executed. As already demonstrated, integer variables, user variables, and random seeds are covered using specific events, but it is also necessary to provide other information, which is part of this event.
The post header for the Query event consists of five fields. Recall that these fields are of fixed size and that the length of the post header is given in the Format_description event for the binlog file, meaning that later MySQL versions may add additional fields if the need should arise.
- Thread ID
A four-byte unsigned integer representing the thread ID that executed the statement. Even though the thread ID is not always necessary to execute the statement correctly, it is always written into the event.
- Execution time
The number of seconds from the start of execution of the query to when it was written to the binary log, expressed as a four-byte unsigned integer.
- Database name length
The length of the database name, stored as an unsigned one-byte integer. The database name is stored in the event body, but the length is given here.
- Error code
The error code resulting from execution of the statement, stored as a two-byte unsigned integer. This field is included because, in some cases, statements have to be logged to the binary log even when they fail.
- Status variables length
The length of the block in the event body that stores the status variables, stored as a two-byte unsigned integer. This status block is sometimes used with a Query event to store various status variables, such as the SQL mode in effect when the statement was executed.
The event body consists of the following fields, which are all of variable length.
- Status variables
A sequence of status variables. Each status variable is represented by a single integer followed by the value of the status variable. The interpretation and length of each status variable value depend on which status variable it concerns. Status variables are not always present; they are added only when necessary. Examples include the SQL mode, the character set settings, and the auto-increment increment and offset in effect when the statement was executed.
- Current database
The name of the current database, stored as a null-terminated string. The length of the database name is given in the post header.
- Statement text
The statement that was executed. The length of the statement can be computed from the information in the common header and the post header. This statement is normally identical to the original statement written, but in some cases, the statement is rewritten before it is stored in the binary log. For instance, as you saw earlier in this chapter, triggers and stored procedures are stored with DEFINER clauses specified.
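The five fixed-size post header fields can be unpacked the same way as the common header; the 13 bytes below are taken from the start of line 5 of Example 4-19:

```python
import struct

# Query event post header: thread ID (4), execution time (4),
# database name length (1), error code (2), status variables length (2).
post_header = bytes.fromhex("06000000" "00000000" "04" "0000" "1a00")
thread_id, exec_time, db_len, error_code, status_len = \
    struct.unpack("<IIBHH", post_header)

print(thread_id)   # 6, matching "thread_id=6" in the comment line
print(db_len)      # 4, the length of the database name "test"
print(status_len)  # 26 bytes of status variables follow in the body
```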
Format description event post header and body
The Format_description event records important information about the binlog file format, the event format, and the server. Because it has to remain robust between versions—it should still be possible to interpret the event even if the binlog format changes—there are some restrictions on which changes are allowed.
One of the more important restrictions is that the common header of both the Format_description event and the Rotate event is fixed at 19 bytes. This means that it is not possible to extend the event with new fields in the common header.
The post header and event body for the Format_description event contain the following fields:
- Binlog file version
The version of the binlog file format used by this file. For MySQL versions 5.0 and later, this is 4.
- Server version string
A 50-byte string storing server version information. This is usually the three-part version number followed by information about the options used for the build, “5.5.31-0ubuntu0.12.04.1,” for instance.
- Creation time
A four-byte integer holding the creation time—the number of seconds since the epoch—of the first binlog file written by the server since startup. For later binlog files written by the server, this field will be zero.
This scheme allows a slave to determine that the server was restarted and that the slave should reset state and temporary data—for example, close any open transactions and drop any temporary tables it has created.
- Common header length
The length of the common header for all events in the binlog file except the Format_description and Rotate events. As described earlier, the length of the common header for those two events is fixed at 19 bytes.
- Post-header lengths
This is the only variable-length field of the Format_description event. It holds an array containing the size of the post header for each event type as a one-byte integer. The value 255 is reserved as the length for the field, so the maximum length of a post header is 254 bytes.
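A sketch of how a reader could decode the fixed part of a Format_description event body, following the field layout above (the helper function and the sample body are constructed for illustration; they are not the server's code):

```python
import struct

def parse_format_description(body):
    """Decode the fixed-size fields of a Format_description event body."""
    version, = struct.unpack_from("<H", body, 0)             # binlog file version
    server_version = body[2:52].split(b"\0", 1)[0].decode()  # 50-byte string
    created, = struct.unpack_from("<I", body, 52)            # creation time
    common_header_len = body[56]                             # usually 19
    post_header_lens = list(body[57:])  # one byte per known event type
    return version, server_version, created, common_header_len, post_header_lens

# A made-up body for a 5.5 server that knows three event types.
body = (struct.pack("<H", 4)
        + b"5.5.31-0ubuntu0.12.04.1-log".ljust(50, b"\0")
        + struct.pack("<I", 0)   # zero: not the first file after startup
        + bytes([19])            # common header length
        + bytes([56, 13, 0]))    # illustrative post header lengths
print(parse_format_description(body)[0])  # 4
```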
Binary Log Options and Variables
Several options control such properties as the name of the binlog files and the index file. Most of these options can be manipulated as server variables as well. Some have already been mentioned earlier in the chapter, but here you will find more details on each:
- expire-logs-days=days
The number of days that binlog files should be kept before being automatically purged. By default this option is 0, meaning that binlog files are never removed.
- log-bin[=basename]
The binary log is turned on by adding the log-bin option in the my.cnf file, as explained in Chapter 3. In addition to turning on the binary log, this option gives a base name for the binlog files; that is, the portion of the filename before the dot. If an extension is provided, it is removed when forming the base name of the binlog files.
If the option is specified without a basename, the base name defaults to host-bin, where host is the base name—that is, the filename without directory or extension—of the file given by the pid-file option, which is usually the hostname as given by gethostname(2). For example, if pid-file is /usr/run/mysql/master.pid, the default name of the binlog files will be master-bin.000001, master-bin.000002, etc.
Since the default value for the pid-file option includes the hostname, it is strongly recommended that you give a value to the log-bin option. Otherwise the binlog files will change names when the hostname changes (unless pid-file is given an explicit value).
- log-bin-index[=filename]
This option gives a name to the binlog index file. The default is the same as the base name used for log-bin. For example, if the base name used to create binlog files is master-bin, the index file will be named master-bin.index.
Similar to the situation for the log-bin option, the hostname will be used for constructing the index filename, meaning that if the hostname changes, replication will break. For this reason, it is strongly recommended that you provide a value for this option.
- log-bin-trust-function-creators
When creating stored functions, it is possible to create specially crafted functions that allow arbitrary data to be read and manipulated on the slave. For this reason, creating stored functions requires the SUPER privilege. However, since stored functions are very useful in many circumstances, it might be that the DBA trusts anyone with the CREATE ROUTINE privilege not to write malicious stored functions. With this option, it is possible to disable the SUPER privilege requirement for creating stored functions (CREATE ROUTINE is still required).
- binlog-cache-size=bytes
The size of the in-memory part of the transaction cache, in bytes. The transaction cache is backed by disk, so whenever its size exceeds this value, the remaining data will go to disk. This can potentially create a performance problem, so increasing the value of this option can improve performance if you use many large transactions.
Note that just allocating a very large buffer might not be a good idea, because that means other parts of the server get less memory, which might cause performance degradation.
- max-binlog-cache-size=bytes
Use this option to restrict the size of each transaction in the binary log. Since large transactions can potentially block the binary log for a long time, they will cause other threads to convoy on the binary log and can therefore create a significant performance problem. If the size of a transaction exceeds bytes, the statement will be aborted with an error.

- max-binlog-size=bytes
This option limits the size of each binlog file. Notice that if a transaction or statement exceeds max-binlog-size, the binary log will be rotated, but the transaction will be written to the new file in its entirety, exceeding the specified maximum. This is because transactions are never split between binlog files.
- sync-binlog=period
Specifies how often to write the binary log to disk using fdatasync(2). The value given is the number of transaction commits for each real call to fdatasync(2). For instance, if a value of 1 is given, fdatasync(2) will be called for each transaction commit, and if a value of 10 is given, fdatasync(2) will be called after every 10 transaction commits.
A value of zero means that there will be no calls to fdatasync(2) at all and that the server trusts the operating system to write the binary log to disk as part of the normal file handling.
- read-only
Prevents any client threads—except the slave thread and users with the SUPER privilege—from updating any data on the server. (This does not include temporary tables, which can still be updated.) This is useful on slave servers to allow replication to proceed without data being corrupted by clients that connect to the slave.
Options for Row-Based Replication
binlog-format=STATEMENT
This will use the traditional statement-based replication for all statements.
binlog-format=ROW
This will use the shiny new row-based replication for all statements that insert or change data (data manipulation language, or DML, statements). However, statement-based replication must still be used for statements that create tables or otherwise alter the schema (data definition language, or DDL, statements).
binlog-format=MIXED
This is intended to be a safe version of statement-based replication and is the recommended mode with MySQL version 5.1 and later. In mixed-mode replication, the server writes statements to the binary log as statements, but switches to row-based replication if a statement is considered unsafe by one of the criteria discussed in this chapter.
The binlog-format variable exists both as a global server variable and as a session variable. When a new session starts, the global value is copied to the session variable, and the session variable is then used to decide how to write statements to the binary log.
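For instance, the default mode could be set in the configuration file as follows (recall that a session can later override it with SET SESSION binlog_format, subject to the usual privilege checks):

```ini
[mysqld]
# Default logging format for all new sessions; STATEMENT, ROW, or
# MIXED (recommended for MySQL 5.1 and later).
binlog-format=MIXED
```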
binlog-row-event-max-size
Use this option to specify when to start a new event for holding the rows. Because events are read fully into memory when being processed, this option is a rough way of limiting the size of row-holding events so that not too much memory is used when processing the rows.
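A minimal configuration sketch; the size given here is illustrative:

```ini
[mysqld]
# Rough upper bound, in bytes, on each event holding rows; limits how
# much memory is needed when a single row event is processed.
binlog-row-event-max-size=8192
```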
binlog-rows-query-log-events (new in MySQL 5.6)
Note that, because unrecognized events cause the slave to stop, any slave server before MySQL 5.6.2 will not recognize this event and will hence stop replicating from the master. This means that if you intend to use an informational event, you need to upgrade all slaves to MySQL 5.6.2 (or later) before enabling binlog-rows-query-log-events.
Starting with MySQL 5.6.2, informational events can be added for other purposes as well, and they will not cause the slave to stop. They are intended to allow information to be added to the binary log for those readers (including slaves) that can make use of it, but they should not in any way change the semantics of execution and can therefore be safely ignored by slaves that do not recognize them.
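Enabling the option is a single line in the configuration file:

```ini
[mysqld]
# Write an informational event containing the original statement
# alongside the row events it generated (MySQL 5.6.2 and later).
binlog-rows-query-log-events=ON
```

When enabled, the original statement appears as a comment in the output of mysqlbinlog, which makes row-based binary logs considerably easier to audit.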
Clearly, there is much to the binary log, including its use, its composition, and the techniques for working with it. We presented these concepts and more in this chapter, including how to control the binary log's behavior. The material in this chapter builds a foundation for a greater understanding of the mechanics of the binary log and its importance in logging changes to data.