Preface

In the summer of 2003, somebody on the MySQL mailing list proposed a book about MySQL internals. As I read the email, I realized that I had the background to write such a book, but I had just finished writing my first book and was not looking forward to writing another. I tried to talk myself out of the responsibility, saying to myself nobody would ever publish a book so technical and specialized. There simply would not be enough of an audience for it.

Then I thought of Understanding the Linux Kernel and Linux Device Drivers by O’Reilly. That took away my excuse. I realized the door was open and I was standing in the doorway, but my inertia was keeping something good from happening. I thought about a passage in the Book of Mormon that says “a natural man is an enemy to God,” and the principle behind it. If you drift along, seeking only the pleasure of the moment and staying safely within your natural comfort zone, you do not accomplish much. Good things happen when you push yourself outside of your comfort zone, doing what is difficult but what you know deep inside is the right thing to do. I wrote an email with a proposal to O’Reilly.

Interestingly enough, my editor happened to be Andy Oram, who also participated in the publication of Understanding the Linux Kernel and Linux Device Drivers. He and I worked together on this book, and I appreciate his help very much. I felt that his strengths very well compensated for my weaknesses.

The book presented a number of challenges. Writing about the internals of an application means approaching it as a developer rather than just a user or an administrator. It requires a deeper level of understanding. Although I had worked on the MySQL source code extensively, I found myself doing a lot of research to figure out the gory details of algorithms, the purposes of functions and classes, the reasons for certain decisions, and other matters relevant to this book. In addition, as I was writing the book, MySQL developers were writing new code. It was not easy to keep up. And while the book was being written, I had to do other work to feed my growing family. Fortunately, a good portion of that work involved projects that dealt with MySQL internals, allowing me to stay on top of the game.

Nevertheless, the challenges were worth it. Growth comes through challenges, and I feel it did for me in this process. Now that I have finished the book, I have a better view of the design of MySQL as a whole, and a better knowledge of its dark and not so dark parts. It is my hope that the reader will experience a similar growth.

How This Book Is Organized

Chapter 1, MySQL History and Architecture

Introduces the major modules in the source code and their purpose.

Chapter 2, Nuts and Bolts of Working with the MySQL Source Code

Tells you how to download the source code and build a server from scratch.

Chapter 3, Core Classes, Structures, Variables, and APIs

Lists the basic data structures, functions, and macros you need for later reference.

Chapter 4, Client/Server Communication

Lays out the formats of the data sent between client and server, and the main functions that perform the communication.

Chapter 5, Configuration Variables

Discusses how MySQL handles configuration in general, as well as the effects of many particular configuration variables, and shows you a framework for adding a new configuration variable.

Chapter 6, Thread-Based Request Handling

Explains MySQL’s reasons for using threads and the main variables, such as locks, related to threads.

Chapter 7, The Storage Engine Interface

Describes the relation of individual storage engines (formerly known as table types) to the MySQL core, and shows you a framework for adding a new storage engine.

Chapter 8, Concurrent Access and Locking

Explains the different types of locks available in MySQL, and how each storage engine uses locks.

Chapter 9, Parser and Optimizer

Explains the major activities that go into optimizing queries.

Chapter 10, Storage Engines

Briefly describes the most important MySQL storage engines and some of the tree structures and other data structures they employ.

Chapter 11, Transactions

Lists the main issues required to support transactions, and uses InnoDB to illustrate the typical architecture used to provide that support.

Chapter 12, Replication

Gives on overview of replication with an emphasis on issues of implementation.

Who This Book Is For

This book can be useful for a number of readers: a developer trying to extend MySQL in some way; a DBA or database application programmer interested in how exactly MySQL runs his queries; a computer science student learning about database kernel development; a developer looking for ideas while working on a product that requires extensive database functionality that he must implement himself; a closed-source database developer wondering how in the world MySQL runs its queries so fast; a random, curious computer geek who has used MySQL some and wonders what is inside; and, of course, anybody who wants to look smart by having a book on MySQL internals displayed on his shelf.

Although MySQL source is open in the sense of being publicly available, it is in essence closed to you if you do not understand it. It may be intimidating to look at several hundred thousand lines of code written by gifted programmers that elegantly and efficiently solves difficult problems one line at a time. To understand the code, you will need a measure of the inspiration and perspiration of those who created it. Hopefully, this book can provide enough guidance to remove those barriers and to open the source of MySQL for you.

I do not believe it is possible to understand and appreciate MySQL strictly through a conceptual discussion. On a high conceptual level MySQL is very simple. It does not implement many revolutionary ideas. It sticks to the basics. Why is it so popular then? Why do we know enough about it for O’Reilly to be willing to publish a book on its internals?

The reason, in my opinion, is that what makes a good database is not so much the concepts behind it, but how well they are implemented. It is important to be conceptually sound on a basic level, but a good portion of the genius is in implementing those concepts in a way that provides a reasonable combination of good performance and the ease of maintenance. In other words, the devil is in the details, and MySQL developers have done a great job of taking that devil by the horns and twisting his head off.

Thus, in order to appreciate the inner workings of MySQL, you need to get close to the places where that devil is being subdued. Somewhere in the dark depths of the optimizer or inside the B-tree, there is music to be heard as you study the code. It will take some work to hear that music, but once you do, you can feel its beauty. And to hear the music you must not be afraid to compile the code, add a few debugging messages to help you understand the flow, and perhaps even change a few things to appreciate what will make the server crash (and how) if you fail to handle something that turns out to be important after all.

The first chapter provides a brief introduction of how different components of MySQL work together. Immediately afterward you will find a chapter about downloading and building MySQL from the source. You will have a much more meaningful experience studying MySQL internals if you follow the steps in it to get set up with a working, compilable copy of the code that you can change and test at your pleasure.

When approaching a new code base, I find it very useful to look at class/structure definitions and API call prototypes. I have a confession to make: I first look at the code, then read the comments, and I never look at block diagrams unless somebody asks me to. Chapter 3 is for the developers whose heads are wired like mine; it talks about the core server classes, structures, and API.

In Chapter 4 I talk about the communication protocol between the client and the server. Afterward, I hope you will say: “I am thankful for the MySQL API, and I even have a clue of how to fix it up if I had to!”

Chapter 5 discusses server configuration variables. Configuration variables are the controls of the server. Every one of them tells you about some special server capability or perhaps a problem some DBA had to solve at some point. It would not be too much of an exaggeration to say that if you understand the variables, you understand the server. Toward the end you will find a tutorial on how to add your own configuration variables.

Every server has to deal with the issue of how to handle multiple clients concurrently. MySQL does it using threads. Understanding threads and how they are used in MySQL is critical to being effective in working with MySQL source. Thus, Chapter 6 discusses thread-based request handling.

One of the distinct features of the MySQL architecture is its ability to integrate third-party storage engines. Chapter 7 focuses on the storage engine interface and provides a functional example of a simple storage engine.

Although at the moment MySQL supports a number of page and row-level locking storage engines, the core architecture has a strong MyISAM heritage. Part of that heritage is the mechanism to acquire a table lock. The table lock awareness, even when it is in essence a token lock, is important for an aspiring MySQL developer. Thus, Chapter 8 focuses on the table lock manager.

Chapter 9 focuses on the parser and optimizer. This is the chapter I would recommend to a DBA trying to improve the performance of MySQL. The key to optimizing MySQL queries and tables is to learn to think like the optimizer. This chapter also provides an overview of the source code for the brave developers preparing to immerse themselves into the optimizer’s dark depths.

Chapter 10 is a cursory overview of MySQL storage engines. It may be helpful to a developer trying to create or integrate her own. A curious reader looking for what is out there may also find it of interest.

Chapter 11 is mostly for developers working on integrating a transactional storage engine into MySQL, while Chapter 12 focuses on the internals of replication.

By no means is this book a comprehensive guide to MySQL internals. The subject is so deep that I do not believe it is humanly possible to scratch the surface even if you had 10,000 pages and the time to create them. To make matters more complicated, MySQL developers are adding new code daily. Fortunately, most of the core code tends to remain intact, so the book has a shot at not becoming obsolete before it is published. Nevertheless, do not be surprised when you look at the current MySQL code and find that some things are not quite like what you see in the book. You are likely to see new classes and calls in the API. On occasion, you may find that an old API call has a new argument. But hopefully the book can always serve as a guide to teach you enough basics about the code to bring you to a level of proficiency that will enable you to accomplish your goals.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates filenames, directories, and file extensions, new terms, URLs, commands and command-line options, usernames, hostnames, email addresses, and emphasized text.

Constant width

Indicates parts of code (such as variables, class names, methods, and macros), elements of SQL statements, contents of files, and output from commands.

Constant width italic

Indicates text that should be replaced with user-supplied values.

Constant width bold

Indicates user input in examples.

Tip

This icon signifies a tip, suggestion, or general note.

Warning

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. The longer examples can be downloaded from the book’s web site at http://www.oreilly.com/catalog/9780596009571. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Understanding MySQL Internals by Sasha Pachev. Copyright 2007 O’Reilly Media, Inc., 978-0-596-00957-1.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9780596009571

To comment or ask technical questions about this book, send email to:

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our web site at:

http://www.oreilly.com

Safari® Enabled

When you see a Safari® Enabled icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.

Acknowledgments

I would like to express special thanks to Andy Oram for his continual guidance and encouragement as we worked together on this book. I am particularly grateful to the MySQL development team for their cooperation and active participation in the review. The input I received from Sergei Golubchik was invaluable. His knowledge and vision of the MySQL code and architecture is amazing, as well as his ability to pay attention to detail. At times, as I read his reviews, I would wonder if he possibly got tired of the loads of technical detail and would speed-read past some inaccuracies or errors. The next moment I saw a note in red about some little but nevertheless important detail.

Special thanks also go to Brian Aker, Martin “MC” Brown, and Paul Kinzelman for their reviews and suggestions. And last, but not least—special thanks to my wife Sarah and my children Benjamin, Jennifer, Julia, Joseph, and Jacob for their patience and support, as I spent many Saturdays in the office working on the book.

Get Understanding MySQL Internals now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.