Mark Jason Dominus
B-trees are often described as a good way to store and retrieve data, especially to and from disk. They’re efficient to use and easy to program—they’re often mentioned, but not so often discussed. They show up all over the place, such as in the Berkeley DB_ File package, large high-performance databases, and in IBM’s well-known VSAM files.
In this article we’ll see how they work. First, we’ll review how ordinary binary trees store data, and we’ll see the potential disadvantages of that. We’ll look over the Btree algorithm in the abstract, and see how it corrects the flaws of binary trees. Then we’ll look at the code of a Perl module that implements the B-tree algorithms and see how B-trees can be implemented in Perl.
Both binary trees and B-trees are structures for storing and retrieving data, just like hashes. In fact, to the user, they look exactly like hashes. Each contains a collection of keys, and each key is associated with a particular datum. In order to be efficient, data structures should allow us to look up the datum associated with a particular key very quickly. Hashes, binary trees, and B-trees all do that.
You’re supposed to learn about binary trees early in your programming career, so the next part of this article should be a review. If you don’t know about binary trees yet, you might want to read a book on data structures first, such as the one in this article’s bibliography or O’Reilly’s Mastering Algorithms with Perl. Otherwise, ...