Chapter 11. Hashing

Hashing is a technique that promises to achieve O(1) data lookup. This doesn't mean that only one comparison will be made, but rather that the number of comparisons will remain the same, no matter how large the data set. Compare this with the O(N) search time of simple linked lists or even O(log N) for binary search trees, and hashing starts to look very attractive.

This chapter discusses the following:

  • Understanding hashing

  • Working with hash functions

  • Assessing performance

  • Comparing the results

Understanding Hashing

You may not even realize it, but chances are good you use concepts similar to hashing all the time. When you walk into a bookstore and head straight for the computer book section, you've just used a kind of hashing algorithm. When you are looking for a music CD by a particular artist, you no doubt go straight to the section containing CDs by artists with the same first letter of that artist's surname. Both these processes involve taking some property of the thing you are looking for—a book category or an artist's name—and using that to narrow down the search. In the case of the book, you know it is a computer book so you head straight for that section; in the case of the CD, you know the artist's name.

Hashing begins with a hash function. A hash function takes one value—a string, an object, a number, and so on—and produces a hash value, usually an integer or some other numeric value. This hash value is then used to locate a position within a hash table

Get Beginning Algorithms now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.