Chapter 4. Finding Things
Tim Bray
Computers can compute, but that’s not what people use them for, mostly. Mostly, computers store and retrieve information. Retrieve implies find, and in the time since the advent of the Web, search has become a dominant application for people using computers.
As data volumes continue to grow—both absolutely, and relative to the number of people or computers or anything, really—search becomes an increasingly large part of the life of the programmer as well. A few applications lack the need to locate the right morsel in some information store, but very few.
The subject of search is one of the largest in computer science, so I won’t try to survey all of it or discuss the mechanics in general; in fact, I’ll consider only one simple search technique in depth. Instead, I’ll focus on the trade-offs that go into selecting search techniques, which can be subtle.
On Time
You really can’t talk about search without talking about time. There are two different flavors of time that apply to problems of search. The first is the time it takes the search to run, which is experienced by the user, who may well be staring at a message saying something like “Loading…”. The second is the time invested by the programmer who builds the search function, and by the programmer’s management and customers, who are waiting to use the program.
Problem: Weblog Data
Let’s look at a sample problem to get a feel for how a search works in real life. I have a directory containing logfiles from my weblog ...