Mark Jason Dominus
Many of the objects in programming are at least conceptually infinite—the text of the Associated Press newswire, the log output from a web server, or the digits of π. There’s a general principle in programming that we should model things as simply and straightforwardly as possible, so that if an object is infinite, we should model it as infinite. That means an infinite data structure.
Of course, we can’t have an infinite data structure, can we? After all, the computer only has a finite amount of memory. But that doesn’t matter. We’re all mortal and so we, and our programs, wouldn’t really know an infinite data structure if we saw one. All that’s really necessary is a data structure that behaves as if it were infinite.
A Unix pipe is a great example of such an object—think of a pipe that happens to be connected to the standard output of the
yes program. From its manual page:
yesprints the command line arguments, separated by spaces and followed by a newline, forever until it is killed.
The output of
yesmight not be infinite, but it’s a credible imitation. So is the output of
tail -f when applied to our web access log.
In this article I’ll demonstrate how to implement a data structure called a streamthat behaves as if it were infinite. We can keep pulling data out of this data structure, and it never runs out. Streams can be filtered, just like Unix data streams can be filtered with
grep, and they can be transformed and merged, just like Unix ...