Chapter 10. Code Case Study: Parsing a Binary Data Format

In this chapter, we’ll discuss a common task: parsing a binary file. We will use it for two purposes. Our first is indeed to talk a little about parsing, but our main goal is to talk about program organization, refactoring, and “boilerplate removal.” We will demonstrate how you can tidy up repetitious code, and set the stage for our discussion of monads in Chapter 14.

The file formats that we will work with come from the netpbm suite, an ancient and venerable collection of programs and file formats for working with bitmap images. These file formats have the dual advantages of being widely used and being fairly easy, though not completely trivial, to parse. Most importantly for our convenience, netpbm files are not compressed.

Grayscale Files

The name of netpbm’s grayscale file format is PGM (portable gray map). It is actually not one format, but two; the plain (or P2) format is encoded as ASCII, while the more common raw (P5) format is mostly binary.

A file of either format starts with a header, which in turn begins with a magic string describing the format. For a plain file, the string is P2, and for raw, it’s P5. The magic string is followed by whitespace, and then by three numbers: the width, height, and maximum gray value of the image. These numbers are represented as ASCII decimal numbers, separated by whitespace.
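The header layout just described can be sketched as a small parser. This is only an illustrative sketch, not the chapter's own code: the names `PgmHeader`, `matchMagic`, `skipSpace1`, `getNat`, and `parseHeader` are hypothetical, and the input is assumed to be an ordinary `String` for simplicity (a real parser for the raw format would more likely work on bytes). It handles only what the paragraph above states: a magic string, then three whitespace-separated ASCII decimal numbers.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical record holding the header fields described in the text.
data PgmHeader = PgmHeader
  { pgmRaw    :: Bool  -- True for raw (P5), False for plain (P2)
  , pgmWidth  :: Int
  , pgmHeight :: Int
  , pgmMaxVal :: Int   -- maximum gray value
  } deriving (Eq, Show)

-- Recognise the magic string and return the remaining input.
matchMagic :: String -> Maybe (Bool, String)
matchMagic ('P':'5':rest) = Just (True, rest)
matchMagic ('P':'2':rest) = Just (False, rest)
matchMagic _              = Nothing

-- Skip one or more whitespace characters; fail if there are none.
skipSpace1 :: String -> Maybe String
skipSpace1 s = case span isSpace s of
  ([], _)   -> Nothing
  (_, rest) -> Just rest

-- Read a non-empty run of ASCII decimal digits as a number.
getNat :: String -> Maybe (Int, String)
getNat s = case span isDigit s of
  ([], _)        -> Nothing
  (digits, rest) -> Just (read digits, rest)

-- Parse the whole header, returning it together with the unconsumed input.
-- The steps are chained in Maybe, so the first failing step yields Nothing.
parseHeader :: String -> Maybe (PgmHeader, String)
parseHeader s0 = do
  (raw, s1)    <- matchMagic s0
  s2           <- skipSpace1 s1
  (width, s3)  <- getNat s2
  s4           <- skipSpace1 s3
  (height, s5) <- getNat s4
  s6           <- skipSpace1 s5
  (maxVal, s7) <- getNat s6
  Just (PgmHeader raw width height maxVal, s7)
```

Under these assumptions, `parseHeader "P5 640 480 255\n"` succeeds with a raw-format header of width 640, height 480, and maximum gray value 255, leaving the trailing newline unconsumed, while an input that starts with any other magic string gives `Nothing`.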

After the maximum gray value comes the image data. In a raw file, this is a string of binary values. In a plain ...
