Chapter 8. Efficient File Processing, Regular Expressions, and Filename Matching

Efficient File Processing

This simple microbenchmark reads a text file full of numbers and prints their sum:

-- file: ch08/SumFile.hs
main = do
    contents <- getContents
    print (sumFile contents)
  where sumFile = sum . map read . words

Although the String type is the default used for reading and writing files, it is not efficient, so a simple program like this will perform badly.

A String is represented as a list of Char values; each element of a list is allocated individually and has some bookkeeping overhead. These factors affect the memory consumption and performance of a program that must read or write text or binary data. On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.
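The list representation can be made concrete with a small sketch. The names asList and charCount are hypothetical and chosen only for illustration. Because String is a type synonym for [Char], ordinary list pattern matching works on it, and traversing a String means following one cons cell per character:

```haskell
-- String is a type synonym for [Char] in the Prelude, so this
-- identity function type checks with no conversion at all.
asList :: String -> [Char]
asList s = s

-- Counting characters walks the linked list: each (:) cell holds
-- one Char plus a pointer to the rest of the string. This is the
-- per-element allocation and bookkeeping described above.
charCount :: String -> Int
charCount []       = 0
charCount (_ : cs) = 1 + charCount cs
```

Loaded into ghci, `charCount "hello"` evaluates to 5, exactly as `length "hello"` does.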

The bytestring library provides a fast, cheap alternative to the String type. Code written with bytestring can often match or exceed the performance and memory footprint of C, while maintaining Haskell’s expressivity and conciseness.
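As a rough illustration, and not the chapter's own listing, the microbenchmark above might be rewritten over a lazy ByteString. This sketch assumes the Data.ByteString.Lazy.Char8 module that ships with the bytestring package. The function name sumFileBS is hypothetical. It uses readInteger instead of read, so tokens are parsed straight from the bytes without first being converted to String. Like the original, it sums into an Integer.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Sum the whitespace-separated integers in a lazy ByteString.
-- L.words splits on whitespace without building a String, and
-- L.readInteger parses a token's leading integer from the bytes.
sumFileBS :: L.ByteString -> Integer
sumFileBS = sum . map tokenValue . L.words
  where
    -- A token with no leading integer counts as 0. For a token such
    -- as "12abc", only the leading 12 is used. (The String version
    -- would instead fail with a read error on such input.)
    tokenValue w = case L.readInteger w of
      Just (n, _) -> n
      Nothing     -> 0

-- A complete program would read standard input lazily, e.g.:
-- main = L.getContents >>= print . sumFileBS
```

In ghci, `sumFileBS (L.pack "1 2 3")` evaluates to 6.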

The library supplies two modules—each defines functions that are nearly drop-in replacements for their String counterparts:

Data.ByteString

Defines a strict type named ByteString. This represents a string of binary or text data in a single array.

Data.ByteString.Lazy

Provides a lazy type, also named ByteString. This represents a string of data as a list of chunks, arrays of up to 64 KB in size.
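The relationship between the two types can be observed directly with fromChunks and toChunks from Data.ByteString.Lazy. In this sketch, the names chunkA, chunkB, lazyHello, chunkCount, and flatten are hypothetical, and Data.ByteString.Char8 is used only to build strict chunks from string literals.

```haskell
import qualified Data.ByteString as B          -- strict ByteString
import qualified Data.ByteString.Char8 as BC   -- Char helpers for strict ByteString
import qualified Data.ByteString.Lazy as L     -- lazy ByteString

-- Two strict chunks, each a single contiguous array of bytes.
chunkA, chunkB :: B.ByteString
chunkA = BC.pack "hello, "
chunkB = BC.pack "world"

-- A lazy ByteString is a sequence of strict chunks. fromChunks
-- builds one directly from a list of them.
lazyHello :: L.ByteString
lazyHello = L.fromChunks [chunkA, chunkB]

-- toChunks exposes the underlying strict chunks again.
chunkCount :: L.ByteString -> Int
chunkCount = length . L.toChunks

-- Collapsing to one strict value copies every chunk into a single array.
flatten :: L.ByteString -> B.ByteString
flatten = B.concat . L.toChunks
```

In ghci, `chunkCount lazyHello` evaluates to 2. The lazy value still behaves as one 12-byte string, so `L.length lazyHello` is 12.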

Each ...
