O'Reilly logo

Real World Haskell by Donald Bruce Stewart, Bryan O'Sullivan, John Goerzen

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Space Leaks and Strict Evaluation

The foldl function that we discussed earlier is not the only place where space leaks can happen in Haskell code. We will use it to illustrate how nonstrict evaluation can sometimes be problematic and how to solve the difficulties that can arise.

Do you need to know all of this right now?

It is perfectly reasonable to skip this section until you encounter a space leak in the wild. Provided you use foldr if you are generating a list, and foldl' instead of foldl otherwise, space leaks are unlikely to bother you in practice for a while.

Avoiding Space Leaks with seq

We refer to an expression that is not evaluated lazily as strict, so foldl' is a strict left fold. It bypasses Haskell’s usual nonstrict evaluation through the use of a special function named seq:

-- file: ch04/Fold.hs
foldl' _    zero []     = zero
foldl' step zero (x:xs) =
    let new = step zero x
    in  new `seq` foldl' step new xs

This seq function has a peculiar type, hinting that it is not playing by the usual rules:

ghci> :type seq
seq :: a -> t -> t

It operates as follows: when a seq expression is evaluated, it forces its first argument to be evaluated, and then returns its second argument. It doesn’t actually do anything with the first argument. seq exists solely as a way to force that value to be evaluated. Let’s walk through a brief application to see what happens:

-- file: ch04/Fold.hs
foldl' (+) 1 (2:[])

This expands as follows:

-- file: ch04/Fold.hs
let new = 1 + 2
in new `seq` foldl' (+) new []

The use of seq forcibly evaluates new to 3 and returns its second argument:

-- file: ch04/Fold.hs
foldl' (+) 3 []

We end up with the following result:

-- file: ch04/Fold.hs

Thanks to seq, there are no thunks in sight.

Learning to Use seq

Without some direction, there is an element of mystery to using seq effectively. Here are some useful rules for using it well.

To have any effect, a seq expression must be the first thing evaluated in an expression:

-- file: ch04/Fold.hs
-- incorrect: seq is hidden by the application of someFunc
-- since someFunc will be evaluated first, seq may occur too late
hiddenInside x y = someFunc (x `seq` y)

-- incorrect: a variation of the above mistake
hiddenByLet x y z = let a = x `seq` someFunc y
                    in anotherFunc a z

-- correct: seq will be evaluated first, forcing evaluation of x
onTheOutside x y = x `seq` someFunc y

To strictly evaluate several values, chain applications of seq together:

-- file: ch04/Fold.hs
chained x y z = x `seq` y `seq` someFunc z

A common mistake is to try to use seq with two unrelated expressions:

-- file: ch04/Fold.hs
badExpression step zero (x:xs) =
    seq (step zero x)
        (badExpression step (step zero x) xs)

Here, the apparent intention is to evaluate step zero x strictly. Since the expression is duplicated in the body of the function, strictly evaluating the first instance of it will have no effect on the second. The use of let from the definition of foldl' just shows illustrates how to achieve this effect correctly.

When evaluating an expression, seq stops as soon as it reaches a constructor. For simple types such as numbers, this means that it will evaluate them completely. Algebraic data types are a different story. Consider the value (1+2):(3+4):[]. If we apply seq to this, it will evaluate the (1+2) thunk. Since it will stop when it reaches the first (:) constructor, it will have no effect on the second thunk. The same is true for tuples: seq ((1+2),(3+4)) True will do nothing to the thunks inside the pair, since it immediately hits the pair’s constructor.

If necessary, we can use normal functional programming techniques to work around these limitations:

-- file: ch04/Fold.hs
strictPair (a,b) = a `seq` b `seq` (a,b)

strictList (x:xs) = x `seq` x : strictList xs
strictList []     = []

It is important to understand that seq isn’t free: it has to perform a check at runtime to see if an expression has been evaluated. Use it sparingly. For instance, while our strictPair function evaluates the contents of a pair up to the first constructor, it adds the overheads of pattern matching, two applications of seq, and the construction of a new tuple. If we were to measure its performance in the inner loop of a benchmark, we might find that it slows down the program.

Aside from its performance cost if overused, seq is not a miracle cure-all for memory consumption problems. Just because you can evaluate something strictly doesn’t mean you should. Careless use of seq may do nothing at all, move existing space leaks around, or introduce new leaks.

The best guides to whether seq is necessary, and how well it is working, are performance measurement and profiling, which we will cover in Chapter 25. From a base of empirical measurement, you will develop a reliable sense of when seq is most useful.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required