Chapter 4. Functional Data Structures

Data structures are a foundational concept in computer science. Along with algorithms, they are a staple of what a computer science student or programmer must master. As with the phrase functional pattern, functional data structure is not an established concept with a consistent definition in the canon of code.

In this book, I will refer to two things as functional data structures.

  • Structures used in functional patterns, such as Option, Either, Try, List. These are monads.

  • Ordinary data structures that are implemented in a functional way so that they don’t mutate state, such as a linked list.

In this chapter, we will treat the first kind of functional data structures and then briefly cover some of the ideas surrounding ordinary data structures implemented in a functional way. Before we look at particular data structures, let me mention an idea about functional data structures that has been discussed in the literature. First, there is this quote from Alan Perlis:

It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.

Seasoned functional programmers tend to use a small set of data structures: linked lists, arrays, and hash tables, as well as structures such as Option, Either, and Try, to name a few. We will examine these next. I believe the idea behind this quote is that fewer data structures results in more uniformity and simplicity in the codebase. Now, let’s look at some data structures that are particularly functional. We will start with Option.

The Option Data Structure

I use the word Option as a language neutral term (not as part of any particular programming language). Various programming languages have a version of this data structure and I will give examples of some of them, but first let’s flesh out the idea. Our discussion really should start with the null construct. Null is used for a value that might not exist; an optional value or a situation where a value is not known. One problem with null is that if a variable contains an object and the value of that variable is null, and a method is called on that object, the program will throw a null pointer exception. Exceptions break referential transparency and so are frowned upon by functional programmers. Tony Hoare, the inventor of null, said it was his billion-dollar mistake:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object-oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.1

Yet without null, how do we handle a piece of data that is optional or that may not have a value? Well, one of the main problems with null is that it doesn’t really have a type. What if we could create a type that represented the case when a value is missing or optional? There is such a thing—it exists in many languages. In those languages in which it doesn’t exist, it’s not hard to create one. There are a few ways of referring to this type but most of them involve some version of the word Option. We have Optional in Java.2 Python has something that resembles Option in some ways but does not have the full functionality of Option: the None object. However, it’s not hard to construct Option in Python. The idea behind Option is this: suppose we have a method that takes an id and returns a Customer object. What do we do if there is no Customer with the given id?

Warning

We could throw an exception when the Customer is not found, but as I said before, this is not functional, since it breaks referential transparency. If a function throws an exception, it is no longer true that whenever you put in the same input, you get out the same output.

Even if you like exceptions, this example really isn’t a suitable case for an exception. An exception refers to something that should not happen, but does. It’s arguable that there really isn’t any good reason to believe that any old id a system comes up with will correspond to an actual Customer. The situation in which this happens is not uncommon. You might even say it’s one of the outcomes that we should expect. So how can we handle this without an exception? Let us consider the Option construct. By Option, I mean a general data structure and not a construct in any particular programming language. An Option usually has two parts: one is a container that holds a value. This is the case where a valid value was computed or obtained somehow. The second part represents the case where no valid value is available. This would be a situation where a language without an Option type might use a null. Let’s see an example in Scala. Let’s return to the example of retrieving a customer, given an id. In Scala, we can do the following:

Scala

def getCustomer(id: Int): Option[Customer] = {
    //code which retrieves customer from database.
}

This function can return one of two things. In the case where the id corresponds to a customer, the method will return:

Some(customer)

If no customer corresponds to that particular id, the function will return the value:

None

You can then take the necessary action to deal with that situation. If it returns Some(customer), you can then get the customer out of the Some container in a variety of ways. One common way is to use Scala’s pattern matching feature. Here is an example of this:

getCustomer(17) match {
    case Some(customer) => //do something with customer
    case None => //Take action to handle this situation.
}

Another, perhaps more common approach is to use one of the higher order functions, such as map, flatMap, or filter. For example:

val customer = getCustomer(17) map { customer => customer.firstName}

The variable customer, in the case where a value was obtained, will contain Some("Peter"). If no customer was found, a None would be returned and the programmer can handle it.

Note

It is important to understand that Some and None are types and therefore the compiler can help to find errors associated with them, unlike null, which occurs at runtime (not compile time).

Suppose now that you had a list of Option objects, each a Some(customer) or a None. And suppose you wanted to change this to a list of customers where you ignore the Nones. Also suppose getCustomers returns a list of options of Customer objects. The Customer object might be a case class that looks like this:

case class Customer(id: Int, firstName: String)

The list of options of customers looks like this:

val customers = List(Some(Customer(1,"Bob")), None, Some(Customer(33, "Lee")),
    	      				      None, Some(Customer(5, "Rhonda")))

Then we could do the following:

customers.flatten

This would return:

List(Customer(1,"Bob"), Customer(33, "Lee"), Customer(5, "Rhonda"))

Notice how the Nones were conveniently left out.

Now let’s look at some examples in Java.

Java

     Optional<User> optUser = findUserById(123);

     optUser.ifPresent(user -> {
             System.out.println("User's name = "
     + user.getName());})
}

In this example, findUserById returns an Optional of a User object if it finds one for the given id. If not, the code in the braces is not executed. The older, pre-Optional way of doing this would have been for findUserById to return a null if a user was not found. The problem with that is that if a method were then called on the User object, which was actually a null, a NullPointerException would be thrown. With the preceding code, which uses Optional, no exception is thrown.

Now, Python does not have an Option class. It does have the None object; this is only half of what makes up an option. It is, however, useful in its own right. You can do things like this:

Python

def getUser(id):
    #get user from database

    #if database call fails
    return None

Then, you can do this, when calling this function:

if getUser(789) is not None:
   #do something

You may say that this looks an awful lot like checking for a null, and I would agree with you. If you really want to, you can create an Option class in Python. One way is shown here:

class Option:
    def get_or_else(self, default):
        return self.value if isinstance(self, Some) else default

class Some(Option):
    def __init__(self, value):
        self.value = value

class Nothing(Option):
    pass

I called this Nothing, because Python already has a None object. This isn’t particularly “pythonic,” but it’s one way to get the job done.

C# also has a version of this idea. It’s not exactly an Option, but it is a construct that helps with problems caused by nulls: the Nullable type. With this type, we can represent, in a way that makes it explicit, that a variable may be holding a null value. For a given type, say int, we can form the type:

C#

Nullable<int>

As you can see, this is a generic type. A shorthand way of writing this is:

int?

We can also write code like:

Nullable<int> n1 = new Nullable<int>(10);
Nullable<int> n2 = null;

if (n1.HasValue) {
   process(n1.Value);
} else {
  //n1 has a null value.
}

This is a way of navigating nulls without risking a NullPointerException.

The Try Data Structure

While the Option construct is useful, and an improvement over nulls, it doesn’t give you a clue about why there is no legitimate value. Exceptions do this, but Options do not. Options are a great solution in many situations, but sometimes you need to get some information about what went wrong. The Try construct addresses this. Just as Option has Some and None, Try has Success and Failure. Failure wraps the exception that was thrown. While we try to avoid exceptions as much as possible in FP, sometimes we cannot avoid it. Try is one solution to this problem. Here is some code.

Scala

def divide(a: Float, b: Float): Try[Float] = Try(a/b)

Then we can do the following:

divide(3, 0) match {
   case Success(result) => //do something with result
   case Failure(ex) => println(ex.getMessage)
}

Here is an example of Try in a for comprehension:

def toInt(s: String): Try[Int] = Try(Integer.parseInt(s.trim))

val y = for {
    a <- toInt("9")
    b <- toInt("3.5")
    c <- toInt("6")
} yield a + b + c

Notice how well the error case is handled. y will either contain a Success wrapping the sum of a, b, and c, or it will contain a Failure wrapping an exception.

In this case, the result will be:

Failure(java.lang.NumberFormatException: For input string: "3.5")

The Either Data Structure

While the Try type gives you more information about a failure or exceptional outcome than Option, you might wonder if there is a type that gives you more flexibility. The Either construct addresses this in the following manner: an Either object can be either a Right or a Left. Right is like the Some type of Option. Left is like None, except it can wrap any type. Let us consider an example in Scala.

Scala

def divide(a: Int, b: Int): Either[String, Int] = {
    if (b == 0)
       Left("Can't divide by zero.")
    else
       Right(a/b)
}

Then, you can do something like this:

divide(a, b) match {
    case Left(msg) => //do something with msg
    case Right(res) => //do something with res
}

Left and Right can wrap any types. For example, suppose you had two custom types, User and Error:

Scala

case class User(id: Int, firstName: String)

case class Error(id: Int, text: String)

def getUser(id: Int): Either[Error,User] = {
    //If I retrieve a user from api
    Right(User(id, firstName)

    //else if there is an error
    Left(id, text)
}

Then I can do the following:

getUser(4) match {
    case Right(user) => println(user.firstName)
    case Left(error) => println(error.text)
  }

Left can even wrap an exception, if for some reason you find it necessary or prudent to use one and you don’t want to use a Try. For example, you might be using a library that throws an exception. It is always possible to catch the exception and return an error type wrapped in Left.

As we get deeper into FP, it will be useful and in some cases necessary to use third-party functional libraries that include many of the structures we have been discussing. You have seen examples in Scala, Java, Python, and C#. These languages vary in how much FP they can do without libraries, but all of them need libraries for some things.3 For Java, there are at least two libraries that allow functional constructs. One is called Functional Java and another is Vavr. I will be giving examples from Vavr. I will always state when some code relies on Vavr. Let us see what Either looks like in Java with Vavr.

Java with Vavr

private static Either<String, User> getUser(int id) {
    //code that sucessfully found a user
        return Either.right(user);
    } else {
        return Either.left(String.format("No user found with id  %s", id);
    }
}

What about a functional library for Python? One excellent library is OSlash. With OSlash, we can write things in Python like the following:

Python (with OSlash)

def get_user(id):
    #successful case yields a user object
    return Right(user)
    #error case returns an error object
    return Left(error)

Then we can do this:

if isinstance(result,Right):
   #do something with successfull result
else:
    # result is an error

Higher Order Functions

Functions that return these functional data structures like Option or Either lend themselves to higher order functions. Let us first see some simple examples of this.

Scala

case class User(id, firstName)

val users: List[Option[User]] = List(Some(User(1, "Jack")),
				     None, Some(User(2, "Andrea")), None, None)

The users list could be the result of a function that queries an API and gets options of users back. In those cases in which no User was found for a given id, the function returns a None. To get just the users and not the Nones, we can do this:

users.flatten

This will return:

List(User(1,"Jack"),User(2,"Andrea"))

Often, one has a data pipeline of values that they are transforming as they call a bunch of higher order functions. For example, suppose we have a Scala function, getUsers, that returns a list of Either[Error,User], where User has a field called email. Suppose that we want to filter out all users with an example.com account. We expect the result to be a List[Either[Error,String]], where the String represents the email address. An imperative approach, which loops through the List, would be a bit messy. Let’s see how a functional approach, using higher order functions, greatly simplifies the code. It does this by going directly to the essence of what’s going on. Looping is not part of that essence. Let’s see some code.

Scala

case class User(id: Int, email: String)
case class Error(id: Int, text: String)

def getUsers(): List[Either[Error,User]] =
    List(Right(User(1, "jack@example.com")), Left(Error(4,"user not found")),
         Right(User(2, "andrea@example.com")))

Then we can do this:

val emails = getUsers().map(either => either.map(u => u.email))

Now, emails will contain the following:

List(Right(jack@example.com), Left(Error(4,user not found)),
    Right(andrea@example.com))

This allows you to drill down into the hierarchy of fields to the one you want.

Monads in for Comprehensions in Scala

As we have seen, things like List, Option, Try, and Either, are monads. One very useful thing you can do with monads in Scala is for comprehensions. Here is an example:

Scala

case class Student(id: Int, email: String)
case class FinalGrade(grade: Int)
def getStudent(id: Int): Option[Student] = ???
def getFinalGrade(student: Student): Option[FinalGrade]

This function tries to get a User object that corresponds to the given id and if it finds it, it returns Some(User(493, "Alex")), for example. If it does not find it, it returns None. We can now do the following:

for {
    student <- getStudent(999) //getStudent returns a monad
    finalGrade <- getFinalGrade(student) //getFinalGrade returns a monad
} yield (student,finalGrade)

This would return a tuple consisting of an Option of the student and an Option of the finalGrade. If either of the Options were None, the value of the for comprehension would be None and not the tuple of values. What this does for comprehension, in effect, is take perfectly functional code and make it look more like traditional imperative code. In many cases, it results in clearer code. It turns out that a for comprehension is really just syntactic sugar for a series of flatMap and map calls and we know that every monad has flatMap and map. Let’s start out with a list of Options of students:

Scala

case class Student(id: Int, email: String)
val students = List(Some(Student(1,"jack@example.com")), None,
    	            Some(Student(2, "andrea@example.com")), None)

First, let’s assume you just want to get the students out of the options and ignore the Nones. You can do this with a simple call to flatten:

students.flatten // returns List(Student(1,"jack@example.com"),
    Student(2,"andrea@example.com"))

Now, let’s say you are just interested in the emails and you want to get a list of them. You can do the following:

students.flatten.map(student => student.email)

// This will return List("jack@example.com","andrea@example.com")

So we did a flatten and then a map. Sometimes you want to map first, then flatten. This is precisely what flatMap does. Let’s look at an example of this:

students.flatMap(optStudent => opStudent)

// returns List(Student(1,"jack@example.com"), Student(2,"andrea@example.com"))

In this case, for the map, we used the identity function. This made the flatMap call the same as a call to flatten. We did not use the full power of flatMap. Let’s see a more complicated example. Suppose we have a getStudent function that takes an id and returns an Option of a Student. This is a good idea since some ids might not refer to any Student. Here is some code.

def getStudent(id: Int): Option[Student] = {
    id match {
      case 1 => Some(Student(1, "jack@example.com"))
      case 2 => Some(Student(2, "andrea@example.com"))
      case _ => None
    }
  }

Let us suppose now that we have a of ids and we want to call the previous function on the list somehow, but the function returns an Option. We can do the following:

List(1,2,3).map(id => getStudent(id))

This will return:

List(Some(Student(1,jack@example.com)), Some(Student(2,andrea@example.com)), None)

This is probably not what we wanted. flatMap to the rescue.

List(1,2,3).flatMap(id => getStudent(id))

Now this will return:

List(Student(1,jack@example.com), Student(2,andrea@example.com))

Perfect! flatMap is useful when you have a list (or any monad) and you want to map it, but the function that you want to map it with returns a monad itself. The flatten part of flatMap will remove the result from its container—that is, its monad.

Traditional Data Structures

Now we will look at traditional data structures, as opposed to functional data structures.

Immutability and History

As with everything functional, immutability is an important piece of the puzzle. What I am calling history here refers to the fact that since we do not change anything, we must copy something that changes (and retain its former state). For example, a stack can be functional if implemented in a certain way. Operations such as push would have to make a copy of the stack instead of modifying it in place. This requires, in general, a way of doing quick copies. There are various algorithms for this. There are even databases that follow this paradigm. Datomic, for example, a database created by Rich Hickey, creator of the Clojure language (which has many functional capabilities), is a database that never overwrites data. It keeps a copy of all data and changes made. This is another way that the functional paradigm is becoming more mainstream.

Note

Whether or not a traditional data structure is functional depends on the particular implementation of that data structure.

Laziness

We have mentioned laziness before in connection with what makes a language functional (see “Lazy Evaluation”). The basic idea is that a function or process is lazy if it is evaluated only when needed. Part of this is the fact that if a function carries out a long computation, the value will be saved or memoized and then used the next time it is needed (if possible). This capability is useful in creating efficient functional data structures. Haskell is a lazy language and Scala has a lazy keyword but is not lazy by default. In general, for a language to be partially or fully lazy, it tends to be a language that was explicitly constructed to be, at least in part, functional.

Note

Laziness allows us to wait to evaluate an expression when and only when we need it.

Conclusion

In summary, there are two types of functional data structures. The first is a set of constructs that come from category theory, generally by way of Haskell, and which have a map and flatMap functions, which makes them monads. The second consists of traditional data structures that have been implemented according to the principles mentioned earlier.

1 Hoare, Tony. “Null References: The Billion Dollar Mistake.” Historically Bad Ideas. Lecture presented at the QCon, 2009. https://oreil.ly/sQSnH.

2 Like many functional constructs, this requires Java 8 or above.

3 An exception is the pure FP language, Haskell.

Get Learning Functional Programming now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.