Chapter 4. Functional Data Structures
Data structures are a foundational concept in computer science. Along with algorithms, they are a staple of what a computer science student or programmer must master. As with the phrase functional pattern, functional data structure is not an established concept with a consistent definition in the canon of code.
In this book, I will refer to two things as functional data structures.
-
Structures used in functional patterns, such as
Option
,Either
,Try
,List
. These are monads. -
Ordinary data structures that are implemented in a functional way so that they don’t mutate state, such as a linked list.
In this chapter, we will treat the first kind of functional data structures and then briefly cover some of the ideas surrounding ordinary data structures implemented in a functional way. Before we look at particular data structures, let me mention an idea about functional data structures that has been discussed in the literature. First, there is this quote from Alan Perlis:
It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.
Seasoned functional programmers tend to use a small set of data structures: linked lists, arrays, and hash tables, as well as structures such as Option
, Either
, and Try
, to name a few. We will examine these next. I believe the idea behind this quote is that fewer data structures results in more uniformity and simplicity in the codebase. Now, let’s look at some data structures that are particularly functional. We will start with Option
.
The Option Data Structure
I use the word Option
as a language neutral term (not as part of any particular programming language). Various programming languages have a version of this data structure and I will give examples of some of them, but first let’s flesh out the idea. Our discussion really should start with the null
construct. Null is used for a value that might not exist; an optional value or a situation where a value is not known. One problem with null
is that if a variable contains an object and the value of that variable is null
, and a method is called on that object, the program will throw a null pointer exception. Exceptions break referential transparency and so are frowned upon by functional programmers. Tony Hoare, the inventor of null
, said it was his billion-dollar mistake:
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object-oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.1
Yet without null
, how do we handle a piece of data that is optional or that may not have a value? Well, one of the main problems with null
is that it doesn’t really have a type. What if we could create a type that represented the case when a value is missing or optional? There is such a thing—it exists in many languages. In those languages in which it doesn’t exist, it’s not hard to create one. There are a few ways of referring to this type but most of them involve some version of the word Option
. We have Optional
in Java.2 Python has something that resembles Option
in some ways but does not have the full functionality of Option
: the None
object. However, it’s not hard to construct Option
in Python. The idea behind Option
is this: suppose we have a method that takes an id
and returns a Customer
object. What do we do if there is no Customer
with the given id
?
Warning
We could throw an exception when the Customer
is not found, but as I said before, this is not functional, since it breaks referential transparency. If a function throws an exception, it is no longer true that whenever you put in the same input, you get out the same output.
Even if you like exceptions, this example really isn’t a suitable case for an exception. An exception refers to something that should not happen, but does. It’s arguable that there really isn’t any good reason to believe that any old id
a system comes up with will correspond to an actual Customer
. The situation in which this happens is not uncommon. You might even say it’s one of the outcomes that we should expect. So how can we handle this without an exception? Let us consider the Option
construct. By Option
, I mean a general data structure and not a construct in any particular programming language. An Option
usually has two parts: one is a container that holds a value. This is the case where a valid value was computed or obtained somehow. The second part represents the case where no valid value is available. This would be a situation where a language without an Option
type might use a null
. Let’s see an example in Scala. Let’s return to the example of retrieving a customer, given an id
. In Scala, we can do the following:
Scala
def
getCustomer
(
id
:
Int
):
Option
[
Customer
]
=
{
//code which retrieves customer from database.
}
This function can return one of two things. In the case where the id
corresponds to a customer, the method will return:
Some
(
customer
)
If no customer corresponds to that particular id
, the function will return the value:
None
You can then take the necessary action to deal with that situation. If it returns Some(customer)
, you can then get the customer
out of the Some
container in a variety of ways. One common way is to use Scala’s pattern matching feature. Here is an example of this:
getCustomer
(
17
)
match
{
case
Some
(
customer
)
=>
//do something with customer
case
None
=>
//Take action to handle this situation.
}
Another, perhaps more common approach is to use one of the higher order functions, such as map
, flatMap
, or filter
. For example:
val
customer
=
getCustomer
(
17
)
map
{
customer
=>
customer
.
firstName
}
The variable customer, in the case where a value was obtained, will contain Some("Peter")
. If no customer was found, a None
would be returned and the programmer can handle it.
Note
It is important to understand that Some
and None
are types and therefore the compiler can help to find errors associated with them, unlike null
, which occurs at runtime (not compile time).
Suppose now that you had a list of Option
objects, each a Some(customer)
or a None
. And suppose you wanted to change this to a list of customers where you ignore the None
s. Also suppose getCustomers
returns a list of options of Customer
objects.
The Customer
object might be a case class
that looks like this:
case
class
Customer
(
id
:
Int
,
firstName
:
String
)
The list of options of customers looks like this:
val
customers
=
List
(
Some
(
Customer
(
1
,
"Bob"
)),
None
,
Some
(
Customer
(
33
,
"Lee"
)),
None
,
Some
(
Customer
(
5
,
"Rhonda"
)))
Then we could do the following:
customers
.
flatten
This would return:
List
(
Customer
(
1
,
"Bob"
),
Customer
(
33
,
"Lee"
),
Customer
(
5
,
"Rhonda"
))
Notice how the None
s were conveniently left out.
Now let’s look at some examples in Java.
Java
Optional
<
User
>
optUser
=
findUserById
(
123
);
optUser
.
ifPresent
(
user
->
{
System
.
out
.
println
(
"User's name = "
+
user
.
getName
());})
}
In this example, findUserById
returns an Optional
of a User
object if it finds one for the given id
. If not, the code in the braces is not executed. The older, pre-Optional
way of doing this would have been for findUserById
to return a null
if a user was not found. The problem with that is that if a method were then called on the User
object, which was actually a null
, a NullPointerException
would be thrown. With the preceding code, which uses Optional
, no exception is thrown.
Now, Python does not have an Option
class. It does have the None
object; this is only half of what makes up an option. It is, however, useful in its own right. You can do things like this:
Python
def
getUser
(
id
):
#get user from database
#if database call fails
return
None
Then, you can do this, when calling this function:
if
getUser
(
789
)
is
not
None
:
#do something
You may say that this looks an awful lot like checking for a null
, and I would agree with you. If you really want to, you can create an Option
class in Python. One way is shown here:
class
Option
:
def
get_or_else
(
self
,
default
):
return
self
.
value
if
isinstance
(
self
,
Some
)
else
default
class
Some
(
Option
):
def
__init__
(
self
,
value
):
self
.
value
=
value
class
Nothing
(
Option
):
pass
I called this Nothing
, because Python already has a None
object. This isn’t particularly “pythonic,” but it’s one way to get the job done.
C# also has a version of this idea. It’s not exactly an Option
, but it is a construct that helps with problems caused by null
s: the Nullable
type. With this type, we can represent, in a way that makes it explicit, that a variable may be holding a null
value. For a given type, say int
, we can form the type:
C#
Nullable
<
int
>
As you can see, this is a generic type. A shorthand way of writing this is:
int?
We can also write code like:
Nullable
<
int
>
n1
=
new
Nullable
<
int
>(
10
);
Nullable
<
int
>
n2
=
null
;
if
(
n1
.
HasValue
)
{
process
(
n1
.
Value
);
}
else
{
//n1 has a null value.
}
This is a way of navigating null
s without risking a NullPointerException
.
The Try Data Structure
While the Option
construct is useful, and an improvement over null
s, it doesn’t give you a clue about why there is no legitimate value. Exceptions do this, but Option
s do not. Option
s are a great solution in many situations, but sometimes you need to get some information about what went wrong. The Try
construct addresses this. Just as Option
has Some
and None
, Try
has Success
and Failure
. Failure
wraps the exception that was thrown. While we try to avoid exceptions as much as possible in FP, sometimes we cannot avoid it. Try
is one solution to this problem. Here is some code.
Scala
def
divide
(
a
:
Float
,
b
:
Float
):
Try
[
Float
]
=
Try
(
a
/
b
)
Then we can do the following:
divide
(
3
,
0
)
match
{
case
Success
(
result
)
=>
//do something with result
case
Failure
(
ex
)
=>
println
(
ex
.
getMessage
)
}
Here is an example of Try
in a for
comprehension:
def
toInt
(
s
:
String
):
Try
[
Int
]
=
Try
(
Integer
.
parseInt
(
s
.
trim
))
val
y
=
for
{
a
<-
toInt
(
"9"
)
b
<-
toInt
(
"3.5"
)
c
<-
toInt
(
"6"
)
}
yield
a
+
b
+
c
Notice how well the error case is handled. y
will either contain a Success
wrapping the sum of a
, b
, and c
, or it will contain a Failure
wrapping an exception.
In this case, the result will be:
Failure
(
java
.
lang
.
NumberFormatException
:
For
input
string
:
"3.5"
)
The Either Data Structure
While the Try
type gives you more information about a failure or exceptional outcome than Option
, you might wonder if there is a type that gives you more flexibility. The Either
construct addresses this in the following manner: an Either
object can be either a Right
or a Left
. Right
is like the Some
type of Option
. Left
is like None
, except it can wrap any type. Let us consider an example in Scala.
Scala
def
divide
(
a
:
Int
,
b
:
Int
):
Either
[
String
,
Int
]
=
{
if
(
b
==
0
)
Left
(
"Can't divide by zero."
)
else
Right
(
a
/
b
)
}
Then, you can do something like this:
divide
(
a
,
b
)
match
{
case
Left
(
msg
)
=>
//do something with msg
case
Right
(
res
)
=>
//do something with res
}
Left
and Right
can wrap any types. For example, suppose you had two custom types, User
and Error
:
Scala
case
class
User
(
id
:
Int
,
firstName
:
String
)
case
class
Error
(
id
:
Int
,
text
:
String
)
def
getUser
(
id
:
Int
):
Either
[
Error
,
User
]
=
{
//If I retrieve a user from api
Right
(
User
(
id
,
firstName
)
//else if there is an error
Left
(
id
,
text
)
}
Then I can do the following:
getUser
(
4
)
match
{
case
Right
(
user
)
=>
println
(
user
.
firstName
)
case
Left
(
error
)
=>
println
(
error
.
text
)
}
Left
can even wrap an exception, if for some reason you find it necessary or prudent to use one and you don’t want to use a Try
. For example, you might be using a library that throws an exception. It is always possible to catch the exception and return an error type wrapped in Left
.
As we get deeper into FP, it will be useful and in some cases necessary to use third-party functional libraries that include many of the structures we have been discussing. You have seen examples in Scala, Java, Python, and C#. These languages vary in how much FP they can do without libraries, but all of them need libraries
for some things.3 For Java, there are at least two libraries that allow functional constructs. One is called Functional Java and another is Vavr. I will be giving examples from Vavr. I will always state when some code relies on Vavr. Let us see what Either
looks like in Java with Vavr.
Java with Vavr
private
static
Either
<
String
,
User
>
getUser
(
int
id
)
{
//code that sucessfully found a user
return
Either
.
right
(
user
);
}
else
{
return
Either
.
left
(
String
.
format
(
"No user found with id %s"
,
id
);
}
}
What about a functional library for Python? One excellent library is OSlash. With OSlash, we can write things in Python like the following:
Python (with OSlash)
def
get_user
(
id
):
#successful case yields a user object
return
Right
(
user
)
#error case returns an error object
return
Left
(
error
)
Then we can do this:
if
isinstance
(
result
,
Right
):
#do something with successfull result
else
:
# result is an error
Higher Order Functions
Functions that return these functional data structures like Option
or Either
lend themselves to higher order functions. Let us first see some simple examples of this.
Scala
case
class
User
(
id
,
firstName
)
val
users
:
List
[
Option
[
User
]]
=
List
(
Some
(
User
(
1
,
"Jack"
)),
None
,
Some
(
User
(
2
,
"Andrea"
)),
None
,
None
)
The users
list could be the result of a function that queries an API and gets options of users back. In those cases in which no User
was found for a given id
, the function returns a None
. To get just the users
and not the None
s, we can do this:
users
.
flatten
This will return:
List
(
User
(
1
,
"Jack"
),
User
(
2
,
"Andrea"
))
Often, one has a data pipeline of values that they are transforming as they call a bunch of higher order functions.
For example, suppose we have a Scala function, getUsers
, that returns a list of Either[Error,User]
, where User
has a field called email
. Suppose that we want to filter out all users with an example.com account. We expect the result to be a List[Either[Error,String]]
, where the String
represents the email address. An imperative approach, which loops through the List
, would be a bit messy. Let’s see how a functional approach, using higher order functions, greatly simplifies the code. It does this by going directly to the essence of what’s going on. Looping is not part of that essence. Let’s see some code.
Scala
case
class
User
(
id
:
Int
,
:
String
)
case
class
Error
(
id
:
Int
,
text
:
String
)
def
getUsers
():
List
[
Either
[
Error
,
User
]]
=
List
(
Right
(
User
(
1
,
"jack@example.com"
)),
Left
(
Error
(
4
,
"user not found"
)),
Right
(
User
(
2
,
"andrea@example.com"
)))
Then we can do this:
val
emails
=
getUsers
().
map
(
either
=>
either
.
map
(
u
=>
u
.
))
Now, emails will contain the following:
List
(
Right
(
jack
@example
.
com
),
Left
(
Error
(
4
,
user
not
found
)),
Right
(
andrea
@example
.
com
))
This allows you to drill down into the hierarchy of fields to the one you want.
Monads in for Comprehensions in Scala
As we have seen, things like List
, Option
, Try
, and Either
, are monads. One very useful thing you can do with monads in Scala is for
comprehensions. Here is an example:
Scala
case
class
Student
(
id
:
Int
,
:
String
)
case
class
FinalGrade
(
grade
:
Int
)
def
getStudent
(
id
:
Int
):
Option
[
Student
]
=
???
def
getFinalGrade
(
student
:
Student
):
Option
[
FinalGrade
]
This function tries to get a User object that corresponds to the given id and if it finds it, it returns Some(User(493, "Alex"))
, for example. If it does not find it, it returns None
. We can now do the following:
for
{
student
<-
getStudent
(
999
)
//getStudent returns a monad
finalGrade
<-
getFinalGrade
(
student
)
//getFinalGrade returns a monad
}
yield
(
student
,
finalGrade
)
This would return a tuple consisting of an Option
of the student
and an Option
of the finalGrade
. If either of the Option
s were None
, the value of the for
comprehension would be None
and not the tuple of values. What this does for comprehension, in effect, is take perfectly functional code and make it look more like traditional imperative code. In many cases, it results in clearer code. It turns out that a for
comprehension is really just syntactic sugar for a series of flatMap
and map
calls and we know that every monad has flatMap
and map
. Let’s start out with a list of Option
s of students
:
Scala
case
class
Student
(
id
:
Int
,
:
String
)
val
students
=
List
(
Some
(
Student
(
1
,
"jack@example.com"
)),
None
,
Some
(
Student
(
2
,
"andrea@example.com"
)),
None
)
First, let’s assume you just want to get the students
out of the options and ignore the None
s. You can do this with a simple call to flatten
:
students
.
flatten
// returns List(Student(1,"jack@example.com"),
Student
(
2
,
"andrea@example.com"
))
Now, let’s say you are just interested in the emails
and you want to get a list of them. You can do the following:
students
.
flatten
.
map
(
student
=>
student
.
)
// This will return List("jack@example.com","andrea@example.com")
So we did a flatten
and then a map
. Sometimes you want to map
first, then flatten
. This is precisely what flatMap
does. Let’s look at an example of this:
students
.
flatMap
(
optStudent
=>
opStudent
)
// returns List(Student(1,"jack@example.com"), Student(2,"andrea@example.com"))
In this case, for the map
, we used the identity function. This made the flatMap
call the same as a call to flatten
. We did not use the full power of flatMap
. Let’s see a more complicated example. Suppose we have a getStudent
function that takes an id
and returns an Option
of a Student
. This is a good idea since some id
s might not refer to any Student
. Here is some code.
def
getStudent
(
id
:
Int
):
Option
[
Student
]
=
{
id
match
{
case
1
=>
Some
(
Student
(
1
,
"jack@example.com"
))
case
2
=>
Some
(
Student
(
2
,
"andrea@example.com"
))
case
_
=>
None
}
}
Let us suppose now that we have a of id
s and we want to call the previous function on the list somehow, but the function returns an Option
. We can do the following:
List
(
1
,
2
,
3
).
map
(
id
=>
getStudent
(
id
))
This will return:
List
(
Some
(
Student
(
1
,
jack
@example
.
com
)),
Some
(
Student
(
2
,
andrea
@example
.
com
)),
None
)
This is probably not what we wanted. flatMap
to the rescue.
List
(
1
,
2
,
3
).
flatMap
(
id
=>
getStudent
(
id
))
Now this will return:
List
(
Student
(
1
,
jack
@example
.
com
),
Student
(
2
,
andrea
@example
.
com
))
Perfect! flatMap
is useful when you have a list (or any monad) and you want to map
it, but the function that you want to map
it with returns a monad itself. The flatten
part of flatMap
will remove the result from its container—that is, its monad.
Traditional Data Structures
Now we will look at traditional data structures, as opposed to functional data structures.
Immutability and History
As with everything functional, immutability is an important piece of the puzzle. What I am calling history here refers to the fact that since we do not change anything, we must copy something that changes (and retain its former state). For example, a stack can be functional if implemented in a certain way. Operations such as push
would have to make a copy of the stack instead of modifying it in place. This requires, in general, a way of doing quick copies. There are various algorithms for this. There are even databases that follow this paradigm. Datomic, for example, a database created by Rich Hickey, creator of the Clojure language (which has many functional capabilities), is a database that never overwrites data. It keeps a copy of all data and changes made. This is another way that the functional paradigm is becoming more mainstream.
Note
Whether or not a traditional data structure is functional depends on the particular implementation of that data structure.
Laziness
We have mentioned laziness before in connection with what makes a language functional (see “Lazy Evaluation”). The basic idea is that a function or process is lazy if it is evaluated only when needed. Part of this is the fact that if a function carries out a long computation, the value will be saved or memoized and then used the next time it is needed (if possible). This capability is useful in creating efficient functional data structures. Haskell is a lazy language and Scala has a lazy keyword but is not lazy by default. In general, for a language to be partially or fully lazy, it tends to be a language that was explicitly constructed to be, at least in part, functional.
Note
Laziness allows us to wait to evaluate an expression when and only when we need it.
Conclusion
In summary, there are two types of functional data structures. The first is a set of constructs that come from category theory, generally by way of Haskell, and which have a map
and flatMap
functions, which makes them monads. The second consists of traditional data structures that have been implemented according to the principles mentioned earlier.
1 Hoare, Tony. “Null References: The Billion Dollar Mistake.” Historically Bad Ideas. Lecture presented at the QCon, 2009. https://oreil.ly/sQSnH.
2 Like many functional constructs, this requires Java 8 or above.
Get Learning Functional Programming now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.