Apache Spark is beyond the scope of this book, so if you want to know more about this powerful framework, I suggest you read the online documentation or one of the many books available. Pentreath N., Machine Learning with Spark, Packt, offers an interesting introduction to the MLlib library and shows how to implement most of the algorithms discussed in this book.
Spark is a parallel computational engine, now a top-level Apache project that is often deployed alongside Hadoop (even though it doesn't use Hadoop's code). It can run in local mode or on very large clusters (with thousands of nodes) to execute complex tasks on huge amounts of data. It's written mainly in Scala, though there are interfaces for Java, Python, and R. In this ...