Clojure for Data Science by Henry Garner

Large-scale machine learning with Apache Spark and MLlib

The Spark project (https://spark.apache.org/) is a cluster computing framework that emphasizes low-latency job execution. It is a relatively recent project that grew out of UC Berkeley's AMPLab in 2009.

Although Spark can coexist with Hadoop (for example, by reading files stored on the Hadoop Distributed File System (HDFS)), it targets much faster job execution by keeping much of the computation in memory. In contrast with Hadoop's two-stage MapReduce paradigm, which writes files to disk between iterations, Spark's in-memory model can run tens or hundreds of times faster for some applications, particularly those that make multiple passes over the data. ...
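To make the point about iterative workloads concrete, here is a minimal toy sketch in plain Python. It is not Spark or Hadoop code: `DiskDataset`, `gradient_step`, `iterate_from_disk`, and `iterate_in_memory` are hypothetical names invented for this illustration, and the "disk" is simulated by a counter. It only models the cost of re-reading the input on every pass, compared with loading it once and keeping it in memory.

```python
class DiskDataset:
    """Toy stand-in for a dataset stored on disk (an assumption for
    illustration). It counts how many times the full dataset is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call simulates one full scan of the data from storage.
        self.reads += 1
        return list(self._values)


def gradient_step(data, estimate, rate=0.5):
    """One gradient-descent step toward the mean of the data
    (minimizing squared error)."""
    grad = sum(estimate - x for x in data) / len(data)
    return estimate - rate * grad


def iterate_from_disk(dataset, steps):
    """Re-read the dataset from 'disk' on every iteration."""
    estimate = 0.0
    for _ in range(steps):
        estimate = gradient_step(dataset.load(), estimate)
    return estimate


def iterate_in_memory(dataset, steps):
    """Read the dataset once, then iterate over the in-memory copy."""
    cached = dataset.load()
    estimate = 0.0
    for _ in range(steps):
        estimate = gradient_step(cached, estimate)
    return estimate


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    on_disk = DiskDataset(values)
    in_memory = DiskDataset(values)
    print(iterate_from_disk(on_disk, 10), "full reads:", on_disk.reads)
    print(iterate_in_memory(in_memory, 10), "full reads:", in_memory.reads)
```

Both functions compute the same estimate, but the first scans the data once per iteration while the second scans it once in total. In a real cluster the gap also depends on network, serialization, and scheduling costs that this sketch ignores.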