
Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua


Spark Integration

MLlib benefits from the other components of the Spark ecosystem. Spark Core provides an execution engine with over 80 operators for transforming data, which covers tasks such as data cleaning and featurization.
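As a rough illustration of how a few of these operators can be chained for cleaning and featurization, the following PySpark sketch parses raw text lines, drops incomplete rows, and converts the rest into numeric feature lists. The record layout (user, age, clicks) and the helper names parse_line, is_valid, featurize, and build_features are assumptions made for this example, not something taken from MLlib. Only parallelize, map, and filter are Spark operators here.

```python
def parse_line(line):
    """Split a comma-separated line into whitespace-stripped fields."""
    return [field.strip() for field in line.split(",")]


def is_valid(fields):
    """Cleaning step: keep rows with exactly three non-empty fields."""
    return len(fields) == 3 and all(fields)


def featurize(fields):
    """Featurization step: turn (user, age, clicks) into [age, clicks].

    Assumes the age and clicks fields are numeric strings.
    """
    _user, age, clicks = fields
    return [float(age), float(clicks)]


def build_features(sc, lines):
    """Chain Spark Core operators on an RDD.

    `sc` is assumed to be an existing SparkContext and `lines` a local
    list of strings (hypothetical input for this sketch).
    """
    return (
        sc.parallelize(lines)   # distribute the raw lines as an RDD
        .map(parse_line)        # transformation: parse each line
        .filter(is_valid)       # transformation: drop incomplete rows
        .map(featurize)         # transformation: build numeric features
    )
```

With a running SparkContext, calling build_features(sc, lines).collect() would trigger the computation and return the feature lists to the driver. Until an action such as collect is called, the chained transformations are only recorded, not executed.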

MLlib also uses other high-level libraries packaged with Spark, such as Spark SQL. Spark SQL provides data integration functionality, SQL, and structured data processing, which simplify data cleaning and preprocessing. It also supports the DataFrame abstraction, which is fundamental to the spark.ml package.
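One way this integration can look in practice is sketched below in PySpark: a SQL query removes rows with missing values, and the resulting DataFrame is passed to a spark.ml feature transformer. The table name events, the column names, and the helpers cleaning_query and prepare_features are hypothetical and exist only for this example. The Spark calls used (createDataFrame, createOrReplaceTempView, sql, and VectorAssembler) are part of the public PySpark API.

```python
FEATURE_COLUMNS = ["age", "clicks"]  # hypothetical numeric columns


def cleaning_query(table, columns):
    """Build a SQL query that keeps rows where every listed column is non-null."""
    conditions = " AND ".join(f"{c} IS NOT NULL" for c in columns)
    return f"SELECT * FROM {table} WHERE {conditions}"


def prepare_features(spark, rows):
    """Clean rows with Spark SQL, then assemble a spark.ml feature vector.

    `spark` is assumed to be an existing SparkSession and `rows` a list of
    (user, age, clicks) tuples in which age or clicks may be None.
    """
    # Imported here so the query helper above works without Spark installed.
    from pyspark.ml.feature import VectorAssembler

    df = spark.createDataFrame(rows, ["user"] + FEATURE_COLUMNS)
    df.createOrReplaceTempView("events")  # expose the DataFrame to SQL

    cleaned = spark.sql(cleaning_query("events", FEATURE_COLUMNS))

    # spark.ml transformers operate directly on DataFrames.
    assembler = VectorAssembler(inputCols=FEATURE_COLUMNS, outputCol="features")
    return assembler.transform(cleaned)
```

The DataFrame returned by prepare_features carries a features column that spark.ml estimators can consume, so the SQL-based cleaning and the learning pipeline share one data abstraction.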

GraphX (https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-gonzalez.pdf) supports large-scale graph processing and has a powerful API for implementing learning algorithms that can be viewed as large sparse graph problems, for example, latent Dirichlet allocation (LDA).
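To make the "sparse graph" view of LDA more concrete, the following plain Python sketch turns a small bag-of-words corpus into the edge list of a bipartite graph, with one vertex per document, one vertex per term, and an edge weighted by the term count wherever a term occurs in a document. This is not GraphX code (GraphX has no Python API), and the function names and the vertex numbering (documents as 0, 1, 2, ... and terms as -1, -2, -3, ...) are conventions assumed for this illustration.

```python
def term_to_vertex(term_id):
    """Map a term index 0, 1, 2, ... to a negative vertex ID -1, -2, -3, ...

    Negative IDs keep term vertices distinct from document vertices.
    """
    return -(term_id + 1)


def corpus_to_edges(docs):
    """Convert a corpus into weighted document-term edges.

    `docs` is a list of dicts mapping term index -> count, one dict per
    document. Returns (doc_vertex, term_vertex, weight) tuples. Only terms
    that actually occur produce an edge, so the graph stays sparse.
    """
    edges = []
    for doc_id, term_counts in enumerate(docs):
        for term_id, count in sorted(term_counts.items()):
            if count > 0:
                edges.append((doc_id, term_to_vertex(term_id), float(count)))
    return edges


# Two tiny documents over a hypothetical four-term vocabulary.
example_docs = [{0: 2, 3: 1}, {3: 4}]
example_edges = corpus_to_edges(example_docs)
```

In a graph-based formulation of this kind, inference proceeds by passing messages along these edges, so the cost of each iteration scales with the number of nonzero document-term pairs rather than with the full document-by-vocabulary matrix.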
