Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Spark machine learning APIs - ML pipelines and MLlib

Until around Spark 1.6.0, the primary user-facing data abstraction was the RDD, and the MLlib APIs implemented machine learning on RDDs. MLlib was introduced in Spark 0.8 and consisted, for the most part, of straightforward library calls to ML algorithms; however, this did not reflect the data pipelines inherent in machine learning. With the advent of DataFrames and Datasets, MLlib was transformed as well and gained more capabilities; the resulting framework is the ML pipeline.
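To make the difference between isolated library calls and a pipeline concrete, here is a minimal, dependency-free sketch of the pipeline idea. It is not Spark code: the classes `MeanCenterer`, `CenteringModel`, `Scaler`, `Pipeline`, and `PipelineModel` are hypothetical stand-ins written for illustration, and they only mimic the general contract of the `Estimator`, `Transformer`, and `Pipeline` abstractions in `pyspark.ml` (a stage that must learn from data exposes `fit`, and every fitted stage exposes `transform`). Plain Python lists stand in for DataFrames.

```python
# Toy illustration of the pipeline pattern. These classes are hypothetical
# stand-ins and are NOT Spark's pyspark.ml API; lists stand in for DataFrames.

class CenteringModel:
    """Fitted stage: subtracts a mean that was learned earlier."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [x - self.mean for x in rows]


class MeanCenterer:
    """Estimator-like stage: learns the mean in fit() and returns a fitted stage."""
    def fit(self, rows):
        return CenteringModel(sum(rows) / len(rows))


class Scaler:
    """Transformer-like stage: needs no fitting, only transform()."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [x * self.factor for x in rows]


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; fit() fits each estimator on the output of the stages before it."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted, current = [], rows
        for stage in self.stages:
            # Stages with fit() are fitted first; plain transformers are used as-is.
            model = stage.fit(current) if hasattr(stage, "fit") else stage
            current = model.transform(current)
            fitted.append(model)
        return PipelineModel(fitted)


# Fit once on training data, then reuse the fitted pipeline on new data.
training = [1.0, 2.0, 3.0]
model = Pipeline([MeanCenterer(), Scaler(2.0)]).fit(training)
result = model.transform(training)
```

The point of the design is that the whole chain is fitted and applied as one unit, so the same sequence of preparation steps is guaranteed to run on both training and new data, which a set of independent library calls does not enforce.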

Tip

MLlib APIs have been in maintenance mode since 2.0.0 and will be deprecated in 3.0.0. Be aware, however, that some APIs have not yet been migrated to the ML world; for example, the random generator still outputs an RDD. So you will have to use ...
