
Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.


Working with Spark ML pipelines

Spark MLlib's goal is to make practical ML scalable and easy. Like Spark Core, MLlib provides APIs in three languages (Python, Scala, and Java), along with example code that eases the learning curve for users coming from different backgrounds. The pipeline API in MLlib provides a uniform set of high-level APIs, built on top of DataFrames, that help users create and tune practical ML pipelines. This API lives in a new package named spark.ml.

MLlib standardizes APIs for machine learning algorithms to make it easier to combine multiple algorithms into a single pipeline or workflow. Let's see the key terms introduced by the pipeline API:

  • DataFrame: The ML API uses DataFrame from Spark SQL as an ML dataset, ...
