© Ed Elliott 2021
E. Elliott, Introducing .NET for Apache Spark, https://doi.org/10.1007/978-1-4842-6992-3_7

7. Spark Machine Learning API

Ed Elliott
Sussex, UK

In this chapter, we will look at Spark’s machine learning API, known as the MLlib API. MLlib is made up of an older RDD-based API and a newer DataFrame-based API. The DataFrame version is referred to as the ML API because its objects exist in the org.apache.spark.ml namespace. From here on, we will use the term ML API to refer to the DataFrame version of the MLlib API. In the same way that the .NET for Apache Spark project supports the DataFrame API and not the RDD API, to date it implements only the DataFrame-based ML API and not the RDD-based one.

The ML API was not part of the core project when it was first ...
