In this chapter, we will look at Spark’s machine learning API, known as the MLlib API. MLlib is made up of two APIs: the original RDD-based API and the newer DataFrame-based API. The DataFrame version is referred to as the ML API because its objects live in the org.apache.spark.ml namespace, whereas the RDD-based objects live in org.apache.spark.mllib. From here on, we will use the term ML API to refer to the DataFrame version of MLlib. Just as the .NET for Apache Spark project supports the DataFrame API and not the RDD API, to date it implements only the DataFrame-based ML API and has no implementation of the RDD-based one.
The ML API was not part of the core project when it was first ...