In Spark 2.0, the DataFrame-based API becomes the primary API for MLlib.
The RDD-based API is entering maintenance mode. The MLlib guide (http://spark.apache.org/docs/2.0.0/ml-guide.html) provides more details.
The following are the new features introduced in Spark 2.0:
- ML persistence: The DataFrame-based API provides support for saving and loading ML models and Pipelines in Scala, Java, Python, and R
- MLlib in R: SparkR offers MLlib APIs for generalized linear models, naive Bayes, k-means clustering, and survival regression in this release
- Python: PySpark in 2.0 supports new MLlib algorithms such as LDA, Generalized Linear Regression, and Gaussian Mixture Model
Algorithms added to the DataFrame-based API include GMM, Bisecting K-Means clustering, ...
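
To make the ML persistence feature concrete, here is a minimal PySpark sketch that fits a small Pipeline, saves the fitted model, and loads it back. The toy training rows, the column names, the local master setting, and the helper name fit_save_and_reload are all assumptions chosen for illustration; they are not taken from the release notes. The pyspark imports sit inside the function so that the file can be loaded on a machine without Spark installed.

```python
import os
import tempfile


def fit_save_and_reload(model_path):
    """Fit a toy text-classification Pipeline, persist it, and load it back.

    Hypothetical helper for illustration only. Returns the number of stages
    in the reloaded PipelineModel.
    """
    # Imported lazily: these require a working pyspark installation.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.ml.feature import Tokenizer, HashingTF
    from pyspark.ml.classification import LogisticRegression

    spark = (SparkSession.builder
             .master("local[1]")               # assumption: run locally
             .appName("ml-persistence-sketch")
             .getOrCreate())
    try:
        # Made-up training data: (id, text, label).
        training = spark.createDataFrame(
            [
                (0, "a b c d e spark", 1.0),
                (1, "b d", 0.0),
                (2, "spark f g h", 1.0),
                (3, "hadoop mapreduce", 0.0),
            ],
            ["id", "text", "label"],
        )

        # Three stages: tokenize, hash terms into features, classify.
        tokenizer = Tokenizer(inputCol="text", outputCol="words")
        hashing_tf = HashingTF(inputCol="words", outputCol="features")
        lr = LogisticRegression(maxIter=10, regParam=0.001)
        pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])

        model = pipeline.fit(training)

        # Save the fitted PipelineModel, replacing any earlier copy.
        model.write().overwrite().save(model_path)

        # Load it back as a PipelineModel and inspect its stages.
        reloaded = PipelineModel.load(model_path)
        return len(reloaded.stages)
    finally:
        spark.stop()


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "pipeline-model")
    print(fit_save_and_reload(target))
```

The reloaded object can be used like the original fitted model, for example by calling transform on a new DataFrame with the same text column. Saving the whole Pipeline rather than only the final estimator keeps the feature-preparation stages together with the classifier.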