Artificial Intelligence for Big Data
by Anand Deshpande, Manish Kumar, Albenzo Coletta, Giancarlo Zaccone
The Spark MLlib library
Spark MLlib is a library of machine learning algorithms and utilities designed to make machine learning easy and to run it in parallel. It covers regression, collaborative filtering, classification, and clustering. Spark MLlib provides two APIs in separate packages, spark.mllib and spark.ml: spark.mllib is built on top of RDDs, and spark.ml is built on top of DataFrames. The primary machine learning API for Spark is now the DataFrame-based API in the spark.ml package. Using spark.ml with the DataFrame API is more versatile and flexible, and it brings the benefits of DataFrames, such as the Catalyst optimizer. spark.mllib, by contrast, is an RDD-based API that is expected to be removed ...