July 2017
Intermediate to advanced
796 pages
18h 55m
English
MLlib is a distributed machine-learning framework built on top of Spark core. It handles machine-learning models that transform datasets represented as RDDs. The library provides algorithms such as logistic regression, Naive Bayes classification, Support Vector Machines (SVMs), decision trees, random forests, linear regression, Alternating Least Squares (ALS), and k-means clustering. Spark ML integrates with Spark core, Spark Streaming, Spark SQL, and GraphX, so a single platform can handle both real-time and batch data.
In addition, PySpark and SparkR are also available ...