Conducting predictive analytics using Spark MLlib
Spark has a rich machine learning library called MLlib. It is a collection of algorithms for classification, clustering, recommendations, and so on. In this recipe, we are going to take a look at how to build a predictive model using MLlib.
Getting ready
To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. Here, I am using Scala 2.11.0.
How to do it...
For this recipe, we are going to use the classic example dataset of iris flowers; you can find out more about this at https://en.wikipedia.org/wiki/Iris_flower_data_set.
Here, based on the petal length and width and the sepal length and width, we need to classify the flowers into species. ...
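As a rough illustration of the classification task described above, the following is a minimal sketch using the RDD-based MLlib API. Several things here are assumptions and not taken from the recipe: the input path iris.data (a hypothetical local copy of the dataset in comma-separated form, with four measurements followed by the species name), the species-to-label encoding, the local master setting, and the choice of a decision tree as the classifier. The recipe itself may use a different algorithm and different parameters.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object IrisSketch {
  // Assumed label encoding; MLlib classifiers expect labels 0.0, 1.0, 2.0, ...
  val speciesIndex: Map[String, Double] = Map(
    "Iris-setosa" -> 0.0,
    "Iris-versicolor" -> 1.0,
    "Iris-virginica" -> 2.0)

  // Turn one CSV row (four measurements, then the species name)
  // into a LabeledPoint of (label, feature vector).
  def parseLine(line: String): LabeledPoint = {
    val cols = line.split(",").map(_.trim)
    val features = Vectors.dense(cols.take(4).map(_.toDouble))
    LabeledPoint(speciesIndex(cols(4)), features)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; adjust the master for a cluster.
    val conf = new SparkConf().setAppName("IrisSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // "iris.data" is a hypothetical path; blank lines are skipped.
      val data = sc.textFile("iris.data").filter(_.trim.nonEmpty).map(parseLine)

      // Hold out 30% of the rows to evaluate the model.
      val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

      // Arguments: data, number of classes, categorical feature info
      // (empty, since all four features are continuous), impurity,
      // maximum depth, maximum bins. The values are illustrative only.
      val model = DecisionTree.trainClassifier(
        training, 3, Map[Int, Int](), "gini", 5, 32)

      // Fraction of held-out flowers whose species is predicted correctly.
      val correct = test.filter(p => model.predict(p.features) == p.label).count()
      println(s"Test accuracy: ${correct.toDouble / test.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Keeping parseLine separate from the Spark job makes the parsing step easy to check on a single row before running it over the whole dataset.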