Conducting predictive analytics using Spark MLlib

Spark has a very rich machine learning library called MLlib. It is a collection of algorithms for classification, clustering, recommendation, and more. In this recipe, we are going to look at how to build a predictive model using MLlib.

Getting ready

To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala. Here, I am using Scala 2.11.0.
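
If you build the recipe's code with sbt, a build definition along the following lines pulls in Spark's core and MLlib modules. This is a sketch under assumptions: the project name and the Spark version (2.4.8, a release line published for Scala 2.11) are not from the recipe, so match them to the Spark installation on your cluster.

```scala
// build.sbt (hypothetical project; adjust the versions to your cluster)
name := "iris-mllib"

// The Scala version used in this recipe
scalaVersion := "2.11.0"

// Spark is supplied by the cluster at runtime, hence "provided".
// The %% operator appends the Scala binary suffix (_2.11) to each artifact name.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.8" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.8" % "provided"
)
```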

How to do it...

For this recipe, we are going to use the classic iris flower dataset; you can find out more about it at https://en.wikipedia.org/wiki/Iris_flower_data_set.
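
MLlib's classifiers work on numeric labels, so each species name has to be mapped to a number before training. The sketch below assumes the data is stored in the comma-separated layout of the UCI `iris.data` file (sepal length, sepal width, petal length, petal width, species name on each line); the object name `IrisParsing` and the particular label numbering are hypothetical choices, not part of the recipe.

```scala
// Hypothetical helper: turns one line of the UCI iris.data layout into
// (feature values, numeric label). Pure Scala, no Spark dependency.
object IrisParsing {
  // Assumed encoding; MLlib multiclass classifiers expect labels 0.0 .. k-1.
  val speciesToLabel: Map[String, Double] = Map(
    "Iris-setosa"     -> 0.0,
    "Iris-versicolor" -> 1.0,
    "Iris-virginica"  -> 2.0
  )

  // Returns None for blank or malformed lines instead of throwing.
  def parseLine(line: String): Option[(Array[Double], Double)] = {
    val cols = line.trim.split(",")
    if (cols.length != 5) None
    else {
      // The first four columns are the measurements; non-numbers yield None.
      val features = scala.util.Try(cols.take(4).map(_.toDouble)).toOption
      for {
        f     <- features
        label <- speciesToLabel.get(cols(4))
      } yield (f, label)
    }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "5.1,3.5,1.4,0.2,Iris-setosa",
      "7.0,3.2,4.7,1.4,Iris-versicolor",
      ""
    )
    // The blank line is dropped by flatMap because parseLine returns None for it.
    sample.flatMap(parseLine).foreach { case (f, label) =>
      println(s"${f.mkString("[", ", ", "]")} -> $label")
    }
  }
}
```

Returning an `Option` keeps stray blank or corrupt lines from aborting a whole job; they can simply be filtered out with `flatMap`.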

Based on the petal length and width and the sepal length and width, we need to classify each flower into its species (Iris setosa, Iris versicolor, or Iris virginica). ...
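
The classification step itself can be sketched with the RDD-based MLlib API (`org.apache.spark.mllib`). This is not the recipe's own code: the choice of multinomial logistic regression (`LogisticRegressionWithLBFGS`), the 70/30 train/test split, the seed, the label encoding, and passing the input path as the first program argument are all assumptions. It also assumes the UCI `iris.data` line layout (four measurements followed by the species name). MLlib offers other classifiers, such as decision trees and naive Bayes, that could be swapped in.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object IrisClassifier {
  // Assumed label encoding; MLlib expects labels 0.0 to numClasses - 1.
  val speciesToLabel: Map[String, Double] = Map(
    "Iris-setosa" -> 0.0, "Iris-versicolor" -> 1.0, "Iris-virginica" -> 2.0)

  // Converts one "sl,sw,pl,pw,species" line into a LabeledPoint.
  def toLabeledPoint(line: String): LabeledPoint = {
    val cols = line.trim.split(",")
    LabeledPoint(speciesToLabel(cols(4)), Vectors.dense(cols.take(4).map(_.toDouble)))
  }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("IrisClassifier"))

    // args(0): path to the iris data, for example on HDFS (hypothetical).
    val data = sc.textFile(args(0))
      .filter(_.trim.nonEmpty) // skip blank lines
      .map(toLabeledPoint)

    // Hold out 30% of the rows to estimate accuracy on unseen data.
    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)
    training.cache()

    // Multinomial logistic regression over the three species.
    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(3)
      .run(training)

    // Pair each prediction with the true label and count the matches.
    val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
    val correct = predictionAndLabel.filter { case (pred, label) => pred == label }.count()
    println(s"Test accuracy: ${correct.toDouble / test.count()}")

    sc.stop()
  }
}
```

To try it, package the project (for example with `sbt package`) and launch the jar with `spark-submit --class IrisClassifier <jar> <path-to-iris-data>`. Holding out a test split gives a rough check that the model generalizes rather than memorizing the training rows.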

Hadoop: Data Processing and Modelling by Garry Turkington, Tanmay Deshpande, Sandeep Karanth
