Chapter 8. ML: classification and clustering

This chapter covers

The Spark ML library
Logistic regression
Decision trees and random forests
K-means clustering

In the previous chapter, you got acquainted with machine learning in general, with Spark MLlib (Spark’s machine learning library), and with linear regression, the most important method of regression analysis. In this chapter, we’ll cover two equally important fields in machine learning: classification and clustering.

Classification is a subset of supervised machine learning algorithms in which the target variable is categorical; that is, it can take only a limited set of values. The task of classification is therefore to assign each input example to one of several classes. Recognizing ...
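
To make the definition concrete, here is a minimal plain-Scala sketch (no Spark dependency) of what a binary classifier does: it maps a feature vector to one value from a fixed set of labels. The names `ClassificationSketch`, `Label`, `Positive`, and `Negative`, as well as the weight and bias values, are hypothetical and hand-picked for illustration only. In Spark ML, such parameters would be learned from training data by an estimator such as `LogisticRegression` rather than set by hand. The scoring step (a weighted sum passed through the logistic function, then compared with a 0.5 threshold) mirrors the shape of a logistic regression prediction.

```scala
// A minimal, hypothetical sketch: a binary classifier as a function from a
// feature vector to one of a limited set of labels. The weights are made up
// for illustration; a real model would learn them from labeled data.
object ClassificationSketch {
  // The categorical target: only these two values are possible.
  sealed trait Label
  case object Positive extends Label
  case object Negative extends Label

  // Hypothetical, hand-picked parameters (NOT learned from data).
  val weights: Vector[Double] = Vector(0.8, -0.5)
  val bias: Double = -0.1

  // Logistic (sigmoid) function: maps any real score into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Estimated probability that one example belongs to the Positive class.
  def probability(features: Vector[Double]): Double = {
    require(features.length == weights.length, "feature vector has wrong length")
    val score = weights.zip(features).map { case (w, x) => w * x }.sum + bias
    sigmoid(score)
  }

  // Classification: threshold the probability to pick a discrete label.
  def classify(features: Vector[Double], threshold: Double = 0.5): Label =
    if (probability(features) >= threshold) Positive else Negative

  def main(args: Array[String]): Unit = {
    val examples = Seq(Vector(2.0, 1.0), Vector(0.0, 3.0))
    examples.foreach(x => println(s"$x -> ${classify(x)}"))
  }
}
```

With these made-up weights, `Vector(2.0, 1.0)` gets a score of 1.0 (probability about 0.73) and is labeled `Positive`, while `Vector(0.0, 3.0)` gets a score of -1.6 (probability about 0.17) and is labeled `Negative`. The contrast with linear regression lies in the output: the model still computes a continuous score internally, but what it returns is a value from a finite set of classes.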

Spark in Action by Petar Zecevic, Marko Bonaci