Spark in Action by Petar Zečević and Marko Bonaći

Chapter 8. ML: classification and clustering

This chapter covers

  • The Spark ML library
  • Logistic regression
  • Decision trees and random forests
  • K-means clustering

In the previous chapter, you got acquainted with Spark MLlib (Spark’s machine learning library), with machine learning in general, and with linear regression, the most important method of regression analysis. In this chapter, we’ll cover two equally important fields in machine learning: classification and clustering.

Classification is a subset of supervised machine learning in which the target variable is categorical, meaning it takes only a limited set of values. The task of classification is therefore to assign each input example to one of several classes. Recognizing ...
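To make the idea of a categorical target concrete, here is a minimal sketch in plain Python. It is not Spark ML code and does not train a model: the toy data, the `classify` function, and the threshold of 3.0 are all hypothetical, chosen only to show input examples being assigned to one of a small, fixed set of classes.

```python
from collections import Counter

# Hypothetical toy data: (hours_studied, outcome).
# The outcome is the target variable.
examples = [(1.0, "fail"), (2.0, "fail"), (3.5, "pass"), (5.0, "pass")]

# The target is categorical: it takes values from a limited set of classes.
classes = sorted({label for _, label in examples})


def classify(hours, threshold=3.0):
    """Assign an input to a class using a hand-picked (assumed) threshold."""
    return "pass" if hours >= threshold else "fail"


# Categorize every input example into one of the classes.
predictions = [classify(hours) for hours, _ in examples]

# Count how many examples landed in each class.
class_counts = Counter(predictions)
```

A real classifier would learn its decision rule from labeled examples rather than rely on a fixed threshold; the sketch only illustrates what the output of classification looks like.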
