Chapter 4

Classification

Abstract

In classification, or class prediction, the goal is to use information from the predictors (independent variables) to sort each data sample into one of two or more distinct classes, or buckets. Classification is the most widely used data science task in business. There are several ways to build classification models. This chapter discusses and demonstrates six of the most commonly used classification algorithms: decision trees, rule induction, k-nearest neighbors (k-NN), naïve Bayesian, artificial neural networks, and support vector machines. The chapter concludes by building ensemble classification models and discussing bagging, boosting, and random forests.
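To make the idea of sorting a sample into a class from its predictors concrete, here is a minimal sketch of one of the listed algorithms, k-nearest neighbors, in plain Python. The toy data, the labels "low" and "high", and the function name `knn_predict` are hypothetical and chosen only for illustration; this is not the implementation used in the chapter.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training rows."""
    # Keep the k training rows closest to the query (Euclidean distance on the predictors).
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Count the class labels among those rows and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each row is (predictor values, class label).
train = [
    ((1.0, 1.1), "low"),
    ((1.2, 0.9), "low"),
    ((0.8, 1.0), "low"),
    ((3.0, 3.2), "high"),
    ((3.1, 2.9), "high"),
    ((2.9, 3.0), "high"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.0, 3.0)))
```

With this toy data, the first query falls among the "low" rows and the second among the "high" rows, so each is assigned the class of its three closest neighbors.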

Keywords

Classification; decision ...

