10Classification Algorithms

A classification is a definition comprising a system of definitions.

Karl Wilhelm Friedrich Schlegel

10.1. Introduction

Linear regression, as we saw in the previous chapter, is used when we want to build a model that predicts a value. But what happens if we want to classify something? In this case, we can use classification algorithms.

These algorithms, which belong to the supervised learning group, appear in applications related to data analysis. These algorithms can be used on structured or unstructured data. They seek to assign class labels to new data-sets or observations based on existing observations.

In other words, these algorithms are a set of techniques and data analysis methods that classify data by identifying the class or category to which new data belongs.

To better explain, let’s look at a simple example. This is the example that we have already mentioned in the previous section. It’s the spam filter that can learn to report spam using other examples of spam that contain unreliable terms such as “money transfer”, or that come from senders not in the recipient’s contact list.

Based on an email’s contents, messaging providers use classification to determine if incoming email messages are spam (Androutsopoulos et al. 2000; Sedkaoui 2018a). Classifications algorithms will, depending on the data-set, which are also called “training sets”, mark spam in incoming emails. They classify emails based on experience, corresponding to the training ...

Get Sharing Economy and Big Data Analytics now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.