Chapter 8 Machine Learning Classification

Machine learning classifiers are a critically important part of the data science toolkit. However, they are not nearly as important as they are made out to be. A large part of the mystique of data science comes from the idea that we can pour data into a magical black box that (through some mathematical voodoo that only data scientists are smart enough to understand) can learn everything about the data and solve business problems.

The reality is a lot more mundane. As we've discussed previously, it takes a lot of work to get the data into a form where it can be fed into the black box, a lot of savvy to point the black box at the right question, and additional work to make sense of the results. The machine learning black box itself is usually just a library that you call. Sure, it's good to have some idea of how the classifiers work under the hood – you can pick better ones to use, avoid common pitfalls, make better sense of their output, and understand how to jury-rig them as need be. But training a plain-vanilla classifier is often construed as being rocket science, and it's not.
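To make the "just a library that you call" point concrete, here is a minimal sketch of training a plain-vanilla classifier. It assumes scikit-learn is installed. The bundled iris dataset, the choice of logistic regression, and the 75/25 train/test split are illustrative assumptions, not something this chapter prescribes.

```python
# A minimal sketch, assuming scikit-learn is available.
# The dataset (iris) and the model (logistic regression) are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small, already-clean dataset: feature matrix X and class labels y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the classifier is graded on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The "black box" itself: construct it, train it, ask it for predictions.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# Fraction of held-out rows that were labeled correctly.
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The training itself is the single `fit` call. The data here arrives already cleaned and labeled, which is exactly the part that takes real work on a real problem.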

This chapter comes in two sections. After some initial notes, the first will be a series of rapid-fire tutorials about some of the most useful classifiers. The second section will discuss the various ways that we can grade their accuracy.

8.1 What Is a Classifier, and What Can You Do with It?

A machine learning classifier is a computational object that has two ...
