3

Naïve Bayes and the Incredible Lightness of Being an Idiot

In the previous chapter, you hit the ground running with a bit of unsupervised learning. You looked at k-means clustering, which is like the chicken nugget of the data mining world: simple, intuitive, and useful. Delicious too.

In this chapter you're going to move from unsupervised into supervised artificial intelligence models by training up a naïve Bayes model, which is, for lack of a better metaphor, also a chicken nugget, albeit a supervised one.

As mentioned in Chapter 2, in supervised artificial intelligence, you “train” a model to make predictions using data that's already been classified. The most common use of naïve Bayes is for document classification. Is this e-mail spam or ham? Is this tweet happy or angry? Should this intercepted satellite phone call be classified for further investigation by the spooks? You provide “training data,” i.e., classified examples of these documents, to the training algorithm, and then, going forward, the model can classify new documents into these categories using what it learned from those examples.
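To make the train-then-classify idea concrete, here is a minimal sketch in Python. Treat it as an illustration under assumptions rather than the method this chapter builds: the function names `train` and `classify`, the four toy training documents, the whitespace tokenization, and the add-one smoothing are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per class from (text, label) pairs (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log of the class prior.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing (an assumption of this sketch) keeps an
            # unseen word from zeroing out the whole class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

model = train(training_data)
print(classify("free money prize", model))
print(classify("notes from the meeting", model))
```

With these four toy documents, the first call prints spam and the second prints ham. The "naïve" part is the loop over words: each word's contribution is added on its own, as if the words in a document had nothing to do with one another.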

The example you'll work through in this chapter is one that's close to my own heart. Let me explain.

When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise

Recently the company I work for, MailChimp, started a new product called Mandrill.com. It has the most frightening logo I've seen in a while (see Figure 3-1).

Mandrill is a transactional e-mail product for software developers ...

