CHAPTER 8 The Naive Bayes Classifier

In this chapter, we introduce the naive Bayes classifier, which can be applied to data with categorical predictors. We review the concept of conditional probabilities, then present the complete, or exact, Bayesian classifier. We then see why the exact classifier is impractical in most cases and learn how to modify it into the naive Bayes classifier, which is more generally applicable.

Python

In this chapter, we will use pandas for data handling, scikit-learn for naive Bayes models, and matplotlib for visualization. We will also make use of the utility functions from the Python Utilities Functions Appendix.

Import required functionality for this chapter

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
import matplotlib.pyplot as plt
from dmba import classificationSummary, gainsChart

8.1 Introduction

The naive Bayes method (and, indeed, an entire branch of statistics) is named after the Reverend Thomas Bayes (1702–1761). To understand the naive Bayes classifier, we first look at the complete, or exact, Bayesian classifier. The basic principle is simple. For each record to be classified:

  1. Find all the other records with the same predictor profile (i.e., where the predictor values are the same).
  2. Determine what classes the records belong to and which class is most prevalent. ...
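The two steps listed so far can be sketched in pandas. This is a minimal illustration, not the book's implementation: the toy data, the column names (size, status, label), and the helper exact_bayes_classify are all hypothetical, and the sketch covers only the matching and majority-class steps.

```python
import pandas as pd

# Hypothetical toy data (not from the book): two categorical predictors
# and a class label.
records = pd.DataFrame({
    "size":   ["small", "small", "small", "large", "large", "large"],
    "status": ["y", "y", "y", "n", "n", "y"],
    "label":  ["fraud", "fraud", "truthful", "truthful", "truthful", "fraud"],
})


def exact_bayes_classify(data, new_record, outcome="label"):
    """Return the most prevalent class among records that match new_record
    on every predictor, along with the class proportions among the matches."""
    # Step 1: keep only records whose predictor values all equal
    # those of the record being classified.
    mask = pd.Series(True, index=data.index)
    for column, value in new_record.items():
        mask &= data[column] == value
    matches = data[mask]

    # With no matching records, this approach cannot produce a class.
    if matches.empty:
        return None, None

    # Step 2: find the class proportions among the matches and
    # pick the most prevalent class.
    proportions = matches[outcome].value_counts(normalize=True)
    return proportions.idxmax(), proportions


predicted, proportions = exact_bayes_classify(
    records, {"size": "small", "status": "y"}
)
print(predicted)
print(proportions)
```

A record whose predictor profile appears nowhere in the data (for example, a hypothetical size of "medium") returns no class at all. As the number of predictors grows, exact matches become rarer, which is one way to see why the exact classifier is impractical in most cases.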
