Of course, classification modeling is not restricted to decision trees. Many other classification methods are available, including Naïve Bayes classification. Naïve Bayes classification methods are based on Bayes Theorem, developed by the Reverend Thomas Bayes.1 Bayes Theorem updates our knowledge about the data parameters by combining our previous knowledge (called the prior distribution) with new information obtained from observed data, resulting in updated parameter knowledge (called the posterior distribution).


Consider a data set made up of two predictors X = {X1, X2} and a response variable Y, where the response variable takes one of three possible class values: y1, y2, and y3. Our objective is to identify which of y1, y2, and y3 is most likely for a particular combination of predictor variable values. Let us call this particular combination X* = {X1 = x1, X2 = x2}.

We can use Bayes Theorem to identify which class is the most likely for a particular combination of predictor variable values by:

  1. calculating the posterior probability for each of y1, y2, and y3, for the combination of predictors x1 and x2, and
  2. selecting the value of y with the highest posterior probability.
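The two steps above can be sketched in Python. The probabilities below are hypothetical (made up for illustration), and the sketch uses the naïve independence assumption, p(X* | y) = p(x1 | y) × p(x2 | y):

```python
# Hypothetical prior probabilities p(Y = y) for the three classes
priors = {"y1": 0.5, "y2": 0.3, "y3": 0.2}

# Hypothetical class-conditional probabilities p(x1 | y) and p(x2 | y)
p_x1_given_y = {"y1": 0.10, "y2": 0.40, "y3": 0.30}
p_x2_given_y = {"y1": 0.20, "y2": 0.50, "y3": 0.10}

# Step 1: the posterior is proportional to prior times likelihood;
# the denominator p(X = X*) is the same for every class, so it can
# be recovered by normalizing at the end.
unnormalized = {
    y: priors[y] * p_x1_given_y[y] * p_x2_given_y[y] for y in priors
}
total = sum(unnormalized.values())
posteriors = {y: p / total for y, p in unnormalized.items()}

# Step 2: select the class with the highest posterior probability
best_class = max(posteriors, key=posteriors.get)
print(best_class, posteriors)
```

Note that skipping the denominator does not change which class wins, since p(X = X*) scales every class's posterior by the same constant.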

Let y* be one of the three potential values of Y. Bayes Theorem tells us:

p(Y = y* | X = X*) = [ p(X = X* | Y = y*) × p(Y = y*) ] / p(X = X*)

Now, p(Y = y*) represents the prior probability of the class value y*: our belief about how likely each class is before observing the predictor values.
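In practice, this whole computation is available in standard libraries. As an illustration (assuming scikit-learn, which is not named in this excerpt), a minimal sketch with two numeric predictors and three classes using a Gaussian Naïve Bayes model on toy data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: two predictors X1, X2 and three class values
X = np.array([[1.0, 2.0], [1.2, 1.9],
              [5.0, 6.0], [5.1, 5.8],
              [9.0, 1.0], [8.8, 1.2]])
y = np.array(["y1", "y1", "y2", "y2", "y3", "y3"])

model = GaussianNB().fit(X, y)

# Posterior probabilities p(Y = y* | X = X*) for a new record X*
X_star = np.array([[5.0, 5.9]])
posterior = model.predict_proba(X_star)  # one column per class
print(model.classes_, posterior, model.predict(X_star))
```

The `predict` method performs step 2 directly, returning the class with the highest posterior probability for each record.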
