10

BAYESIAN METHODS

You may have opened your email to find an inbox filled with offers for cheap Viagra, discount software, and alluring (?) new companions. And the problem used to be far worse. Many email providers have made a huge dent in it using Bayesian spam filtering.

Since certain words are more likely to occur in spam than in legitimate mail, spam filters assign messages containing these words a higher probability of being spam. Certain senders likewise raise or lower that probability. Bayesian filtering combines information from all the words and other characteristics of the email to assign a probability that each message is spam. If that probability exceeds a certain threshold, the message may be sent directly to the trash.
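
To make the idea of combining evidence from several words concrete, here is a minimal sketch of a naive Bayes style calculation in Python. It is an illustration under stated assumptions, not how any particular email provider works: the word likelihoods, the prior of 0.5, and the 0.9 threshold are all invented, the names spam_probability and is_spam are hypothetical, words are treated as independent given the class, and only words that appear in the message are counted.

```python
import math

# Hypothetical likelihoods, invented for illustration only:
# word -> (P(word appears | spam), P(word appears | legitimate))
WORD_LIKELIHOODS = {
    "viagra": (0.30, 0.001),
    "discount": (0.20, 0.02),
    "meeting": (0.01, 0.15),
}


def spam_probability(words, prior_spam=0.5, likelihoods=WORD_LIKELIHOODS):
    """Return P(spam | words), assuming words are independent given the class.

    Works on the log-odds scale: start from the prior odds of spam and add
    the log likelihood ratio of every known word found in the message.
    """
    log_odds = math.log(prior_spam / (1 - prior_spam))
    for word in set(words):  # count each distinct word once
        if word in likelihoods:
            p_given_spam, p_given_legit = likelihoods[word]
            log_odds += math.log(p_given_spam / p_given_legit)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


def is_spam(words, threshold=0.9):
    """Flag the message when the posterior exceeds an assumed threshold."""
    return spam_probability(words) > threshold


print(spam_probability(["cheap", "viagra", "discount"]))  # close to 1
print(spam_probability(["meeting", "tomorrow"]))          # well below 0.5
```

Each word multiplies the odds of spam by its likelihood ratio, so a word that is far more common in spam pushes the probability up, while a word typical of legitimate mail pulls it down.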

More generally, Bayes offers a mechanism for combining information from multiple sources. We do this all the time in our real lives (though we may not do it well). What is the probability that our team will win the next game? We combine what we know about the strengths of the two teams, any injuries, where the game is played, how hot key players have been, and so on to come up with an estimate.

In statistics, Bayes offers a way to combine prior information with information provided by the data. The Bayesian approach contrasts with the frequentist approach. The confidence intervals and significance tests we have done up to this point are frequentist techniques, based on what would happen under repeated sampling from the population.
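
As a small illustration of combining prior information with data, the sketch below uses one standard textbook case: a Beta prior on an unknown proportion, updated with binomial data. The prior Beta(4, 4) and the record of 9 wins and 3 losses are invented numbers, and the function names are hypothetical; the point is only that the posterior mean lands between the prior mean and the sample proportion.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(prior_a, prior_b) prior on a proportion, the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief: win probability centered at 0.5.
prior_a, prior_b = 4, 4

# Hypothetical data: 9 wins and 3 losses.
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=9, failures=3)

print(beta_mean(prior_a, prior_b))  # prior mean: 0.5
print(9 / 12)                       # sample proportion: 0.75
print(beta_mean(post_a, post_b))    # posterior mean: 0.65
```

With more data, the posterior mean moves closer to the sample proportion; with a stronger prior (larger prior_a and prior_b), it stays closer to the prior mean.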

Bayesian answers are often ...
