Text classification

Text classification is the task of assigning a label, such as a topic, subject category, or genre, to a piece of text. For example, a spam filter labels each email as spam or not spam.

Apache Spark supports various classifiers through its MLlib and ML packages. The SVM and Naive Bayes classifiers are two popular choices. The former was covered in the previous chapter, so let's take a look at the latter now.

Naive Bayes classifier

The Naive Bayes (NB) classifier is a multiclass probabilistic classifier that is widely used for text classification. It assumes strong independence between every pair of features, given the label. It computes the conditional probability distribution of each feature given a label, and then applies Bayes' ...
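
To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is not Spark's implementation or API. The function names (`train_naive_bayes`, `predict`), the add-one smoothing, and the four-document toy data set are all assumptions made for illustration. The sketch estimates a prior for each label and a conditional probability for each word given a label, then combines them with Bayes' theorem under the independence assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, smoothing=1.0):
    """Estimate log priors and per-label word log likelihoods.

    docs is a list of (tokens, label) pairs. smoothing is an additive
    (Laplace) constant, an assumption of this sketch, that keeps words
    unseen under a label from getting zero probability.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Prior P(label): fraction of training documents with that label.
    total_docs = sum(label_counts.values())
    log_priors = {
        label: math.log(count / total_docs)
        for label, count in label_counts.items()
    }

    # Conditional P(word | label), smoothed over the whole vocabulary.
    log_likelihoods = {}
    for label in label_counts:
        denom = sum(word_counts[label].values()) + smoothing * len(vocab)
        log_likelihoods[label] = {
            word: math.log((word_counts[label][word] + smoothing) / denom)
            for word in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict(tokens, log_priors, log_likelihoods, vocab):
    """Return the label with the highest posterior score.

    Because features are treated as independent given the label, the
    score is log P(label) plus the sum of log P(word | label).
    Words outside the training vocabulary are ignored.
    """
    scores = {}
    for label, log_prior in log_priors.items():
        scores[label] = log_prior + sum(
            log_likelihoods[label][word] for word in tokens if word in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: two spam and two non-spam ("ham") messages.
training = [
    ("win cash prize now".split(), "spam"),
    ("cheap prize offer now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
model = train_naive_bayes(training)

print(predict("cash prize offer".split(), *model))  # spam
print(predict("monday meeting".split(), *model))    # ham
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together, which is why the sketch sums logarithms instead of multiplying raw probabilities.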