Clojure Data Analysis Cookbook - Second Edition by Eric Rochester

Performing naïve Bayesian classification with MALLET

MALLET is best known as a library for topic modeling, but it also implements many other algorithms.

One popular algorithm that MALLET implements is naïve Bayesian classification. If you have documents that are already divided into categories, you can train a classifier to categorize new documents into those same categories. Often, this works surprisingly well.

A common application is spam e-mail detection, which we'll use as our example here.
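To make the idea concrete before turning to MALLET, here is a minimal from-scratch sketch of naïve Bayesian classification in plain Clojure. It is not MALLET's API and not the recipe's code: the function names (tokenize, train, log-score, classify), the add-one smoothing, and the four toy training messages are all assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: lowercase the text and pull out word tokens.
(defn tokenize [text]
  (re-seq #"[a-z']+" (str/lower-case text)))

;; Build a model from a seq of [label text] pairs. The model records how
;; many documents each label has, how often each word occurs under each
;; label, and the overall vocabulary.
(defn train [labeled-docs]
  (reduce (fn [model [label text]]
            (let [tokens (tokenize text)]
              (-> model
                  (update-in [:doc-counts label] (fnil inc 0))
                  (update-in [:word-counts label]
                             #(merge-with + (or % {}) (frequencies tokens)))
                  (update-in [:vocab] into tokens))))
          {:doc-counts {} :word-counts {} :vocab #{}}
          labeled-docs))

;; Log of (prior * product of word likelihoods) for one label.
;; Add-one (Laplace) smoothing is assumed here so that a word never seen
;; under a label does not force the probability to zero. Tokens outside
;; the training vocabulary are ignored.
(defn log-score [{:keys [doc-counts word-counts vocab]} label tokens]
  (let [total-docs  (reduce + (vals doc-counts))
        counts      (get word-counts label {})
        total-words (reduce + (vals counts))
        denom       (+ total-words (count vocab))]
    (reduce (fn [score token]
              (+ score (Math/log (/ (+ 1.0 (get counts token 0)) denom))))
            (Math/log (/ (double (get doc-counts label)) total-docs))
            (filter vocab tokens))))

;; Pick the label with the highest log score.
(defn classify [model text]
  (let [tokens (tokenize text)]
    (apply max-key #(log-score model % tokens) (keys (:doc-counts model)))))

;; Toy training data, invented for this sketch.
(def model
  (train [[:spam "win money now"]
          [:spam "free money offer"]
          [:ham  "meeting at noon"]
          [:ham  "lunch at noon tomorrow"]]))

(classify model "free money")     ;; => :spam
(classify model "lunch at noon")  ;; => :ham
```

Working in log space keeps the product of many small probabilities from underflowing. A library such as MALLET handles tokenization, feature extraction, and smoothing for you, so a real project would not need this hand-rolled version.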

Getting ready

We'll need to have MALLET included in our project.clj file:

(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [cc.mallet/mallet "2.0.7"]])

Just as in the Performing ...
