MALLET is best known as a library for topic modeling, but it also implements many other algorithms.
One popular algorithm that MALLET implements is naïve Bayesian classification. If you have documents that are already divided into categories, you can train a classifier to categorize new documents into those same categories. Often, this works surprisingly well.
A common application is spam e-mail detection, which we'll use as our example here.
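To make the idea concrete before turning to MALLET, here is a minimal sketch of naïve Bayes classification in plain Clojure. It is not MALLET's implementation and does not use its API. The function names (`tokenize`, `train`, `log-score`, `classify`), the toy training data, and the choice of add-one smoothing are all assumptions made for illustration.

```clojure
(ns spam-sketch
  (:require [clojure.string :as str]))

;; Illustrative only: these names are hypothetical, not part of MALLET.

(defn tokenize
  "Lower-cases text and splits it into word tokens."
  [text]
  (re-seq #"[a-z']+" (str/lower-case text)))

(defn train
  "Builds a model from [category text] pairs: document counts per
  category, token counts per category, and the overall vocabulary."
  [labeled-docs]
  (reduce (fn [model [category text]]
            (let [tokens (tokenize text)]
              (-> model
                  (update-in [:doc-counts category] (fnil inc 0))
                  (update-in [:token-counts category]
                             #(merge-with + (or % {}) (frequencies tokens)))
                  (update-in [:vocab] into tokens))))
          {:doc-counts {} :token-counts {} :vocab #{}}
          labeled-docs))

(defn log-score
  "Log of P(category) times the product of P(token | category), using
  add-one smoothing so an unseen token does not zero out the score."
  [{:keys [doc-counts token-counts vocab]} category tokens]
  (let [total-docs   (reduce + (vals doc-counts))
        counts       (get token-counts category {})
        total-tokens (reduce + (vals counts))
        denominator  (+ total-tokens (count vocab))]
    (reduce +
            (Math/log (/ (double (doc-counts category)) total-docs))
            (for [t tokens]
              (Math/log (/ (+ 1.0 (get counts t 0)) denominator))))))

(defn classify
  "Returns the category with the highest log score for the text."
  [model text]
  (let [tokens (tokenize text)]
    (apply max-key #(log-score model % tokens) (keys (:doc-counts model)))))

;; Toy training data, invented for this sketch.
(def model
  (train [[:spam "win cash now"]
          [:spam "cheap pills win big"]
          [:ham  "meeting notes attached"]
          [:ham  "lunch at noon tomorrow"]]))

(classify model "win cheap cash")
;; => :spam
```

The "naïve" part is the assumption that tokens are independent given the category, which is why the score is a simple sum of per-token log probabilities. Working in log space avoids numeric underflow when documents are long.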
We'll need to have MALLET included in our project's dependencies:
(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [cc.mallet/mallet "2.0.7"]])
Just as in the Performing ...