Machine Learning: End-to-End guide for Java developers
by Richard M. Reese, Jennifer L. Reese, Boštjan Kaluža, Dr. Uday Kamath, Krishna Choppella
Summary
In this chapter, we discussed how text mining differs from traditional attribute-based learning: written natural language must pass through several pre-processing steps before it can be represented as feature vectors. We then discussed how to leverage Mallet, a Java-based library for natural language processing, by applying it to two real-life problems. First, we used LDA to model the topics in a news corpus, building a model that can assign topics to a new document. Second, we built a naive Bayes spam-filtering classifier using the bag-of-words representation.
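As a compact reminder of the second task, the following is a minimal sketch of a bag-of-words naive Bayes classifier written in plain Java. It does not use Mallet, and it is not the chapter's implementation: the class name `BagOfWordsNaiveBayes`, its methods, the simple regex tokenizer, and the add-one (Laplace) smoothing are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Mallet or the chapter's code.
public class BagOfWordsNaiveBayes {
    // Per-label word counts: the bag-of-words representation of each class.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // Total number of tokens seen per label.
    private final Map<String, Integer> totalWords = new HashMap<>();
    // Number of training documents per label, used for the class prior.
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Pre-processing (assumed): lowercase, then split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Adds one labeled document to the model.
    public void train(String label, String text) {
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    // Returns the label with the highest log posterior, or null if nothing was trained.
    public String classify(String text) {
        String[] tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training documents with this label.
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokens) {
                int c = counts.getOrDefault(token, 0);
                // Add-one smoothing keeps unseen words from zeroing out the probability.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BagOfWordsNaiveBayes nb = new BagOfWordsNaiveBayes();
        nb.train("spam", "Win free money now");
        nb.train("spam", "Free prize, claim now");
        nb.train("ham", "Meeting at noon tomorrow");
        nb.train("ham", "Project report attached");
        System.out.println(nb.classify("Claim your free prize"));
        System.out.println(nb.classify("Meeting tomorrow at noon"));
    }
}
```

The sketch sums log probabilities rather than multiplying raw probabilities, which avoids numeric underflow on longer messages. A library such as Mallet additionally handles concerns this toy version ignores, such as stop-word removal and feature indexing.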
This chapter concludes the technical demonstrations of how to apply various libraries to solve machine learning tasks. As we were not able to cover more interesting ...