Naïve Bayes and text mining

Extracting the most relevant features to build a model relies on discovery and data mining. For many applications, the data available to the scientist is unstructured text. The multinomial Naïve Bayes classifier is particularly well suited to text mining.
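To make the idea concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the implementation used in this book: the object name MultinomialNBSketch, the tokenizer, the Model case class, and the tiny toyCorpus with its "up"/"down" labels are all hypothetical and chosen only for illustration.

```scala
// Illustrative sketch only: names and data are hypothetical, not the book's API.
object MultinomialNBSketch {

  // Splits a document into lowercase alphabetic tokens.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Log prior per label, log likelihood per (label, word), and the known vocabulary.
  final case class Model(
      logPriors: Map[String, Double],
      logLikelihoods: Map[String, Map[String, Double]],
      vocabulary: Set[String])

  // Trains on (label, text) pairs using word counts with add-one smoothing.
  def train(docs: Seq[(String, String)]): Model = {
    val tokenized = docs.map { case (label, text) => (label, tokenize(text)) }
    val vocabulary = tokenized.flatMap(_._2).toSet
    val byLabel = tokenized.groupBy(_._1)

    val logPriors = byLabel.map { case (label, ds) =>
      label -> math.log(ds.size.toDouble / docs.size)
    }
    val logLikelihoods = byLabel.map { case (label, ds) =>
      val counts = ds.flatMap(_._2).groupBy(identity).map { case (w, ws) => w -> ws.size }
      // Smoothed denominator: words seen in this class plus one per vocabulary entry.
      val total = counts.values.sum + vocabulary.size
      label -> vocabulary.map { w =>
        w -> math.log((counts.getOrElse(w, 0) + 1).toDouble / total)
      }.toMap
    }
    Model(logPriors, logLikelihoods, vocabulary)
  }

  // Returns the label with the highest log posterior; unknown words are ignored.
  def classify(model: Model, text: String): String = {
    val tokens = tokenize(text).filter(model.vocabulary.contains)
    model.logPriors.map { case (label, logPrior) =>
      label -> (logPrior + tokens.map(model.logLikelihoods(label)).sum)
    }.maxBy(_._2)._1
  }

  // Hypothetical toy corpus: invented headlines labeled by stock direction.
  val toyCorpus: Seq[(String, String)] = Seq(
    "up"   -> "earnings beat expectations profit rises",
    "up"   -> "strong profit growth",
    "down" -> "profit warning shares fall",
    "down" -> "weak earnings losses widen"
  )
}
```

With this toy corpus, MultinomialNBSketch.classify(MultinomialNBSketch.train(MultinomialNBSketch.toyCorpus), "strong profit growth") selects the "up" label. Working with sums of log probabilities rather than products avoids numerical underflow on long documents.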

The Naïve Bayes formula is quite effective at classifying the following entities:

  • Spam e-mails
  • Business news stories
  • Movie reviews
  • Technical papers per field of expertise

This third use case consists of predicting the direction of a stock given the financial news. Two types of news affect the stock of a particular company:

  • Macro trends: These consist of economic or social news, such as conflicts, economic trends, or labor market statistics
  • Micro updates ...
