Topic Modeling - A Better Insight into Large-Scale Texts

Topic modeling (TM) is a technique widely used for mining text from large collections of documents: it discovers the topics that run through the collection. These topics, each made up of topic terms and their relative weights, can then be used to summarize and organize the documents. The dataset used for this project is in plain, unstructured text format.

We will see how effectively the Latent Dirichlet Allocation (LDA) algorithm can find useful patterns in the data. We will compare LDA with other TM algorithms and examine how well it scales. In addition, we will use Natural Language Processing (NLP) libraries, such as Stanford NLP.
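
To make the idea of topics as weighted term lists concrete, here is a minimal sketch of fitting LDA on a toy corpus. It assumes Apache Spark MLlib is on the classpath, which this introduction does not name; the four sentences, the choice of two topics, and the names LdaSketch and formatTopic are made up for illustration and are not the project's actual pipeline.

```scala
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

object LdaSketch {
  // Hypothetical helper: render one topic as "term (weight), term (weight), ..."
  def formatTopic(terms: scala.collection.Seq[String],
                  weights: scala.collection.Seq[Double]): String =
    terms.zip(weights).map { case (t, w) => f"$t ($w%.3f)" }.mkString(", ")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LdaSketch")
      .master("local[*]") // assumption: local run, not a cluster
      .getOrCreate()
    import spark.implicits._

    // Made-up corpus standing in for plain unstructured text
    val docs = Seq(
      "stocks fell as the market reacted to interest rates",
      "the central bank raised interest rates again",
      "the team won the match after a late goal",
      "the striker scored a goal in the final match"
    ).toDF("text")

    // Split into lowercase word tokens and drop common English stop words
    val tokenizer = new RegexTokenizer()
      .setInputCol("text").setOutputCol("tokens").setPattern("\\W+")
    val remover = new StopWordsRemover()
      .setInputCol("tokens").setOutputCol("filtered")
    val tokens = remover.transform(tokenizer.transform(docs))

    // Turn token lists into term-count vectors, the input LDA expects
    val cvModel = new CountVectorizer()
      .setInputCol("filtered").setOutputCol("features").fit(tokens)
    val vectors = cvModel.transform(tokens)

    // k = 2 is an arbitrary choice for this toy corpus
    val ldaModel = new LDA()
      .setK(2).setMaxIter(20).setSeed(42L)
      .setFeaturesCol("features")
      .fit(vectors)

    // Each topic is reported as term indices plus their relative weights
    val vocab = cvModel.vocabulary
    ldaModel.describeTopics(3).collect().foreach { row =>
      val topic = row.getAs[Int]("topic")
      val terms = row.getAs[scala.collection.Seq[Int]]("termIndices").map(i => vocab(i))
      val weights = row.getAs[scala.collection.Seq[Double]]("termWeights")
      println(s"Topic $topic: " + formatTopic(terms, weights))
    }

    spark.stop()
  }
}
```

On a corpus this small the recovered topics are not meaningful; the point is only the shape of the output, where every topic is a short list of terms paired with weights.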

In a nutshell, we will learn the following topics throughout this end-to-end project: ...
