Gensim is best known for its semantic and topic modeling algorithms. Topic modeling is a typical text mining task that discovers the hidden semantic structures in documents. In plain English, a semantic structure is a distribution of word occurrences. Because it needs no labeled data, topic modeling is an unsupervised learning task: we feed in plain text and let the model figure out the abstract "topics". We will study topic modeling in detail in Chapter 3, Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms.
In addition to robust semantic modeling methods, Gensim also provides the following functionalities:
- Word embedding: Also known as word vectorization, this is an innovative way to represent ...