December 2018
Beginner to intermediate
684 pages
21h 9m
English
gensim is a specialized NLP library with a fast LDA implementation and many additional features. We will also use it in the next chapter on word vectors (see the latent_dirichlet_allocation_gensim notebook for details).
It facilitates converting the document-term matrices (DTMs) produced by sklearn into gensim data structures as follows:
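import pandas as pd
from gensim.matutils import Sparse2Corpus
# sklearn DTMs hold one document per row, hence documents_columns=False
train_corpus = Sparse2Corpus(train_dtm, documents_columns=False)
test_corpus = Sparse2Corpus(test_dtm, documents_columns=False)
id2word = pd.Series(vectorizer.get_feature_names_out()).to_dict()  # column index -> token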
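To make the target format concrete, here is a minimal stdlib-only sketch of what this conversion produces. gensim expects each document as a list of (token_id, count) pairs. With documents_columns=False, each row of the DTM is treated as one document. The sketch assumes a small dense DTM given as nested lists, whereas the real train_dtm is a SciPy sparse matrix. The helper names dtm_to_bow and build_id2word are hypothetical and are not part of gensim.

```python
from typing import Dict, List, Sequence, Tuple

# A gensim-style bag-of-words document: (token_id, count) pairs for non-zero entries.
BowDoc = List[Tuple[int, int]]


def dtm_to_bow(dtm: Sequence[Sequence[int]]) -> List[BowDoc]:
    """Hypothetical helper mimicking Sparse2Corpus(dtm, documents_columns=False).

    Each row of the document-term matrix is one document; only non-zero
    term counts are kept, mirroring what a sparse matrix stores.
    """
    return [
        [(term_id, count) for term_id, count in enumerate(row) if count != 0]
        for row in dtm
    ]


def build_id2word(feature_names: Sequence[str]) -> Dict[int, str]:
    """Hypothetical helper: map column index -> token, like pd.Series(names).to_dict()."""
    return dict(enumerate(feature_names))


if __name__ == "__main__":
    # Toy vocabulary and DTM (2 documents x 4 terms); illustrative data only.
    vocab = ["bond", "equity", "rate", "yield"]
    dtm = [
        [2, 0, 1, 0],
        [0, 3, 0, 1],
    ]
    print(dtm_to_bow(dtm))       # [[(0, 2), (2, 1)], [(1, 3), (3, 1)]]
    print(build_id2word(vocab))  # {0: 'bond', 1: 'equity', 2: 'rate', 3: 'yield'}
```

In gensim itself, Sparse2Corpus yields these lists one document at a time when iterated, and the ids and counts come back as NumPy scalars rather than plain Python ints.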
gensim's LdaModel class accepts numerous settings. Its signature, with the default values, begins as follows:
LdaModel(corpus=None,
         num_topics=100,
         id2word=None,
         distributed=False,
         chunksize=2000,  # Number of documents per training chunk.
         passes=1,  # No ...
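Together, chunksize and passes determine how often online LDA training updates the topic model. The following minimal sketch works through that arithmetic. It assumes the default update_every=1 on a single worker, where the model is updated once after every chunk of chunksize documents in each pass over the corpus. The function name lda_update_count is hypothetical and not part of gensim.

```python
import math


def lda_update_count(num_docs: int, chunksize: int = 2000, passes: int = 1) -> int:
    """Hypothetical helper: number of model updates during online LDA training.

    Assumes update_every=1 on a single worker, so the model is updated once
    per chunk of `chunksize` documents, in each of `passes` sweeps over the
    corpus. The last chunk of a pass may hold fewer than `chunksize` documents.
    """
    if num_docs <= 0 or chunksize <= 0 or passes <= 0:
        raise ValueError("num_docs, chunksize and passes must be positive")
    chunks_per_pass = math.ceil(num_docs / chunksize)
    return passes * chunks_per_pass


if __name__ == "__main__":
    # With the defaults, 10,500 documents form 6 chunks (5 full + 1 partial).
    print(lda_update_count(10_500))                           # 6
    # Five passes over the same corpus with 500-document chunks: 5 * 21 updates.
    print(lda_update_count(10_500, chunksize=500, passes=5))  # 105
```

Smaller chunks or more passes mean more, noisier updates. gensim performs a comparable calculation internally and logs a "too few updates" warning when the total is small, suggesting more passes.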