December 2018
Beginner to intermediate
684 pages
English
We will illustrate LSI using the BBC article data that we introduced in the last chapter: it is small enough to train quickly, and its category labels let us compare topic assignments against them. See the latent_semantic_indexing notebook for additional implementation details:
vectorizer = TfidfVectorizer(max_df=.25, min_df=.01, stop_words='english', ...
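To make the overall workflow concrete, here is a minimal sketch of LSI as TF-IDF vectorization followed by a truncated SVD. It is not the notebook's code: the four-sentence corpus is a hypothetical stand-in for the BBC articles, the choice of two components is an assumption for illustration, and the max_df/min_df thresholds are left out because they only make sense on a larger corpus.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the BBC articles.
docs = [
    "the team won the football match",
    "the striker scored in the football match",
    "the central bank raised interest rates",
    "markets fell after the bank raised rates",
]

# Assumption: document-frequency thresholds are omitted for this tiny corpus.
vectorizer = TfidfVectorizer(stop_words='english')
dtm = vectorizer.fit_transform(docs)  # sparse document-term matrix

# LSI: project the TF-IDF matrix onto a small number of latent dimensions.
svd = TruncatedSVD(n_components=2, random_state=42)
doc_topics = svd.fit_transform(dtm)  # one row per document, one column per topic

# Recover the vocabulary in column order from the fitted vectorizer.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Show the three highest-weighted terms for each latent dimension.
for i, component in enumerate(svd.components_):
    top = component.argsort()[::-1][:3]
    print(f"topic {i}:", [terms[j] for j in top])
```

Each row of doc_topics gives a document's coordinates in the latent space; on labeled data such as the BBC articles, these coordinates are what we would compare against the category labels.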