Data Science Bookcamp

by Leonard Apeltsin
November 2021
Beginner to intermediate
704 pages
20h 16m
English
Manning Publications

15 NLP analysis of large text datasets

This section covers

  • Vectorizing texts using scikit-learn
  • Dimensionally reducing vectorized text data
  • Clustering large text datasets
  • Visualizing text clusters
  • Concurrently displaying multiple visualizations

Our previous discussions of natural language processing (NLP) techniques focused on toy examples and small datasets. In this section, we execute NLP on large collections of real-world texts. This type of analysis is seemingly straightforward, given the techniques presented thus far. For example, suppose we’re doing market research across multiple online discussion forums. Each forum is composed of hundreds of users who discuss a specific topic, such as politics, fashion, technology, or cars. We want ...
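The techniques named in the list above (vectorizing with scikit-learn, dimensionality reduction, and clustering) can be sketched roughly as follows. This is a minimal illustration rather than the book's own code: the four sample posts, the variable names, the choice of TF-IDF with truncated SVD and K-means, and the parameter values are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical forum posts (invented for illustration): two about politics,
# two about cars.
posts = [
    "The senate passed the election reform bill",
    "Voters want the senate to debate the election bill",
    "The new car engine improves fuel mileage",
    "This car has a powerful engine and great mileage",
]

# Step 1: vectorize the texts into a sparse TF-IDF matrix,
# dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(posts)

# Step 2: reduce the matrix to 2 dimensions with truncated SVD,
# then rescale each row to unit length.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = normalize(svd.fit_transform(tfidf_matrix))

# Step 3: cluster the reduced vectors with K-means (2 clusters assumed).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

for label, post in zip(labels, posts):
    print(label, post)
```

On a toy input like this, the two politics posts should share one cluster label and the two car posts the other. With a real collection, the number of SVD components and the number of clusters would have to be chosen from the data rather than fixed in advance.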


ISBN: 9781617296253