Natural Language Processing in Action

by Cole Howard, Hobson Lane, Hannes Hapke
April 2019
Content level: Intermediate to advanced
544 pages
17h 29m
English
Manning Publications

3 Math with words (TF-IDF vectors)

This chapter covers

  • Counting words and term frequencies to analyze meaning
  • Predicting word occurrence probabilities with Zipf’s Law
  • Vector representation of words and how to start using them
  • Finding relevant documents from a corpus using inverse document frequencies
  • Estimating the similarity of pairs of documents with cosine similarity and Okapi BM25

Having collected and counted words (tokens), and bucketed them into stems or lemmas, it’s time to do something interesting with them. Detecting words is useful for simple tasks, like getting statistics about word usage or doing keyword search. But you’d like to know which words are more important to a particular document and across the corpus as a whole. Then ...
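The idea of scoring how important a word is to one document relative to the whole corpus can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own code: the function names (`tokenize`, `tf_idf_vectors`, `cosine_similarity`), the whitespace tokenizer, the sample documents, and the unsmoothed `log(N / df)` form of inverse document frequency are all assumptions chosen for brevity. Other TF-IDF variants add smoothing or different normalization.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return (vocab, vectors), one TF-IDF vector per document.

    Assumes tf = count / document length and idf = log(N / df),
    which is one common unsmoothed variant.
    """
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = []
        for term in vocab:
            tf = counts[term] / len(toks) if toks else 0.0
            idf = math.log(n_docs / df[term])
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs are pets",
]
vocab, vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])
```

In this toy corpus, "the" appears in every document, so its inverse document frequency is `log(1) = 0` and it contributes nothing to any vector. The first two documents share the rarer words "sat" and "on", so they score as more similar to each other than either does to the third.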


ISBN: 9781617294631