O'Reilly logo

TensorFlow Machine Learning Cookbook by Nick McClure

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Implementing TF-IDF

Since we can choose the embedding for each word, we might decide to change the weighting on certain words. One such strategy is to upweight useful words and downweight overly common or too rare words. The embedding we will explore in this recipe is an attempt to achieve this.

Getting ready

TF-IDF is an acronym that stands for Text Frequency – Inverse Document Frequency. This term is essentially the product of text frequency and inverse document frequency for each word.

In the prior recipe, we introduced the bag of words methodology, which assigned a value of one for every occurrence of a word in a sentence. This is probably not ideal as each category of sentence (spam and ham for the prior recipe example) most likely has the same ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required