O'Reilly logo

Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Training a bag-of-words classifier

In the previous section, we utilized simple binary features for the words in the reviews in order to learn positive and negative sentiments. A better approach would be to use latent features, such as the frequency of the words used in the text. Compared to a binary representation of the presence or absence of words, the count of the words may better capture the characteristics of the text or document. Bag-of-words is a vector representation of text. Each of the vector dimensions captures either the frequency, presence or absence, or weighted values of words in the text. A bag-of-words representation does not capture the order of the words.

The binary feature extraction that was discussed in the previous ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required