O'Reilly logo

Python Machine Learning By Example - Second Edition by Yuxi Liu

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Counting the occurrence of each word token

It seems that we are only interested in the occurrence of certain words, their count, or a related measure and not in the order of the words. We can therefore view a text as a collection of words. This is called the Bag of Words (BoW) model. This is a very basic model, but it works pretty well in practice. We can optionally define a more complex model that takes into account the order of words and PoS tags. However, such a model is going to be more computationally expensive and more difficult to program. In reality, the basic BoW model in most cases suffices. Have a doubt? We can give it a shot and see whether the BoW model makes sense.

We start with converting documents into a matrix where each ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required