September 2015
Beginner to intermediate
454 pages
10h 49m
English
We remember from Chapter 4, Building Good Training Sets – Data Preprocessing, that we have to convert categorical data, such as text or words, into a numerical form before we can pass it on to a machine learning algorithm. In this section, we will introduce the bag-of-words model that allows us to represent text as numerical feature vectors. The idea behind the bag-of-words model is quite simple and can be summarized as follows:
Since the unique words in each document represent only a small ...
Read now
Unlock full access