June 2018
Beginner to intermediate
306 pages
7h 42m
English
The bag-of-words model is arguably the most straightforward way to represent a sentence as a vector. Let's start with an example:
S1: "The dog sat by the mat."
S2: "The cat loves the dog."
If we follow the same preprocessing steps we did in the Basic Preprocessing with language models section, from Chapter 3, spaCy's Language Models, we will end up with the following sentences:
S1: "dog sat mat."
S2: "cat love dog."
As Python lists, these will now look like this:
S1: ['dog', 'sat', 'mat']
S2: ['cat', 'love', 'dog']
To represent these sentences as vectors, we first need to construct our vocabulary, which is the set of unique words found across the sentences. Our vocabulary vector is now as follows:
Vocab = ['dog', 'sat', 'mat', 'love', ...
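The vocabulary construction described above, and the word-count vectors it leads to, can be sketched in a few lines of plain Python. This is a minimal illustration and not necessarily the approach the chapter goes on to use: the helper names `build_vocab` and `bow_vector` are hypothetical, and the vocabulary is ordered by first appearance, which is an assumption and may differ from the ordering shown above. Any fixed ordering works, as long as the same one is used for every sentence.

```python
from collections import Counter

# Preprocessed sentences from the example above, as Python lists.
s1 = ['dog', 'sat', 'mat']
s2 = ['cat', 'love', 'dog']


def build_vocab(docs):
    """Collect the unique words across all documents, in first-seen order.

    Hypothetical helper: the ordering is an assumption made for this sketch.
    """
    vocab = []
    seen = set()
    for doc in docs:
        for token in doc:
            if token not in seen:
                seen.add(token)
                vocab.append(token)
    return vocab


def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc)
    # Counter returns 0 for words that do not appear in the document.
    return [counts[word] for word in vocab]


vocab = build_vocab([s1, s2])
v1 = bow_vector(s1, vocab)
v2 = bow_vector(s2, vocab)

print(vocab)
print(v1)
print(v2)
```

Each vector has one entry per vocabulary word, so both sentences end up with the same length regardless of how many words they contain. Word order within a sentence is discarded, which is where the "bag" in bag-of-words comes from.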