Word embedding
Bag-of-words models have a few less-than-ideal properties that are worth noting.
The first problem with the bag-of-words models we've previously looked at is that they don't consider the context in which a word appears: they ignore word order and the relationships between the words in a document.
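To make the loss of context concrete, here is a minimal sketch of a count-based bag-of-words encoder. The two sentences and the `bag_of_words` helper are illustrative assumptions rather than anything from the earlier chapters. The sentences mean opposite things, yet they map to the same vector.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in text (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two illustrative sentences with the same words in a different order.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# A shared vocabulary built from both documents, sorted for a stable column order.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)
print(vec_a, vec_b)
print("identical vectors:", vec_a == vec_b)
```

Because only the counts survive, who bit whom is not recoverable from either vector.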
A second, related concern is that the assignment of words to positions in the vector space is somewhat arbitrary, so information about the relationship between two words in the corpus vocabulary might not be captured. For example, a model that has learned to process the word alligator can leverage very little of that learning when it comes across the word crocodile, even though alligators and crocodiles are similar creatures ...
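The alligator and crocodile point can be sketched with one-hot vectors, used here as a simple stand-in for an arbitrary word-to-dimension assignment. The three-word vocabulary (including the unrelated word teapot) and the helper functions are assumptions made for illustration.

```python
import math


def one_hot(word, vocabulary):
    """Return a vector with 1.0 in the word's own dimension and 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Illustrative vocabulary: two related words and one unrelated word.
vocabulary = ["alligator", "crocodile", "teapot"]

alligator = one_hot("alligator", vocabulary)
crocodile = one_hot("crocodile", vocabulary)
teapot = one_hot("teapot", vocabulary)

sim_reptiles = cosine(alligator, crocodile)
sim_unrelated = cosine(alligator, teapot)

print("alligator vs crocodile:", sim_reptiles)
print("alligator vs teapot:", sim_unrelated)
```

Under this encoding every pair of distinct words is orthogonal, so the representation treats crocodile as no closer to alligator than teapot is.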