Word Embeddings and Distance Measurements for Text

In Chapter 4, Transforming Text into Data Structures, we discussed the bag-of-words and term-frequency and inverse document frequency-based methods to represent text in the form of numbers. These methods mostly rely on the syntactical aspects of a word in terms of its presence or absence in a document or across a text corpus. However, information about the neighborhood of the word, in terms of what words come after or before a word, wasn't taken into account in the approaches we have discussed so far. The neighborhood of a word carries important information in terms of what context the word is carrying in a sentence. The relationship between the word and its neighborhood tends to define the ...

Get Hands-On Python Natural Language Processing now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.