In the two previous chapters, we applied the bag-of-words model to convert text data into a numerical format. The result was sparse, fixed-length vectors that represent documents in a high-dimensional word space. These vectors allow us to evaluate the similarity of documents and serve as features for training a machine learning algorithm to classify a document's content or rate the sentiment it expresses. However, they ignore the context in which a term is used, so that, for example, two different sentences containing the same words are encoded by the same vector.
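To make this limitation concrete, the following minimal sketch (not taken from the book's code) uses scikit-learn's CountVectorizer to encode two illustrative sentences that contain the same words in a different order; both map to the same count vector, showing how the bag-of-words representation discards word order and context:

```python
# Minimal sketch: bag-of-words counts ignore word order and context.
from sklearn.feature_extraction.text import CountVectorizer

# Two sentences with opposite meanings but identical word counts
docs = ['the dog bites the man',
        'the man bites the dog']

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)       # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # vocabulary defining the dimensions
print(bow.toarray())                       # identical rows: same counts, context lost
```

Both rows of the resulting matrix are identical, so any downstream model sees the two sentences as the same document despite their opposite meanings.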
In this chapter, we will introduce an alternative class of algorithms that use neural networks to learn a vector representation of individual semantic units such ...