Chapter 3. Calculating Text Similarity Using Word Embeddings
Before we get started: this is the first chapter with actual code in it. Chances are you skipped straight to here, and who would blame you? To follow the recipes, though, it really helps to have the accompanying code up and running. You can easily do this by executing the following commands in a shell:
git clone https://github.com/DOsinga/deep_learning_cookbook.git
cd deep_learning_cookbook
python3 -m venv venv3
source venv3/bin/activate
pip install -r requirements.txt
jupyter notebook
You can find a more detailed explanation in “What Do You Need to Know?”.
In this chapter we’ll look at word embeddings and how they can help us to calculate the similarities between pieces of text. Word embeddings are a powerful technique used in natural language processing to represent words as vectors in an n-dimensional space. The interesting thing about this space is that words that have similar meanings will appear close to each other.
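To make "close to each other" concrete, here is a minimal sketch of how closeness between word vectors is usually measured, using cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings like the ones we'll use have hundreds of dimensions:

```python
import numpy as np

# Toy 3-dimensional "embeddings", invented for illustration.
# Real Word2vec vectors have 300 dimensions.
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1.0 means
    # they point the same way, close to 0.0 means unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low
```

With well-trained embeddings the same computation puts "cat" near "dog" and far from "car", which is exactly the property the recipes in this chapter exploit.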
The main model we’ll use here is a version of Google’s Word2vec. This is not a deep neural model. In fact, it is no more than a big lookup table from word to vector and therefore hardly a model at all. The Word2vec embeddings are produced as a side effect of training a network to predict a word from its context for sentences taken from Google News. Nonetheless, it is possibly the best-known example of an embedding, and embeddings are an important concept in deep learning.
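The "big lookup table" view can be sketched directly: given a table from word to vector, finding related words is just a nearest-neighbor search over the table. This is a toy illustration with invented vectors, not the actual implementation behind libraries like gensim, but it captures what a `most_similar`-style query does:

```python
import numpy as np

# A Word2vec-style model reduced to its essence: a lookup table
# from word to vector. Toy 3-d vectors, invented for illustration.
table = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.7, 0.8, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
    "pear":  np.array([0.1, 0.2, 0.8]),
}

def most_similar(word, topn=2):
    """Return the topn words closest to `word` by cosine similarity."""
    target = table[word]
    target = target / np.linalg.norm(target)
    scores = []
    for other, vec in table.items():
        if other == word:
            continue  # skip the query word itself
        scores.append((other, float(vec @ target / np.linalg.norm(vec))))
    return sorted(scores, key=lambda pair: -pair[1])[:topn]

print(most_similar("king"))  # "queen" comes out on top
```

There is no inference step here at all: every query is a lookup followed by a brute-force scan, which is why a trained Word2vec model is "hardly a model at all".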
Once you start looking for ...