Word embeddings are representations of text that mitigate some of the shortcomings of the bag-of-words model. Whereas the bag-of-words model represents each token with a scalar, word embeddings represent each token with a vector. These vectors are usually dense and typically have between 50 and 500 dimensions, and they place the words in a metric space: words that are semantically similar are represented by vectors that lie near each other. Concretely, a word embedding is a parameterized function that takes a token from some language as input and outputs a vector. The function is essentially a lookup table, parameterized by a matrix of embeddings. How is this matrix learned?
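The lookup-table view can be made concrete with a minimal sketch. The toy vocabulary, the choice of 50 dimensions, and the randomly initialized matrix below are illustrative assumptions; in practice the matrix values are learned rather than sampled at random.

```python
import numpy as np

# Illustrative vocabulary: each token maps to a row index in the embedding matrix.
vocab = {"cat": 0, "dog": 1, "car": 2}
embedding_dim = 50  # a typical dimensionality, per the range mentioned above

# The parameter matrix: one dense vector per token.
# Randomly initialized here; a trained model would have learned these values.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(token: str) -> np.ndarray:
    """The embedding function: a row lookup parameterized by the matrix."""
    return embeddings[vocab[token]]

vector = embed("cat")  # a dense vector with `embedding_dim` components
```

In a trained model, nearby rows of this matrix correspond to semantically related tokens, which is exactly the metric-space property described above.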
The parameters of a word embedding function are ...