First of all, a word is represented as a dense vector. A word embedding can be thought of as a mapping function from a word to an n-dimensional space; that is, W: words → ℝⁿ, in which W is a parameterized function mapping words in some language to high-dimensional vectors (for example, vectors with 200 to 500 dimensions). You may also consider W as a lookup table of size V × N, where V is the size of the vocabulary and N is the number of dimensions, and each row corresponds to one word. For example, we might find:
W("dog")=(0.1, -0.5, 0.8, ...)W(‘‘mat")=(0.0, 0.6, -0.1, ...)
Here, W is often initialized to have random vectors ...
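To make the lookup-table view concrete, here is a minimal sketch in Python that builds a randomly initialized V × N matrix and retrieves the row for a word. The toy vocabulary, dimension size, and the embed helper are illustrative assumptions, not part of the text.

```python
import numpy as np

# Toy vocabulary (illustrative): V = 4 words
vocab = ["dog", "mat", "cat", "sat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

V, N = len(vocab), 300                     # lookup table of size V x N
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))     # random initialization of W

def embed(word):
    """Look up the row of W corresponding to a word."""
    return W[word_to_index[word]]

print(embed("dog")[:5])   # first few components of W("dog")
```

During training, the rows of this table are treated as ordinary parameters and updated by gradient descent, which is why the initial values can simply be random.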