Good word embeddings exhibit the following four properties:
They are dense: Embeddings are essentially factor models, so each component of the embedding vector represents a quantity of a (latent) feature. We typically do not know what that feature represents; however, very few, if any, of the components will be zero, so the input is not sparse (see the sketch after this list).
They are low dimensional: An embedding has a predefined dimensionality (chosen as a hyperparameter). We saw earlier that in the BoW representation we needed |V| inputs for each word, so the total size of the input was |V| * n, where n is the number of words we use as input. With word embeddings, our input size will be d * n, where d is typically between 50 and ...
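To make both properties concrete, here is a minimal NumPy sketch (not from the book; the values of |V|, n, and d are made-up) that builds the same n-word input both ways and compares size and sparsity:

```python
import numpy as np

V = 10_000   # vocabulary size |V| (hypothetical)
n = 4        # number of input words
d = 100      # embedding dimensionality (hyperparameter, illustrative value)

rng = np.random.default_rng(0)
word_ids = rng.integers(0, V, size=n)        # the n input words as indices

# BoW-style input: one one-hot vector of length |V| per word.
one_hot = np.zeros((n, V))
one_hot[np.arange(n), word_ids] = 1.0
bow_input = one_hot.reshape(-1)              # size |V| * n = 40,000

# Embedding input: look up a dense d-dimensional vector per word.
embedding_table = rng.normal(size=(V, d))    # stands in for learned embeddings
emb_input = embedding_table[word_ids].reshape(-1)   # size d * n = 400

print(bow_input.size, np.count_nonzero(bow_input))  # 40000, 4 -> sparse
print(emb_input.size, np.count_nonzero(emb_input))  # 400, 400 -> dense
```

The BoW input has 40,000 components, only 4 of which are nonzero, while the embedding input has just 400 components, effectively all of them nonzero.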