Natural Language Processing and Computational Linguistics
by Brian Sacash, Bhargav Srinivasa-Desikan, Reddy Anil Kumar
Varembed
Varembed is the fourth word embedding method we will discuss. Like FastText, it takes advantage of morphological information to generate word vectors. The original paper describing the method, titled Morphological Priors for Probabilistic Neural Word Embeddings, can be found on arXiv [41].
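To make "morphological information" concrete, here is a toy sketch in plain Python. The paper treats morpheme embeddings as a prior on each word's embedding. The sketch simplifies this to adding the sum of a word's morpheme vectors to the word's own vector. All vectors and the `segmentations` table are made up for illustration, and `segmentations` is a hypothetical stand-in for a trained Morfessor segmenter. This is not Gensim's API and not the authors' training procedure.

```python
import math

# Toy, hypothetical vectors: NOT taken from the Lee-corpus Varembed model.
word_vectors = {
    "play":   [0.2, 0.1, 0.0],
    "played": [0.1, 0.0, 0.3],
    "replay": [0.0, 0.2, 0.1],
}
morpheme_vectors = {
    "re":   [0.0, 0.5, 0.0],
    "play": [0.6, 0.2, 0.1],
    "ed":   [0.0, 0.0, 0.4],
}
# Hypothetical segmentation table standing in for a Morfessor model.
segmentations = {
    "play":   ["play"],
    "played": ["play", "ed"],
    "replay": ["re", "play"],
}


def enriched_vector(word):
    """Return the word's own vector plus the sum of its morpheme vectors."""
    vec = list(word_vectors[word])  # KeyError for words outside the vocabulary
    for morpheme in segmentations.get(word, []):
        if morpheme in morpheme_vectors:  # skip morphemes we have no vector for
            vec = [x + y for x, y in zip(vec, morpheme_vectors[morpheme])]
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


raw_sim = cosine(word_vectors["played"], word_vectors["replay"])
enriched_sim = cosine(enriched_vector("played"), enriched_vector("replay"))
print(f"raw: {raw_sim:.3f}, enriched: {enriched_sim:.3f}")
# raw: 0.424, enriched: 0.639
```

Because "played" and "replay" share the morpheme "play", their enriched vectors end up closer than their raw vectors. The lookup only covers words already in `word_vectors`, so an unseen word raises `KeyError`, which mirrors the fact that a trained Varembed model cannot simply be extended with new words.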
As with our GloVe vectors, we cannot update the model with new words; we would need to train a new model instead. Information on training our own models can be found in the original repository [42], which contains the code.
Gensim comes with Varembed word embeddings trained on the Lee dataset, so we will use them to illustrate setting up a model. The documentation for Varembed can be found at [43]. Here, Varembed is a variable that holds the path ...