Visualizing embedding vectors

To obtain better word vectors than the ones from the Training embedding model section, we'll train another word2vec model. This time, however, we'll use a larger corpus: the text8 dataset, which consists of the first 100,000,000 bytes of plain text from Wikipedia. The dataset is available through Gensim's dataset downloader and is tokenized as a single long list of words. With that, let's start:

  1. As usual, the imports come first. We'll also set the logging level to INFO for good measure:
     import logging
     import pprint  # beautify prints

     import gensim.downloader as gensim_downloader
     import matplotlib.pyplot as plt
     import numpy as np
     from gensim.models.word2vec import Word2Vec
     from sklearn.manifold import TSNE

     logging.basicConfig(level=logging.INFO)

     ...
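The steps that follow are cut off in this excerpt. As a minimal sketch of how they likely proceed, based on the section's goal and the imports above, the code below loads text8 through the Gensim downloader and trains a skip-gram model. It assumes Gensim 4.x, where the embedding size parameter is called vector_size; the hyperparameter values are illustrative rather than the book's exact choices:

     # Load text8 via the downloader; it yields lists of tokens
     dataset = gensim_downloader.load('text8')

     # Train a skip-gram word2vec model; hyperparameters are illustrative
     model = Word2Vec(sentences=dataset,
                      sg=1,             # skip-gram rather than CBOW
                      vector_size=100,  # embedding dimensionality
                      window=5,         # context window size
                      min_count=5,      # ignore words rarer than this
                      workers=4)        # parallel training threads

     # Sanity-check the embeddings via nearest neighbors
     pprint.pprint(model.wv.most_similar('mother'))

Because logging is set to INFO, Gensim reports the download and training progress as this runs.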

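Continuing the sketch, the TSNE and pyplot imports suggest the section ends by projecting a handful of embeddings down to two dimensions and plotting them. The word list, perplexity, and plot styling below are again illustrative assumptions:

     # Embedding vectors of a small, hand-picked set of related words
     words = ['king', 'queen', 'man', 'woman', 'mother', 'father',
              'son', 'daughter', 'brother', 'sister']
     vectors = np.array([model.wv[w] for w in words])

     # Project to 2D; perplexity must stay below the number of samples
     points = TSNE(n_components=2, perplexity=5,
                   random_state=0).fit_transform(vectors)

     # Scatter plot with a text label next to each projected word
     plt.scatter(points[:, 0], points[:, 1])
     for word, (x, y) in zip(words, points):
         plt.annotate(word, (x, y))
     plt.show()

Words that play similar semantic roles should land near each other in the resulting plot, which is the point of the visualization.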