December 2018
Beginner to intermediate
684 pages
21h 9m
English
We've downloaded and unzipped the GloVe data to the location indicated in the code and will now create a dictionary that maps GloVe tokens to 100-dimensional real-valued vectors, as follows:
from pathlib import Path
import numpy as np

glove_path = Path('data/glove/glove.6B.100d.txt')
embeddings_index = dict()
# Each line holds a token followed by its 100 vector components
for line in glove_path.open(encoding='latin1'):
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
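To make the file format concrete, here is a minimal sketch that applies the same parsing logic to two made-up lines held in memory instead of the real GloVe file. The sample tokens, the three-dimensional vectors, and the helper name `load_vectors` are illustrative assumptions, not part of the GloVe data or the original code.

```python
import io
import numpy as np

# Hypothetical stand-in for the GloVe text file: one token per line,
# followed by its space-separated vector components (3 here, not 100).
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n")


def load_vectors(lines):
    """Map each token to a float32 vector parsed from its line."""
    index = {}
    for line in lines:
        values = line.split()
        if not values:  # skip blank lines
            continue
        index[values[0]] = np.asarray(values[1:], dtype='float32')
    return index


toy_index = load_vectors(sample)
print(sorted(toy_index), toy_index['cat'].shape)
```

Passing an open file handle in place of `sample` would parse a real embedding file the same way, since both yield one line per iteration.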
This yields around 340,000 word vectors. We use them to build an embedding matrix with one row per vocabulary token, so that the RNN model can look up each embedding by its token index:
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in t.word_index.items():
    embedding_vector = embeddings_index.get(word)
    ...
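The listing above is truncated, so the following is only a sketch of one common way such a loop is completed, not necessarily the author's code. It assumes a toy `word_index` dictionary (standing in for `t.word_index`), four-dimensional vectors, and index 0 reserved for padding; all of these values are made up for illustration. Tokens without a pretrained vector keep an all-zero row.

```python
import numpy as np

embedding_dim = 4  # assumption: toy dimension instead of 100

# Hypothetical pretrained vectors, standing in for embeddings_index
embeddings_index = {
    'market': np.array([0.1, 0.2, 0.3, 0.4], dtype='float32'),
    'stock': np.array([0.5, 0.6, 0.7, 0.8], dtype='float32'),
}

# Hypothetical tokenizer mapping, standing in for t.word_index;
# indices start at 1 because 0 is assumed to be the padding index.
word_index = {'market': 1, 'stock': 2, 'zzzunknown': 3}
vocab_size = len(word_index) + 1

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in word_index.items():
    if i >= vocab_size:
        continue  # ignore tokens outside the retained vocabulary
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Row i holds the pretrained vector for the token with index i
        embedding_matrix[i] = embedding_vector

print(embedding_matrix.shape)
```

Row `i` of the matrix lines up with token index `i`, which is what lets an embedding layer initialized with this matrix retrieve vectors by index alone.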