February 2018
Intermediate to advanced
262 pages
6h 59m
English
When we created the one-hot encoding for thor_review, we built a word2idx dictionary. Such a dictionary is referred to as the vocabulary, since it holds every unique word in the documents along with its index. torchtext makes this easier for us. Once the data is loaded, we can call build_vocab with the necessary arguments, and it takes care of building the vocabulary for the data. The following code shows how the vocabulary is built:
TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=300), max_size=10000, min_freq=10)
LABEL.build_vocab(train)
In the preceding code, we pass in the train object on which we need to build the vocabulary, and we also ask it to initialize vectors with pretrained embeddings of dimensions