February 2018
262 pages
We again use torchtext to download and tokenize the IMDB dataset and to build its vocabulary. When creating the Field object, we leave the batch_first argument at False, because PyTorch RNN layers by default expect input of shape (sequence_length, batch_size, features). The following code prepares the dataset:
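from torchtext import data
from torchtext.datasets import IMDB
from torchtext.vocab import GloVe

# Lowercase the text, pad or truncate every review to 200 tokens,
# and keep the sequence dimension first (batch_first=False).
TEXT = data.Field(lower=True, fix_length=200, batch_first=False)
LABEL = data.Field(sequential=False)

train, test = IMDB.splits(TEXT, LABEL)

# Keep at most 10,000 words that occur at least 10 times; attach GloVe vectors.
TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=300), max_size=10000, min_freq=10)
LABEL.build_vocab(train)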
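To make the effect of fix_length, max_size, min_freq and batch_first=False more concrete, the following is a rough pure-Python sketch of the same ideas on two toy reviews. It is not torchtext's implementation: the helper names build_vocab, pad_or_truncate and to_sequence_first, the special tokens, and the tiny parameter values are all assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens for this sketch


def build_vocab(token_lists, max_size, min_freq):
    """Map the most frequent tokens to integer ids (illustrative only)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = [UNK, PAD]
    # most_common() yields tokens in descending frequency order, so we can
    # stop at the first token that is too rare or once the limit is reached.
    for tok, freq in counts.most_common():
        if freq < min_freq or len(itos) - 2 >= max_size:
            break
        itos.append(tok)
    return {tok: i for i, tok in enumerate(itos)}


def pad_or_truncate(tokens, fix_length):
    """Cut long reviews and pad short ones to exactly fix_length tokens."""
    tokens = tokens[:fix_length]
    return tokens + [PAD] * (fix_length - len(tokens))


def to_sequence_first(batch):
    """Transpose [batch_size][seq_len] into [seq_len][batch_size]."""
    return [list(step) for step in zip(*batch)]


reviews = [
    "the movie was great".split(),
    "the plot was thin and the acting was flat".split(),
]

stoi = build_vocab(reviews, max_size=10, min_freq=2)
padded = [pad_or_truncate(r, fix_length=6) for r in reviews]
ids = [[stoi.get(tok, stoi[UNK]) for tok in row] for row in padded]
seq_first = to_sequence_first(ids)

print(len(seq_first), len(seq_first[0]))  # 6 2
```

The printed shape is (sequence_length, batch_size): each row of seq_first holds one time step for every review in the batch, which is the layout an RNN consumes step by step. The features dimension appears only later, when an embedding layer turns each integer id into a vector.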