Preparing the data

We again use torchtext to download and tokenize the IMDB dataset and to build its vocabulary. When creating the Field object, we leave the batch_first argument at False, because PyTorch RNN layers by default expect input of shape (sequence_length, batch_size, features). The following code prepares the dataset:

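# Imports assumed for the legacy torchtext API (releases before 0.9)
from torchtext import data
from torchtext.datasets import IMDB
from torchtext.vocab import GloVe

# Lowercase the text and pad or truncate every review to 200 tokens
TEXT = data.Field(lower=True, fix_length=200, batch_first=False)
LABEL = data.Field(sequential=False)
train, test = IMDB.splits(TEXT, LABEL)
# Keep at most 10,000 words that occur at least 10 times, with 300-d GloVe vectors
TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=300), max_size=10000, min_freq=10)
LABEL.build_vocab(train)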
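
To make the effect of fix_length, max_size, min_freq and batch_first=False more concrete, here is a minimal pure-Python sketch of roughly what those arguments do. It is not torchtext's implementation: the helper names pad_or_truncate, build_vocab and to_sequence_first are hypothetical, and the two toy reviews and the small argument values are chosen only for illustration.

```python
from collections import Counter


def pad_or_truncate(tokens, fix_length, pad_token="<pad>"):
    """Cut or right-pad a token list to exactly fix_length items."""
    tokens = tokens[:fix_length]
    return tokens + [pad_token] * (fix_length - len(tokens))


def build_vocab(token_lists, max_size, min_freq, specials=("<unk>", "<pad>")):
    """Map tokens to indices, keeping at most max_size tokens seen >= min_freq times."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Most frequent first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [tok for tok, freq in ranked if freq >= min_freq][:max_size]
    itos = list(specials) + kept
    return {tok: i for i, tok in enumerate(itos)}


def to_sequence_first(batch):
    """Transpose [batch][seq] into [seq][batch], the batch_first=False layout."""
    return [list(step) for step in zip(*batch)]


# Two toy "reviews" standing in for the IMDB training split (assumed data).
reviews = [
    "the movie was great".split(),
    "the plot was thin and the acting was flat".split(),
]

vocab = build_vocab(reviews, max_size=3, min_freq=2)
padded = [pad_or_truncate(r, fix_length=6) for r in reviews]
# Words that did not make it into the vocabulary map to the <unk> index.
ids = [[vocab.get(tok, vocab["<unk>"]) for tok in r] for r in padded]
seq_first = to_sequence_first(ids)

print(len(seq_first), len(seq_first[0]))  # prints: 6 2
```

The printed pair is (sequence_length, batch_size): each row of seq_first holds one time step across the whole batch. An embedding layer would later add the third, features, dimension.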
