To prepare the dataset for training and testing, we first have to download three files:
- A Google-trained Word2Vec model
- The Large Movie Review dataset
- A sentiment-labeled dataset
The pre-trained Word2Vec model can be downloaded from https://code.google.com/p/word2vec/. We then set the location of the Google News vectors manually:
public static final String WORD_VECTORS_PATH = "/Downloads/GoogleNews-vectors-negative300.bin.gz";
Then, we will download and extract the Large Movie Review dataset from the following URL:
public static final String DATA_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz";
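The download step can be sketched with the standard library alone. The following is a minimal sketch, not the book's own code: the class `DatasetDownloader` and the method `downloadIfMissing` are hypothetical names introduced here for illustration, and the target path is left to the caller. It covers only the download; extracting the `.tar.gz` archive would need an additional library and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of the original example code.
final class DatasetDownloader {

    private DatasetDownloader() {}

    /**
     * Downloads the resource at sourceUrl to target unless target already exists.
     * Returns true if a download took place, false if the existing file was reused.
     */
    static boolean downloadIfMissing(String sourceUrl, Path target) throws IOException {
        Path absoluteTarget = target.toAbsolutePath();
        if (Files.exists(absoluteTarget)) {
            return false; // reuse the archive from an earlier run
        }
        Files.createDirectories(absoluteTarget.getParent());

        // Write to a sibling ".part" file first so that an interrupted download
        // never leaves a truncated archive at the target path.
        Path partial = absoluteTarget.resolveSibling(absoluteTarget.getFileName() + ".part");
        try (InputStream in = URI.create(sourceUrl).toURL().openStream()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, absoluteTarget, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(partial);
        }
        return true;
    }
}
```

With this helper, a call such as `DatasetDownloader.downloadIfMissing(DATA_URL, archivePath)` would fetch the archive only on the first run, where `archivePath` is an assumed local `Path` chosen by the reader.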
Now, let's set the location to ...