August 2018
378 pages
We will use the Reuters dataset, which can be accessed through a function in the Keras library. This dataset has 11,228 records with 46 categories. To see more information about this dataset, run the following code:
library(keras)
?dataset_reuters
Although the Reuters dataset can be accessed from Keras, it is not in a format that other machine learning algorithms can use directly. Each record stores a list of word indices rather than the actual words. We will write a short script (Chapter7/create_reuters_data.R) that downloads the data and the lookup index file, then creates a data frame containing the y variable and the text string for each record. We will then save the train and test data into two separate files. Here is the first part of the code ...
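Separately from the book's script, the idea of turning word indices back into text can be illustrated with a minimal base-R sketch. Everything in it is an assumption made for illustration: the toy `word_index` lookup table, the toy records and labels, the hypothetical `decode_record` helper, and the offset of 3, which reflects our understanding that Keras reserves the lowest indices for padding, start, and unknown tokens by default.

```r
# Hypothetical toy lookup table: word -> index (the real one comes from Keras).
word_index <- c(the = 1, of = 2, oil = 3, prices = 4, rose = 5)

# Invert the table so that an index can be mapped back to its word.
index_to_word <- names(word_index)
names(index_to_word) <- as.character(word_index)

# Assumed offset: stored indices are shifted by 3 relative to the lookup table.
offset <- 3

# Hypothetical helper: convert one vector of word indices into a text string.
decode_record <- function(indices) {
  words <- index_to_word[as.character(indices - offset)]
  words[is.na(words)] <- "?"  # reserved or unknown indices
  paste(words, collapse = " ")
}

# Two toy records stored as index vectors, with toy category labels.
x <- list(c(1, 6, 7, 8), c(1, 4, 5, 6))
y <- c(3, 16)

# Data frame of the y variable and the decoded text string.
df <- data.frame(
  y = y,
  text = vapply(x, decode_record, character(1)),
  stringsAsFactors = FALSE
)
print(df)
```

With the real data, the lookup table and the index vectors would come from Keras rather than being typed in by hand, and the resulting data frames could be written to disk with a function such as `write.table`.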