August 2018
Intermediate to advanced
378 pages
9h 9m
English
This section builds on the earlier binary classification task and aims to increase the accuracy for that task. The first thing we can do to improve the model is to use more data: 100 times more data, in fact! We will download the entire dataset, which is over 4 GB of data in zip files and 40 GB of data when the files are unzipped. Go back to the download link (https://www.dunnhumby.com/sourcefiles), select Let’s Get Sort-of-Real again, and download all the files for the Full dataset. There are nine files to download, and the CSV files should be unzipped into the dunnhumby/in folder. Remember to check that the CSV files are directly in this folder and not in a subfolder. You need to run the code in Chapter4/prepare_data.R ...
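The unzip-and-check step can be scripted in base R. The following is a minimal sketch that assumes all nine downloaded zip files sit together in one folder. The function names `unzip_all` and `check_csv_location` are hypothetical and are not part of the book's code in Chapter4/prepare_data.R. It relies on the `junkpaths` argument of `utils::unzip`, which discards any folder structure stored inside an archive. Whether the dunnhumby archives actually contain such folders is not stated here, so treat that as an assumption.

```r
# Hypothetical helper: extract every .zip file in zip_dir into out_dir.
# junkpaths = TRUE keeps only each file's base name, so any folders stored
# inside an archive are dropped and the CSV files land directly in out_dir.
unzip_all <- function(zip_dir, out_dir = "dunnhumby/in") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE,
                     ignore.case = TRUE)
  for (z in zips) {
    utils::unzip(z, exdir = out_dir, junkpaths = TRUE)
  }
  invisible(zips)
}

# Hypothetical helper: report which CSV files are directly in dir and which
# ended up in a subfolder (the mistake the text warns about).
check_csv_location <- function(dir = "dunnhumby/in") {
  top_level <- list.files(dir, pattern = "\\.csv$", ignore.case = TRUE)
  all_csv <- list.files(dir, pattern = "\\.csv$", ignore.case = TRUE,
                        recursive = TRUE)
  # Anything found only by the recursive search lives in a subfolder
  nested <- setdiff(all_csv, top_level)
  list(top_level = top_level,
       nested = nested,
       ok = length(top_level) > 0 && length(nested) == 0)
}
```

After unzipping, `check_csv_location()$ok` should be `TRUE`. If it is not, the `nested` element lists the files that need to be moved up into dunnhumby/in.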