September 2018
Intermediate to advanced
472 pages
12h 2m
English
First of all, let's download and decompress the dataset. We will be conservative and use just 10% of the original training dataset (75 MB uncompressed), because all our analysis runs on a small virtual machine. If you want to try the full training dataset (750 MB uncompressed), uncomment the corresponding lines in the following snippet. We download the training dataset, the testing dataset (47 MB), and the feature names using bash commands:
In: !mkdir datasets
    !rm -rf ./datasets/kdd*
    # !wget -q -O datasets/kddtrain.gz \
    # http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz
    !wget -q -O datasets/kddtrain.gz \
    http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
    !wget -q -O datasets/kddtest.gz ...
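The decompression step is not visible in the snippet above, so here is a minimal sketch of how the downloaded archives could be unpacked from Python using only the standard library. The helper name `gunzip_file` and the choice of output name (the archive path without its `.gz` suffix) are assumptions made for illustration, not necessarily what the rest of the chapter uses.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dst=None):
    """Decompress the gzip archive `src` and return the output path.

    Hypothetical helper: if `dst` is omitted, the output is written next to
    the archive, using the same name without its final suffix (.gz).
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    if dst == src:
        # Guard against reading and writing the same file.
        raise ValueError("output path must differ from the archive path")
    # Copy in chunks so that even the full training set (750 MB uncompressed)
    # is never loaded into memory at once.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Assuming the archives were saved as in the download cell, calling `gunzip_file("datasets/kddtrain.gz")` would write the uncompressed data to `datasets/kddtrain`. Streaming the copy keeps memory use low, which matters on a small virtual machine.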