April 2018
Beginner to intermediate
282 pages
6h 52m
English
Let's run this pipeline on another dataset that is popular in the ML community. The KDD Cup 99 dataset consists of tcpdump portions of the 1998 DARPA Intrusion Detection System Evaluation dataset, and the goal is to detect network intrusions. It includes numerical features, so it will be easier to set up our AutoML pipeline:
# You can import this dataset directly from sklearn
from sklearn.datasets import fetch_kddcup99

# Downloading subset of whole dataset
dataset = fetch_kddcup99(subset='http', shuffle=True, percent10=True)
# Downloading https://ndownloader.figshare.com/files/5976042
# [INFO] [17:43:19:sklearn.datasets.kddcup99] Downloading https://ndownloader.figshare.com/files/5976042 ...
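Because the goal is to detect intrusions, one common preparation step is to collapse the connection labels into a binary target before handing the data to a pipeline. The following is a minimal sketch of that idea, not necessarily the approach the book takes next. The helper name `to_binary_labels` is hypothetical, and the small arrays are hand-made stand-ins for `dataset.data` and `dataset.target` so that the snippet runs without downloading anything; it assumes normal traffic is labeled `normal.` and that labels may arrive as bytes.

```python
import numpy as np


def to_binary_labels(target):
    """Map connection labels to 0 (normal traffic) or 1 (any intrusion).

    Hypothetical helper: assumes normal connections are labeled 'normal.'
    and that labels may be either bytes or str.
    """
    # Normalize every label to str so the comparison below is uniform.
    decoded = np.array(
        [t.decode() if isinstance(t, bytes) else str(t) for t in target]
    )
    # Anything that is not 'normal.' is treated as an intrusion.
    return (decoded != 'normal.').astype(int)


# Hand-made stand-ins for dataset.data and dataset.target (illustrative values only).
X = np.array([
    [0.0, 5.2, 8.1],
    [0.0, 5.4, 7.9],
    [0.0, 10.9, 9.0],
])
y_raw = np.array([b'normal.', b'normal.', b'back.'], dtype=object)

y = to_binary_labels(y_raw)
print(X.shape, y)
```

With the real download, the same helper could be called as `to_binary_labels(dataset.target)`, with `dataset.data` used as the feature matrix.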