November 2019
Intermediate to advanced
346 pages
9h 36m
English
In the following steps, we will use an Isolation Forest to detect anomalies in the KDD dataset:
import pandas as pd
kdd_df = pd.read_csv("kddcup_dataset.csv", index_col=None)
y = kdd_df["label"].values
from collections import Counter
Counter(y).most_common()
The following output will be observed:
[('normal', 39247), ('back', 1098), ('apache2', 794), ('neptune', 93), ('phf', 2), ('portsweep', 2), ('saint', 1)]
def label_anomalous(text):
    """Binarize target labels into normal or anomalous."""
    if text == "normal":
        return 0
    else:
        return 1

kdd_df["label"] = kdd_df["label"].apply(label_anomalous)
...
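As a quick sanity check on the binarization, the sketch below applies the same rule to the class counts printed earlier and works out what share of the rows end up labeled anomalous. It is a minimal illustration, not part of the original recipe: the counts are copied by hand from the output above rather than read from the CSV, and the names `label_counts`, `binary_counts`, and `anomaly_fraction` are introduced here for illustration only.

```python
from collections import Counter

# Class counts copied by hand from the Counter output shown earlier
# (assumption: they match the dataset being used).
label_counts = Counter({
    "normal": 39247,
    "back": 1098,
    "apache2": 794,
    "neptune": 93,
    "phf": 2,
    "portsweep": 2,
    "saint": 1,
})


def label_anomalous(text):
    """Binarize target labels into normal (0) or anomalous (1)."""
    return 0 if text == "normal" else 1


# Aggregate the per-class counts into the two binary classes.
binary_counts = Counter()
for label, count in label_counts.items():
    binary_counts[label_anomalous(label)] += count

total = sum(binary_counts.values())
anomaly_fraction = binary_counts[1] / total

print(binary_counts)
print(f"Anomalous share of rows: {anomaly_fraction:.2%}")
```

With these counts, fewer than 5% of the rows are anomalous, so the classes are heavily imbalanced. A fraction like this is the kind of value one might pass as the `contamination` argument of scikit-learn's `IsolationForest`, though whether that choice suits this dataset is an assumption to verify.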