Anomalous data is data that deviates from the normal distribution of a dataset. Detecting anomalies is therefore an important task in network security: anomalous packets or requests can be flagged as errors or potential attacks.
In this example, we will use the KDD-99 dataset (it can be downloaded here: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html ). A number of columns will be filtered out based on certain criteria of the data points, which keeps the example easier to follow. In addition, because this is an unsupervised task, we will have to remove the labels from the data. Let's load and parse the dataset as plain text and then see how many rows it contains:
INPUT = "C:/Users/rezkar/Downloads/kddcup.data" ...
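The load, parse, and count steps described above can be sketched in plain Python, independent of whatever toolchain the rest of the example uses. This is a minimal sketch, not the original code: the helper names `parse_line` and `summarize` are hypothetical, and the three sample rows are made-up, shortened stand-ins for real KDD-99 records. It assumes each record is one comma-separated line whose last field is the label followed by a trailing dot (for example `normal.`).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[List[str], str]:
    """Split one raw text record into (features, label).

    Assumption: the label is the last comma-separated field and
    ends with a trailing dot, e.g. "normal.".
    """
    fields = line.strip().split(",")
    return fields[:-1], fields[-1].rstrip(".")


def summarize(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count the rows and tally how often each label occurs."""
    row_count = 0
    label_counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, label = parse_line(line)
        row_count += 1
        label_counts[label] += 1
    return row_count, label_counts


# Made-up, shortened rows for illustration only; real records
# carry many more feature columns before the label.
sample = [
    "0,tcp,http,SF,215,45076,normal.",
    "0,icmp,ecr_i,SF,1032,0,smurf.",
    "0,tcp,http,SF,162,4528,normal.",
]

row_count, label_counts = summarize(sample)
features, label = parse_line(sample[0])  # label is set aside, not used as a feature
print(row_count, dict(label_counts))
```

To run this on the downloaded file, pass an open file handle instead of the sample list, for example `with open(path) as f: summarize(f)`, where `path` points at your local copy. Iterating over the handle reads one line at a time, so the whole file is never held in memory. Separating the label in `parse_line` is what lets the unsupervised step work on the features alone while keeping the labels available for later inspection.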