Spark on the KDD99 dataset

Let's conduct this exploration using a real-world dataset: KDD99, released for the KDD Cup 1999 competition. The goal of the competition was to build a network intrusion detection system able to recognize which network flows are malicious and which are not. Moreover, the dataset contains many different attacks; the goal is to accurately predict them from the features of the packet flows it contains.

As a side note, the dataset was extremely useful for developing intrusion detection systems (IDS) in the first few years after its release. As a result, all the attacks it includes are now very easy to detect, so it is no longer used in IDS development. The features ...
