Spark on the KDD99 dataset

Let's conduct this exploration using a real-world dataset: the KDD99 dataset, which comes from the KDD Cup 1999 competition. The goal of the competition was to create a network-intrusion-detection system able to recognize which network flows are malicious and which are not. Moreover, the dataset contains many different attacks; the goal is to accurately predict them using the features of the packet flows contained in the dataset.

As a side note, the dataset was extremely useful for developing solutions for intrusion-detection systems (IDS) in the first few years after its release. As a result, all the attacks included in the dataset are now very easy to detect, so it is no longer used in IDS development. The features ...
