Coding the network intrusion attack

We start by importing the packages that will be used. Since the dataset is very large, we use Spark to process it.

Spark is an open-source, distributed cluster-computing system for handling big data:

import os
import sys
import re
import time
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql import Row
# from pyspark.sql.functions import *
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import pyspark.sql.functions as func
import matplotlib.patches as mpatches
from operator import add
from pyspark.mllib.clustering import KMeans, KMeansModel
...
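The imports above only succeed if PySpark and the plotting and data libraries are installed in the current environment. As a minimal sketch, the check below reports which of the top-level packages are missing before the imports are run. The helper name `missing_packages` and the list `REQUIRED_PACKAGES` are hypothetical and introduced here for illustration only; they are not part of the original code.

```python
import importlib.util

# Top-level third-party packages that the import list relies on.
# This list is an assumption drawn from the imports shown in the text.
REQUIRED_PACKAGES = ["pyspark", "matplotlib", "pandas", "numpy"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Running this first gives a clearer message than a bare `ImportError` partway through a long import cell. Note also that `%matplotlib inline` is a Jupyter/IPython magic, so the import list is meant to run in a notebook rather than as a plain Python script.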

