November 2019
Intermediate to advanced
346 pages
9h 36m
English
Start by importing pandas and numpy and creating a variable that points to the dataset (Step 1). CERT provides several versions of its insider threat dataset; version 4.2 stands out as a dense needle dataset, meaning that it has a higher incidence of insider threats than the other versions. Because the dataset is massive, it is convenient to filter and downsample it, at least during experimentation, which we do in Step 2. In the following steps, we hand-engineer features that we believe will help our classifier catch insider threats. In Step 3, we create a convenient function for encoding features, using a dictionary to keep track of them. In Step 4, we list the names of the features we will be adding. In ...
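Steps 1 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the recipe's own code. The directory name, the tiny in-memory stand-in for a CERT log file, the user IDs, the sample size, the helper names `add_feature` and `user_feature_vectors`, and the `after_hours` feature are all hypothetical. The sketch shows a feature dictionary that maps each feature name to a column index, and uses it to build one count vector per user.

```python
import numpy as np
import pandas as pd

# Step 1 (assumption): path to the extracted CERT r4.2 files. The name is
# hypothetical and is not read here. In practice you would load each log
# file with pd.read_csv.
path_to_dataset = "./cert_r4.2/"

# Hypothetical stand-in for one CERT log file (user, timestamp, activity).
log_df = pd.DataFrame(
    {
        "user": ["ACM2278", "ACM2278", "CMP2946", "CMP2946", "BBB0001"],
        "date": pd.to_datetime(
            ["2010-01-02 07:00", "2010-01-02 18:30", "2010-01-03 09:15",
             "2010-01-03 23:05", "2010-01-04 10:00"]
        ),
        "activity": ["Logon", "Logoff", "Logon", "Connect", "Logon"],
    }
)

# Step 2: filter to a subset of users, then randomly downsample the rows.
# The user list and sample size are illustrative assumptions.
users_of_interest = ["ACM2278", "CMP2946"]
filtered = log_df[log_df["user"].isin(users_of_interest)]
sampled = filtered.sample(n=3, random_state=0)


# Step 3: hypothetical helper that gives each new feature name the next
# free integer index, tracked in a dictionary.
def add_feature(name, feature_map):
    if name not in feature_map:
        feature_map[name] = len(feature_map)
    return feature_map[name]


# Step 4: hypothetical feature names to add.
feature_map = {}
for feature_name in ["Logon", "Logoff", "Connect", "Disconnect", "after_hours"]:
    add_feature(feature_name, feature_map)


def user_feature_vectors(df, feature_map):
    """Count each activity per user, plus a hypothetical after-hours count."""
    users = sorted(df["user"].unique())
    row_of = {user: i for i, user in enumerate(users)}
    X = np.zeros((len(users), len(feature_map)), dtype=int)
    for user, date, activity in df[["user", "date", "activity"]].itertuples(index=False):
        i = row_of[user]
        X[i, feature_map[activity]] += 1
        # Assumed definition: "after hours" means before 08:00 or from 18:00 on.
        if date.hour < 8 or date.hour >= 18:
            X[i, feature_map["after_hours"]] += 1
    return users, X


users, X = user_feature_vectors(sampled, feature_map)
print(users)
print(X)
```

Keeping the name-to-index mapping in a dictionary means that a new hand-engineered feature only needs one more `add_feature` call. Every user's vector keeps the same column layout.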