How to do it…

In the following steps, you will construct new features for the CERT insider threat dataset:

  1. Import numpy and pandas, and point to where the downloaded data is located:
import numpy as np
import pandas as pd

path_to_dataset = "./r42short/"
  2. Specify the .csv files and which of their columns to read:
log_types = ["device", "email", "file", "logon", "http"]
log_fields_list = [
    ["date", "user", "activity"],
    ["date", "user", "to", "cc", "bcc"],
    ["date", "user", "filename"],
    ["date", "user", "activity"],
    ["date", "user", "url"],
]
  3. We will hand-engineer a number of features and encode them, so create a dictionary to keep track of them:
features = 0
feature_map = {}

def add_feature(name):
    """Add a feature to a dictionary to be encoded."""
    ...
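
The body of add_feature is cut off in the excerpt above, so the following is only a minimal sketch of how a name-to-index feature registry of this kind might work, not the book's implementation. The names feature_index and register_feature, and the example feature names, are hypothetical and chosen purely for illustration:

```python
# Hypothetical sketch: none of these names come from the recipe itself.
# The idea is to give every hand-engineered feature a stable integer index,
# so that it can later be used as a column position in a feature vector.
feature_index = {}  # maps feature name -> integer index


def register_feature(name):
    """Assign the next free index to `name` if it is new, and return its index."""
    if name not in feature_index:
        # The next free index equals the number of features registered so far.
        feature_index[name] = len(feature_index)
    return feature_index[name]


# Made-up feature names; registering a name twice leaves its index unchanged.
for name in ["logon_after_hours", "email_external", "logon_after_hours"]:
    register_feature(name)

print(feature_index)
```

Using the dictionary's length as the next index removes the need for a separate counter, at the cost of assuming that entries are never deleted.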
