How to do it...

In this recipe, you will build a classifier to determine which packer was used to pack a file:

  1.  Read in the names of the files to be analyzed along with their labels, corresponding to the packer used:
      import os
      from os import listdir

      directories_with_labels = [
          ("Benign PE Samples", 0),
          ("Benign PE Samples UPX", 1),
          ("Benign PE Samples Amber", 2),
      ]
      list_of_samples = []
      labels = []
      for dataset_path, label in directories_with_labels:
          samples = [f for f in listdir(dataset_path)]
          for file in samples:
              file_path = os.path.join(dataset_path, file)
              list_of_samples.append(file_path)
              labels.append(label)
  2.  Create a train-test split:
      from sklearn.model_selection import train_test_split

      samples_train, samples_test, labels_train, labels_test ...
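
The code for the split step is cut off in this excerpt, so the following is only a minimal sketch of how `train_test_split` is commonly called on two parallel lists of file paths and labels. It is not the recipe's exact call: the toy file names and the `test_size`, `stratify`, and `random_state` values are all assumptions chosen for illustration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy stand-ins for the list_of_samples and labels lists built in step 1.
# The file names are hypothetical; the labels mirror the three classes 0, 1, 2.
list_of_samples = [f"sample_{i}.exe" for i in range(30)]
labels = [i % 3 for i in range(30)]

# test_size, stratify, and random_state are assumed values, not the recipe's.
# stratify=labels keeps the class proportions the same in both splits.
samples_train, samples_test, labels_train, labels_test = train_test_split(
    list_of_samples,
    labels,
    test_size=0.3,
    stratify=labels,
    random_state=11,
)

# Inspect how many samples of each class landed in each split.
print("train:", Counter(labels_train))
print("test: ", Counter(labels_test))
```

Stratifying is a reasonable default for a multi-class problem like this one, since it prevents a packer class from being under-represented in the test set when the directories differ in size.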
