In this recipe, you will build a classifier to determine which packer was used to pack a file:
- Read in the names of the files to be analyzed along with their labels, corresponding to the packer used:
    import os
    from os import listdir
    directories_with_labels = [
        ("Benign PE Samples", 0),
        ("Benign PE Samples UPX", 1),
        ("Benign PE Samples Amber", 2),
    ]
    list_of_samples = []
    labels = []
    for dataset_path, label in directories_with_labels:
        samples = [f for f in listdir(dataset_path)]
        for file in samples:
            file_path = os.path.join(dataset_path, file)
            list_of_samples.append(file_path)
            labels.append(label)
- Create a train-test split:
    from sklearn.model_selection import train_test_split
    samples_train, samples_test, labels_train, labels_test ...
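The split step can be sketched as follows. This is a minimal illustration, not the recipe's exact call: the placeholder file names, the `test_size`, `stratify`, and `random_state` values are all assumptions chosen for the example, and the two lists stand in for the `list_of_samples` and `labels` built in the previous step.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the lists built in the previous step:
# 30 placeholder paths, 10 per class (0 = unpacked, 1 = UPX, 2 = Amber).
list_of_samples = [f"sample_{i}.exe" for i in range(30)]
labels = [i % 3 for i in range(30)]

# Assumed parameters (not taken from the recipe):
#   test_size    - fraction of samples held out for testing
#   stratify     - keep the packer classes in similar proportions in both sets
#   random_state - fixed seed so the split is reproducible
samples_train, samples_test, labels_train, labels_test = train_test_split(
    list_of_samples,
    labels,
    test_size=0.3,
    stratify=labels,
    random_state=11,
)

print(len(samples_train), "training samples,", len(samples_test), "test samples")
```

Stratifying on the labels is one reasonable choice here, since it keeps every packer class represented in both the training and the test set even when the class folders differ in size.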