November 2019
Intermediate to advanced
346 pages
9h 36m
English
We start by importing our dataset of PE header information from a collection of samples (step 1). This dataset consists of two classes of PE files: malware and benign. We then use plotly to create a nice-looking interactive 3D graph (step 1). We proceed to prepare our dataset for machine learning. Specifically, in step 2, we set X as the features and y as the classes of the dataset. Based on the fact that there are two classes, we aim to cluster the data into two groups that will match the sample classification. We utilize the K-means algorithm (step 3), about which you can find more information at: https://en.wikipedia.org/wiki/K-means_clustering. With a thoroughly trained clustering algorithm, we are ready to predict on ...