November 2019
Intermediate to advanced
346 pages
9h 36m
English
We begin by importing standard Python libraries for analyzing the files and setting up machine learning pipelines (Step 1). In Steps 2 and 3, we collect the non-obfuscated and obfuscated JavaScript files into arrays and assign them their respective labels, in preparation for our binary classification problem. Note that the main challenge in building this classifier is assembling a large and useful dataset. One way to overcome this hurdle is to collect a large number of JavaScript samples and then obfuscate them with several different tools. A classifier trained on such data is more likely to avoid overfitting to one type of obfuscation. Having collected the data, we separate it into training and testing subsets (Step 4). In addition, ...
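The collection, labeling, and splitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's own code: the helper names (`load_corpus`, `split_train_test`, `demo`), the directory names, the label convention (0 for non-obfuscated, 1 for obfuscated), and the toy file contents are all assumptions made for the example. The split is a plain-Python stand-in for a library routine such as scikit-learn's `train_test_split`.

```python
import os
import random
import tempfile


def load_corpus(directory, label):
    """Read every file in `directory` and pair each text with `label`."""
    texts, labels = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            texts.append(handle.read())
        labels.append(label)
    return texts, labels


def split_train_test(texts, labels, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [texts[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def demo():
    """Build a toy corpus on disk, then load, label, and split it."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical directory names and contents, for illustration only.
        samples = {
            "plain": "function add(a, b) { return a + b; } // sample %d",
            "obfuscated": "var _0x%d=['\\x61\\x64\\x64'];eval(_0x0[0]);",
        }
        for folder, template in samples.items():
            os.makedirs(os.path.join(root, folder))
            for i in range(4):
                path = os.path.join(root, folder, "sample_%d.js" % i)
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(template % i)

        # Assumed label convention: 0 = non-obfuscated, 1 = obfuscated.
        plain_texts, plain_labels = load_corpus(os.path.join(root, "plain"), 0)
        obf_texts, obf_labels = load_corpus(os.path.join(root, "obfuscated"), 1)

    corpus = plain_texts + obf_texts
    labels = plain_labels + obf_labels
    return split_train_test(corpus, labels, test_fraction=0.25, seed=0)


if __name__ == "__main__":
    x_train, x_test, y_train, y_test = demo()
    print(len(x_train), "training samples,", len(x_test), "testing samples")
```

In practice, the obfuscated directory would ideally hold the output of several different obfuscation tools, so that the held-out test set also reflects more than one obfuscation style.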