November 2019
Intermediate to advanced
346 pages
9h 36m
English
In the following steps, we curate a dataset and then use it to create a classifier to determine the file type. For demonstration purposes, we show how to obtain a collection of PowerShell scripts, Python scripts, and JavaScript files by scraping GitHub. A collection of samples obtained in this way can be found in the accompanying repository as PowerShellSamples.7z, PythonSamples.7z, and JavascriptSamples.7z. First, we will write the code for the JavaScript scraper:
import os
from github import Github
import base64