November 2019
Intermediate to advanced
346 pages
9h 36m
English
We start by organizing our data and labels into arrays (Step 1): we read in each sample and assign it the label of the packer with which it was packed. In Step 2, we split the data into training and testing sets. We are then ready to featurize. We import the libraries required for N-gram extraction and define our N-gram functions (Steps 3 and 4), which are discussed in other recipes. Making the simplifying choice of N=2 and taking the K1=100 most frequent N-grams as our features, we featurize our data (Steps 5 and 6). Different values of N and other methods of selecting the most informative N-grams can yield superior results, at the cost of more computational resources. Having featurized ...
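The featurization described in Steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the recipe's own code: the helper names `byte_ngrams`, `top_k_ngrams`, and `featurize`, and the toy byte strings standing in for packed samples, are assumptions made for this example.

```python
from collections import Counter

N = 2     # N-gram length, the simplifying choice made in the text
K1 = 100  # number of most frequent N-grams kept as features


def byte_ngrams(data, n=N):
    """Return all overlapping byte N-grams of `data` as tuples of ints."""
    return [tuple(data[i:i + n]) for i in range(len(data) - n + 1)]


def top_k_ngrams(samples, n=N, k=K1):
    """Count N-grams over all samples and return the k most frequent."""
    counts = Counter()
    for data in samples:
        counts.update(byte_ngrams(data, n))
    return [gram for gram, _ in counts.most_common(k)]


def featurize(data, vocab, n=N):
    """Map one sample to a vector of counts, one entry per N-gram in vocab."""
    counts = Counter(byte_ngrams(data, n))
    return [counts[gram] for gram in vocab]


# Toy byte strings standing in for the packed samples (hypothetical data).
train_samples = [b"UPX!UPX!UPX!", b"MPRESS1MPRESS2", b"\x00\x00\x00\x00\x00"]

# Select the feature N-grams on the training set only, then featurize.
vocab = top_k_ngrams(train_samples)
X_train = [featurize(sample, vocab) for sample in train_samples]
```

Selecting the vocabulary from the training samples alone, and then reusing it to featurize the test samples, keeps information from the test set out of the features.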