How it works…
We start by organizing our data and labels into arrays (Step 1). In particular, we read in our samples and label each one with the packer that was used to pack it. In Step 2, we split the data into training and test sets. We are then ready to featurize, so we import the libraries required for N-gram extraction and define our N-gram functions (Steps 3 and 4), which are discussed in other recipes. Making the simplifying choice of N=2 and using the K1=100 most frequent N-grams as our features, we featurize our data (Steps 5 and 6). Other values of N, and other methods of selecting the most informative N-grams, can yield better results at the cost of more computational resources. Having featurized ...
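The featurization described above (N=2, the K1=100 most frequent N-grams) can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual code: the helper names `byte_ngrams`, `top_ngrams`, and `featurize` are hypothetical, the toy byte strings stand in for real packed samples, and the sketch assumes the N-gram vocabulary is chosen from the training samples only.

```python
from collections import Counter

N = 2      # N-gram length, as chosen in the text
K1 = 100   # number of most frequent N-grams kept as features


def byte_ngrams(data: bytes, n: int = N) -> Counter:
    """Count every contiguous n-byte sequence in data (hypothetical helper)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def top_ngrams(samples, k: int = K1, n: int = N) -> list:
    """Return the k most frequent N-grams summed over all samples."""
    total = Counter()
    for sample in samples:
        total.update(byte_ngrams(sample, n))
    return [gram for gram, _ in total.most_common(k)]


def featurize(sample: bytes, vocab: list, n: int = N) -> list:
    """Map one sample to a vector of counts, one entry per vocabulary N-gram."""
    counts = byte_ngrams(sample, n)
    return [counts[gram] for gram in vocab]


# Toy stand-ins for the training samples (assumption: real inputs would be
# the raw bytes of the packed files read in Step 1).
train_samples = [b"UPX!UPX!abab", b"MZ\x90\x00MZ\x90\x00", b"ababab"]

# k=5 only because the toy data is tiny; the text uses K1=100.
vocab = top_ngrams(train_samples, k=5)
X_train = [featurize(s, vocab) for s in train_samples]
```

The same `vocab` would then be reused to featurize the test samples, so that training and test vectors share one feature layout.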