November 2019
Intermediate to advanced
346 pages
9h 36m
English
This section introduces several notable ideas. We start by enumerating our samples and assigning each one its label (step 1). Because our dataset is imbalanced, it makes sense to use a stratified train-test split (step 2). A stratified split preserves class proportions: each class makes up approximately the same fraction of the training set and of the testing set as it does of the original dataset. This prevents a chance event from producing, for example, a training set that consists of only one class. Next, we load the functions we will use to featurize our samples. As in previous recipes, we employ our feature extraction techniques to compute the best N-gram features (step 4) and then iterate ...
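To make the stratified split in step 2 concrete, here is a minimal sketch using scikit-learn's `train_test_split` with its `stratify` argument. The file names, the 90/10 class balance, the 30% test size, and the random seed are all illustrative assumptions, not values taken from the recipe.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 90 benign samples (label 0) and
# 10 malicious samples (label 1), mimicking an imbalanced dataset.
samples = [f"benign_{i}.exe" for i in range(90)] + [
    f"malicious_{i}.exe" for i in range(10)
]
labels = [0] * 90 + [1] * 10

# Passing the labels to `stratify` asks scikit-learn to keep the class
# proportions of the full dataset in both the training and testing sets.
samples_train, samples_test, labels_train, labels_test = train_test_split(
    samples, labels, test_size=0.3, stratify=labels, random_state=11
)

# Inspect the class counts in each split.
print("train:", Counter(labels_train))
print("test: ", Counter(labels_test))
```

With these assumed numbers, both splits keep the 9:1 benign-to-malicious ratio of the full dataset. Without `stratify`, the 10 malicious samples would be distributed at random, so a split with very few (or, in small datasets, zero) samples of the minority class would be possible.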