November 2019
Intermediate to advanced
346 pages
9h 36m
English
We start by loading a predefined dataset (step 1) with scipy.sparse.load_npz, which reads sparse matrices that were previously saved to disk. Next, we train a basic Decision Tree model on the data (step 2). To measure performance, we use the balanced accuracy score, a metric often used for classification problems with imbalanced datasets. Balanced accuracy is defined as the average of the recall obtained on each class; its best value is 1 and its worst value is 0.
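Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the recipe's own code: the data is a synthetic stand-in generated with make_classification, the file names X_train.npz and X_test.npz are hypothetical, and the matrices are saved first only so that there is something for scipy.sparse.load_npz to read. The last lines check that the balanced accuracy score matches the mean of the per-class recalls.

```python
import os
import tempfile

import scipy.sparse
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic imbalanced data (about 5% positives) stands in
# for the recipe's dataset, which is not reproduced here.
X_dense, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_dense, y, stratify=y, random_state=0
)

# Step 1: save sparse matrices, then load them back with load_npz.
with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "X_train.npz")  # hypothetical name
    test_path = os.path.join(tmp_dir, "X_test.npz")    # hypothetical name
    scipy.sparse.save_npz(train_path, scipy.sparse.csr_matrix(X_train))
    scipy.sparse.save_npz(test_path, scipy.sparse.csr_matrix(X_test))
    X_train_sparse = scipy.sparse.load_npz(train_path)
    X_test_sparse = scipy.sparse.load_npz(test_path)

# Step 2: train a basic Decision Tree and score it.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train_sparse, y_train)
y_pred = clf.predict(X_test_sparse)

score = balanced_accuracy_score(y_test, y_pred)

# Balanced accuracy is the unweighted mean of the recall on each class.
per_class_recall = recall_score(y_test, y_pred, average=None)
manual_score = per_class_recall.mean()

print(f"balanced accuracy: {score:.4f}")
print(f"mean per-class recall: {manual_score:.4f}")
```

Because every class contributes equally to the average, a model that ignores the minority class is penalized here even when its plain accuracy looks high.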
In the following steps, we employ different techniques to tackle the class imbalance. Our first approach is to utilize class weights to adjust our Decision Tree to an imbalanced dataset (step 3). The balanced mode uses the values of ...
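As a rough sketch of the class-weight approach in step 3, the example below again uses synthetic stand-in data (an assumption, not the recipe's dataset). In scikit-learn, class_weight="balanced" is documented as weighting each class by n_samples / (n_classes * np.bincount(y)); the code computes those weights with compute_class_weight, compares them against that formula, and then fits a weighted Decision Tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Assumption: synthetic imbalanced data (about 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0
)

classes = np.unique(y)

# Weights that scikit-learn derives for the "balanced" mode.
balanced_weights = compute_class_weight(
    class_weight="balanced", classes=classes, y=y
)

# The same weights computed by hand from the class frequencies.
manual_weights = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), balanced_weights.tolist())))

# Step 3 (sketch): a Decision Tree that applies these weights during fitting.
weighted_clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
weighted_clf.fit(X, y)
```

The minority class receives the larger weight, so misclassifying one of its samples costs the tree more than misclassifying a majority-class sample.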