Evaluating classification models

Now that we have fit a classification model, we can examine the accuracy on the test set. One common tool for performing this kind of analysis is the Receiver Operator Characteristic (ROC) curve. To draw an ROC curve, we select a particular cutoff for the classifier (here, a value between 0 and 1 above which we consider a data point to be classified as a positive, or 1) and ask what fraction of 1s are correctly classified by this cutoff (true positive rate) and, concurrently, what fraction of negatives are incorrectly predicted to be positive (false positive rate) based on this threshold. Mathematically, this is represented by choosing a threshold and computing four values:

TP = true positives = # of class 1 points ...

Get Python: Advanced Predictive Analytics now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.