Chapter 18 Graphical Evaluation of Classification Models
18.1 Review of Lift Charts and Gains Charts
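In Chapter 15, we learned about lift charts and gains charts. Recall that lift is defined as the proportion of positive hits in the set of the model's positive classifications, divided by the proportion of positive hits in the data set overall:
$$\text{Lift} = \frac{\text{proportion of positive hits in the set of the model's positive classifications}}{\text{proportion of positive hits in the data set overall}}$$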
where a hit is defined as a positive response that was predicted to be positive. To construct a lift chart, the software sorts the records by their propensity to respond positively and then calculates the lift at each percentile. For example, a lift value of 2.0 at the 20th percentile means that the 20% of records judged most likely to respond contain twice as many responders as a random sample of the same size. Gains charts represent the cumulative form of lift charts. For more on lift charts and gains charts, see Chapter 15.
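The procedure above can be illustrated with a short Python sketch. This is our own illustration, not the software used in the book. The function names `lift_from_counts` and `lift_and_gains`, the choice of ten bins, and the toy data are all hypothetical. At each cutoff, the top-ranked records are treated as the model's positive classifications, and lift is computed in the standard contingency-table form Lift = (TP / (TP + FP)) / ((TP + FN) / N). The cumulative gain is taken here to be the share of all actual responders captured so far, which is one common reading of a gains chart and is an assumption on our part.

```python
def lift_from_counts(tp, fp, fn, tn):
    """Lift = (TP / (TP + FP)) / ((TP + FN) / N), where N = TP + FP + FN + TN."""
    n = tp + fp + fn + tn
    hit_rate_in_predicted_pos = tp / (tp + fp)  # proportion of hits among positive classifications
    hit_rate_overall = (tp + fn) / n            # proportion of actual positives in the whole data set
    return hit_rate_in_predicted_pos / hit_rate_overall


def lift_and_gains(propensities, actuals, n_bins=10):
    """Return (percentile, lift, cumulative gain) for each of n_bins cutoffs.

    propensities: model scores, where higher means more likely to respond (hypothetical input)
    actuals: 1 for an actual positive response, 0 otherwise
    """
    if len(propensities) != len(actuals):
        raise ValueError("propensities and actuals must have equal length")
    n = len(actuals)
    total_pos = sum(actuals)
    if total_pos == 0:
        raise ValueError("need at least one actual positive response")
    # Rank records from most to least likely to respond; ties keep their input order.
    order = sorted(range(n), key=lambda i: propensities[i], reverse=True)
    ranked = [actuals[i] for i in order]
    rows = []
    for b in range(1, n_bins + 1):
        k = n * b // n_bins            # the top k records count as predicted positive
        if k == 0:
            continue                   # too few records for this bin
        tp = sum(ranked[:k])           # hits among the top k
        fp = k - tp
        fn = total_pos - tp
        tn = n - k - fn
        lift = lift_from_counts(tp, fp, fn, tn)
        gain = tp / total_pos          # cumulative share of all responders captured
        rows.append((100 * b / n_bins, lift, gain))
    return rows


# Toy, made-up records: 20 applicants, 5 actual responders.
propensities = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
                0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]
actuals = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0,
           0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

for pct, lift, gain in lift_and_gains(propensities, actuals):
    print(f"{pct:5.0f}%  lift={lift:.2f}  cumulative gain={gain:.2f}")
```

With these made-up records, the 20th-percentile row has a lift of 2.0, because the top 4 records contain 2 responders (50%) against an overall rate of 25%. This matches the interpretation given in the text. At the 100th percentile, the lift returns to 1.0 and the cumulative gain reaches 1.0, since every record, and therefore every responder, has been included.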
18.2 Lift Charts and Gains Charts Using Misclassification Costs
Lift charts and gains charts may also be used in the presence of misclassification costs. This works because the software ranks the records by propensity to respond, and the misclassification costs directly affect each record's propensity to respond under a given classification model. Recall the Loans data set, in which a bank would like to predict loan approval for about 150,000 loan applicants in a training data set, based on the predictors debt-to-income ...