Chapter 16 Cost-Benefit Analysis Using Data-Driven Costs
In Chapter 15, we were introduced to cost-benefit analysis and misclassification costs. Our goal in this chapter is to develop a methodology whereby the data themselves teach us what the misclassification costs should be; that is, cost-benefit analysis using data-driven misclassification costs. Before we can do so, however, we need a more systematic treatment of misclassification costs and cost-benefit tables, deriving the following three important results:
- Decision invariance under row adjustment
- Positive classification criterion
- Decision invariance under scaling
16.1 Decision Invariance Under Row Adjustment
For a binary classifier, define p(i | x) to be the confidence (to be defined later) of the model in classifying a data record as i = 0 or i = 1, given the data x. For example, p(1 | x) represents the confidence that a given classification algorithm has in classifying a record as positive (1), given the data. p(i | x) is also called the posterior probability of a given classification. By way of contrast, p(i) would represent the prior probability of a given classification; that is, the proportion ...
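To make the prior-versus-posterior distinction concrete, here is a minimal sketch (not from the text) using a made-up toy dataset with a single binary feature. The prior p(1) is just the overall proportion of positive records; the posterior p(1 | x) folds in the evidence from the feature via Bayes' theorem. The data values and the naive one-feature setup are illustrative assumptions, not the book's example.

```python
# Toy illustration: prior p(1) vs. posterior p(1 | x) for a binary
# classifier. Data are made up for demonstration purposes.

records = [  # (x, label) pairs; x is a single binary feature
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (1, 1), (0, 1),
]

n = len(records)
pos_x = [x for x, y in records if y == 1]  # feature values of positives
neg_x = [x for x, y in records if y == 0]  # feature values of negatives

# Prior: proportion of positive records, ignoring the feature entirely.
p1 = len(pos_x) / n        # p(1)
p0 = 1 - p1                # p(0)

# Likelihoods estimated by relative frequency: p(x = 1 | class).
px1_given_1 = sum(pos_x) / len(pos_x)
px1_given_0 = sum(neg_x) / len(neg_x)

# Posterior via Bayes' theorem for a record observed with x = 1:
# p(1 | x=1) = p(x=1 | 1) p(1) / [ p(x=1 | 1) p(1) + p(x=1 | 0) p(0) ]
num = px1_given_1 * p1
posterior_1 = num / (num + px1_given_0 * p0)

print(f"prior p(1)           = {p1:.3f}")
print(f"posterior p(1 | x=1) = {posterior_1:.3f}")
```

With these toy numbers, seeing x = 1 raises the model's confidence in a positive classification well above the base-rate prior, which is exactly the gap the posterior captures.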