Machine Learning for Business Analytics, 2nd Edition
by Peter C. Bruce, Mia L. Stephens, Galit Shmueli, Muralidhara Anandamurthy, Nitin R. Patel
5 EVALUATING PREDICTIVE PERFORMANCE
In this chapter, we discuss how the predictive performance of machine learning methods can be assessed. This is a critical step in any analytics project. We point out the danger of overfitting to the training data, and the need to test model performance on data that were not used in the training step. We discuss popular performance metrics. For prediction, metrics include average absolute error (AAE) and root mean squared error (RMSE), computed on the validation data or test data. For classification tasks, metrics based on the classification matrix include overall accuracy, sensitivity, and specificity, as well as metrics that account for misclassification costs. We also show how the choice of threshold value affects method performance, and present the receiver operating characteristic (ROC) curve, a popular chart for assessing method performance across different threshold values. When the goal, called ranking, is to accurately identify the most interesting or important cases rather than to accurately classify the entire sample (e.g., the 10% of customers most likely to respond to an offer, or the 5% of claims most likely to be fraudulent), lift curves are used to assess performance. We also discuss the need for oversampling rare classes and how to adjust performance metrics for the oversampling. Finally, we mention the usefulness of comparing metrics based on the validation data to metrics based on the training data for the purpose ...
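The metrics listed in this overview can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the book's software: the function names (`aae`, `rmse`, `classification_matrix`, `class_metrics`, `roc_points`, `top_fraction_lift`) and the small data sets are hypothetical, made up for illustration. A case is assumed to be classified as class 1 when its estimated probability is at or above the threshold, and the sketch assumes both classes are present in the labels.

```python
import math

def aae(actual, predicted):
    """Average absolute error: mean of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def classification_matrix(labels, probs, threshold=0.5):
    """Return counts (TP, FN, FP, TN); predict 1 when prob >= threshold (assumed rule)."""
    tp = fn = fp = tn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def class_metrics(labels, probs, threshold=0.5):
    """Overall accuracy, sensitivity and specificity at a given threshold."""
    tp, fn, fp, tn = classification_matrix(labels, probs, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # share of actual 1s classified as 1
        "specificity": tn / (tn + fp),  # share of actual 0s classified as 0
    }

def roc_points(labels, probs):
    """ROC points (1 - specificity, sensitivity), sweeping thresholds from high to low."""
    points = [(0.0, 0.0)]
    for t in sorted(set(probs), reverse=True):
        m = class_metrics(labels, probs, t)
        points.append((1 - m["specificity"], m["sensitivity"]))
    return points

def top_fraction_lift(labels, probs, fraction):
    """Rate of 1s among the top-ranked `fraction` of cases, divided by the overall rate."""
    ranked = [y for _, y in sorted(zip(probs, labels), key=lambda t: -t[0])]
    k = max(1, round(fraction * len(ranked)))
    return (sum(ranked[:k]) / k) / (sum(labels) / len(labels))

# Hypothetical toy data for a prediction task.
actual = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 12.0, 7.0, 14.0]

# Hypothetical toy data for a classification task (1 = class of interest).
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.15, 0.55, 0.1]

print(aae(actual, predicted))  # 1.0
print(rmse(actual, predicted))
print(class_metrics(labels, probs, 0.5)["sensitivity"])   # 1.0
print(class_metrics(labels, probs, 0.65)["sensitivity"])  # 0.5
print(roc_points(labels, probs))
print(top_fraction_lift(labels, probs, 0.2))
```

Raising the threshold from 0.5 to 0.65 lowers sensitivity in this toy example, which is the threshold trade-off that the ROC curve traces out. The lift for the top 20% compares how concentrated the 1s are among the highest-ranked cases relative to the sample as a whole.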