Chapter 2 Sampling and Data Pre-Processing

2.1 Introduction

2.2 Sampling and Variable Selection

2.2.1 Sampling

2.2.2 Variable Selection

2.3 Missing Values and Outlier Treatment

2.3.1 Missing Values

2.3.2 Outlier Detection

2.4 Data Segmentation

2.4.1 Decision Trees for Segmentation

2.4.2 K-Means Clustering

2.5 Chapter Summary

2.6 References and Further Reading

2.1 Introduction

Data is the key to building robust and accurate models that give financial institutions the insight they need to fully understand the risks they face. On its own, however, data is often inadequate and needs to be cleaned, polished, and molded into a much richer form. To achieve this, sampling and data pre-processing techniques can be applied ...

