Chapters 29–32 present a Case Study of Predicting Response to Direct-Mail Marketing. In Chapter 29, we opened our Case Study with a look at the primary and secondary objectives of the project, which are reprised here.
The EDA performed in Chapter 29 revealed some interesting customer behaviors. In this chapter, we learn more about our customers through principal components analysis (PCA) and cluster analysis. In Chapter 31, we tackle our primary objective of developing a profitable classification model.
The analyses we perform in Chapters 29–31 require cross-validation. We therefore partition the data set into a Case Study Training Data Set and a Case Study Test Data Set. The data miner chooses the relative sizes of the two sets; typical splits range from 50% training/50% test to 90% training/10% test. In this Case Study, we choose a partition of approximately 75% training and 25% test.
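To make the partitioning step concrete, the following is a minimal sketch of a random 75%/25% split, written in Python with only the standard library. The function name `partition`, the seed value, and the list of 1,000 integer customer IDs are illustrative assumptions; they are not the Case Study data or the software used for the Case Study.

```python
import random

def partition(records, train_fraction=0.75, seed=42):
    """Randomly split records into (training, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)   # local generator, so the split is reproducible
    shuffled = list(records)    # copy; the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the customer records: 1,000 integer IDs.
customers = list(range(1000))
train, test = partition(customers)
print(len(train), len(test))  # 750 250
```

Fixing the seed means the same records land in the training and test sets on every run, which keeps later analyses comparable. Shuffling before cutting ensures that any ordering in the original file (for example, by date or region) does not leak into the partition.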
In Chapter 6, we discussed methods for validating that our partition of the data set is random, using some simple tests of hypothesis. However, such ...