Chapter 30 Case Study, Part 2: Clustering and Principal Components Analysis

Chapters 29–32 present a Case Study of Predicting Response to Direct-Mail Marketing. In Chapter 29, we opened our Case Study with a look at the primary and secondary objectives of the project, which are reprised here.

  • Primary objective: Develop a classification model that will maximize profits for direct-mail marketing.
  • Secondary objective: Develop better understanding of our clientele through exploratory data analysis (EDA), component profiles, and cluster profiles.

The EDA performed in Chapter 29 revealed some interesting customer behaviors. In this chapter, we learn more about our customers through principal components analysis (PCA) and cluster analysis. In Chapter 31, we tackle our primary objective of developing a profitable classification model.

30.1 Partitioning the Data

The analyses we perform in Chapters 29–31 require cross-validation. We therefore partition the data set into a Case Study Training Data Set and a Case Study Test Data Set. The data miner decides the proportional size of the training and test sets; typical splits range from 50% training/50% test to 90% training/10% test. In this Case Study, we choose a partition of approximately 75% training and 25% test.
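One way such a partition might be produced is sketched below in Python. This is a minimal illustration, not the procedure used in the Case Study: the `partition` function, the seed value, and the placeholder `records` list are all assumptions introduced here. Each record is assigned to the training set with probability 0.75, which is why the resulting split is only approximately 75%/25%.

```python
import random


def partition(records, train_fraction=0.75, seed=42):
    """Randomly assign each record to a training or test list.

    Hypothetical helper for illustration; the seed is fixed only so
    that the split is reproducible.
    """
    rng = random.Random(seed)
    train, test = [], []
    for record in records:
        # Each record goes to training with probability train_fraction,
        # so the realized proportions are approximate, not exact.
        if rng.random() < train_fraction:
            train.append(record)
        else:
            test.append(record)
    return train, test


# Placeholder data standing in for the customer records.
records = list(range(1000))
train, test = partition(records)
print(f"training: {len(train)} records, test: {len(test)} records")
```

Every record ends up in exactly one of the two sets, and rerunning with the same seed reproduces the same split. If an exact 75%/25% split were required, shuffling the records and slicing at the 75% mark would be an alternative design.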

30.1.1 Validating the Partition

In Chapter 6, we discussed methods for validating that our partition of the data set is random, using some simple tests of hypothesis. However, such ...
