Chapter 4 EXPLORATORY DATA ANALYSIS

4.1 EDA VERSUS HT

Clients or analysts often have a priori hypotheses that they would like to test against the data. An example of such a hypothesis, framed as a question, is: Do cellphone users have a higher rate of positive responses than landline users? The resulting hypothesis test (HT) could be carried out using either classical statistical methods or the cross-validation methods of data science (Chapter 5).
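
As one illustration of the classical route, the cellphone-versus-landline question could be checked with a one-sided two-proportion z-test. The sketch below is only one possible approach, not necessarily the method used later in the book. The function name `two_proportion_z_test` and all of the counts are hypothetical, made up for illustration rather than taken from the bank_marketing_training data.

```python
from math import erfc, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of H0: p_a = p_b against Ha: p_a > p_b.

    Returns the z statistic and the upper-tail p-value.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability P(Z >= z) for a standard normal Z.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Hypothetical counts (NOT from the real data set):
# 300 positive responses among 2000 cellphone users,
# 100 positive responses among 1000 landline users.
z, p_value = two_proportion_z_test(300, 2000, 100, 1000)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.5f}")
```

A small p-value would count as evidence that cellphone users respond positively at a higher rate. The normal approximation used here assumes reasonably large counts in both groups.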

On the other hand, the client or the analyst may not have any salient a priori notions about what the data might uncover. In such cases, they would prefer to use exploratory data analysis (EDA) or graphical data analysis. EDA allows the user to:

  • Use graphics to explore the relationship between the predictor variables and the target variable.
  • Use graphics and tables to derive new variables that will increase predictive value.
  • Use binning productively, to increase predictive value.

In this chapter, we will continue to explore the bank_marketing_training data set from Chapter 3. We begin by using graphics to investigate the relationship between the target response and a categorical predictor.

4.2 BAR GRAPHS WITH RESPONSE OVERLAY

We can use bar graphs with a response overlay for exploring the relationship between a categorical predictor and the target variable. Figure 4.1 shows a bar graph of previous_outcome with an overlay of the target response. Previous_outcome refers to the result of a previous marketing campaign with this same customer, with most customers ...
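
A bar graph with response overlay can be sketched in Python with pandas and matplotlib, as below. This is a minimal sketch under stated assumptions, not the code behind Figure 4.1. The toy data frame stands in for the real data; the target column name `response`, the category labels, the counts, and the helper `plot_response_overlay` are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the real data; every value here is invented.
toy = pd.DataFrame({
    "previous_outcome": ["nonexistent"] * 6 + ["failure"] * 3 + ["success"] * 3,
    "response": ["no", "no", "no", "no", "no", "yes",   # nonexistent
                 "no", "no", "yes",                     # failure
                 "yes", "yes", "no"],                   # success
})

# Contingency table: one row per predictor category,
# one column per value of the target.
counts = pd.crosstab(toy["previous_outcome"], toy["response"])

# Row-wise proportions give a normalized overlay, which makes
# response rates comparable across categories of different sizes.
proportions = counts.div(counts.sum(axis=1), axis=0)


def plot_response_overlay(table, ylabel):
    """Draw a stacked bar graph from a predictor-by-response table."""
    import matplotlib.pyplot as plt  # imported here so the tables work without it

    ax = table.plot(kind="bar", stacked=True)
    ax.set_xlabel(table.index.name)
    ax.set_ylabel(ylabel)
    ax.legend(title=table.columns.name)
    plt.tight_layout()
    return ax
```

Calling `plot_response_overlay(counts, "Count")` draws the raw-count version, and `plot_response_overlay(proportions, "Proportion")` draws the normalized version. In an interactive session, follow either call with `matplotlib.pyplot.show()` to display the figure.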
