- 3.1 Hypothesis Testing Versus Exploratory Data Analysis
- 3.2 Getting to Know the Data Set
- 3.3 Exploring Categorical Variables
- 3.4 Exploring Numeric Variables
- 3.5 Exploring Multivariate Relationships
- 3.6 Selecting Interesting Subsets of the Data for Further Investigation
- 3.7 Using EDA to Uncover Anomalous Fields
- 3.8 Binning Based on Predictive Value
- 3.9 Deriving New Variables: Flag Variables
- 3.10 Deriving New Variables: Numerical Variables
- 3.11 Using EDA to Investigate Correlated Predictor Variables
- 3.12 Summary
3.1 Hypothesis Testing Versus Exploratory Data Analysis
When approaching a data mining problem, an analyst may already have a priori hypotheses about the relationships among the variables that he or she would like to test. For example, suppose that cell phone executives want to know whether a recent increase in the fee structure has led to a decrease in market share. In this case, the analyst would test the hypothesis that market share has decreased, and would therefore use hypothesis testing procedures.
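To make the fee-structure example concrete, the following is a minimal sketch of one way such a hypothesis could be tested. It assumes that market share is measured as the proportion of surveyed customers who subscribe to the company, once before and once after the fee increase, and it uses a one-sided two-proportion z-test. That choice of test, the function name, and the survey counts are illustrative assumptions and do not come from the text.

```python
from math import erf, sqrt


def one_sided_two_proportion_z(x_before, n_before, x_after, n_after):
    """Test H0: p_after >= p_before against Ha: p_after < p_before.

    x_* are counts of customers subscribing to the company and n_* are
    sample sizes (hypothetical survey data, not from the text).
    Returns the z statistic and the lower-tail p-value.
    """
    p_before = x_before / n_before
    p_after = x_after / n_after
    # Pooled proportion under the null hypothesis of no change.
    pooled = (x_before + x_after) / (n_before + n_after)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / std_err
    # Standard normal CDF evaluated at z gives P(Z <= z).
    p_value = 0.5 * (1 + erf(z / sqrt(2)))
    return z, p_value


# Hypothetical counts: 300 of 1000 customers before, 250 of 1000 after.
z, p_value = one_sided_two_proportion_z(300, 1000, 250, 1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would be taken as evidence that market share has decreased. The key point is that the hypothesis was fixed before looking at the data, which is what distinguishes this approach from exploratory analysis.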
Many statistical hypothesis testing procedures are available in the traditional statistical analysis literature. We cover many of these in Chapters 4 and 5. However, analysts do not always have a priori notions of the expected relationships among the variables. Especially when confronted with large, unfamiliar databases, analysts often prefer to use ...