Chapter 5 Multivariate Statistics

  5.1 Two-Sample t-Test for Difference in Means
  5.2 Two-Sample Z-Test for Difference in Proportions
  5.3 Test for Homogeneity of Proportions
  5.4 Chi-Square Test for Goodness of Fit of Multinomial Data
  5.5 Analysis of Variance
  5.6 Regression Analysis
  5.7 Hypothesis Testing in Regression
  5.8 Measuring the Quality of a Regression Model
  5.9 Dangers of Extrapolation
  5.10 Confidence Intervals for the Mean Value of y Given x
  5.11 Prediction Intervals for a Randomly Chosen Value of y Given x
  5.12 Multiple Regression
  5.13 Verifying Model Assumptions
  The R Zone
  Reference
  Exercises
  Hands-On Analysis

So far we have discussed inference methods for one variable at a time. Data analysts are also interested in multivariate inferential methods, where the relationships between two variables, or between one target variable and a set of predictor variables, are analyzed.

We begin with bivariate analysis, where we have two independent samples and wish to test for significant differences in the means or proportions of the two samples. When would data miners be interested in using bivariate analysis? In Chapter 6, we illustrate how the data is partitioned into a training data set and a test data set for cross-validation purposes. Data miners can use the hypothesis tests shown here to determine whether significant differences exist between the means of various variables in the training and test data sets. If such differences exist, then the cross-validation is invalid, ...
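As a rough illustration of the check described above, the following Python sketch compares a training partition with a test partition on one numeric variable (difference in means) and on one binary flag (difference in proportions). It is a minimal sketch under stated assumptions, not the chapter's own R Zone code: the function names `welch_t_test` and `two_proportion_z_test` are hypothetical, the means test uses Welch's unequal-variance statistic rather than a pooled-variance form, and both p-values use a standard normal approximation, which is adequate only for the large partitions typical in data mining. The data in the usage section are synthetic.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_t_test(a, b):
    """Test for a difference in means between two independent samples.

    Returns (t, df, p). Uses Welch's unequal-variance t statistic; the
    two-sided p-value is a standard normal approximation (assumption:
    both samples are large), not an exact t-distribution p-value.
    """
    n1, n2 = len(a), len(b)
    se2_1 = variance(a) / n1  # squared standard error of mean a (n - 1 denominator)
    se2_2 = variance(b) / n2  # squared standard error of mean b
    t = (fmean(a) - fmean(b)) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


def two_proportion_z_test(x1, n1, x2, n2):
    """Z-test for a difference in proportions (x successes out of n trials).

    Returns (z, p), using the pooled proportion under the null hypothesis
    that both samples share the same population proportion.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    random.seed(1)
    # Synthetic numeric variable, randomly partitioned 2:1 into training and test sets
    values = [random.gauss(40, 10) for _ in range(3000)]
    random.shuffle(values)
    train, test = values[:2000], values[2000:]
    t, df, p = welch_t_test(train, test)
    print(f"difference in means: t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")

    # Hypothetical counts of a binary flag in each partition
    z, p = two_proportion_z_test(290, 2000, 140, 1000)
    print(f"difference in proportions: z = {z:.3f}, p = {p:.3f}")
```

Applied variable by variable, a small p-value (for example, below 0.05) flags a variable whose mean or proportion differs between the partitions, which is the situation the text says makes the cross-validation invalid; in that case the data would be repartitioned.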

Source: Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition.
