5

RELATIONSHIP BETWEEN TWO CATEGORICAL VARIABLES

In this chapter, we look at two-way tables, also called 2 × 2 tables, in which rows and columns represent binary values of two different variables. 2 × 2 tables are a subset of r × c tables (short for row × column), where the row and columns represent more than two values of their variables. After completing this chapter, you should be able to

  • build and interpret 2 × 2 tables,
  • specify how to do a resampling test for a difference between two proportions,
  • perform probability calculations involving conditional probabilities,
  • perform basic Bayesian calculations
  • define and test for statistical independence

5.1 TWO-WAY TABLES

We now return to the data previously mentioned on admission to graduate schools. The data are for the six largest academic departments, and the issue under consideration was admission rates for men and women. We begin with the two categorical variables, Gender and Admit. As before, we look at eight folks in a fragment of the database (Table 5.1).

Ignoring the department variable for now, the first person is a male who was admitted, so he goes in Table 5.2.

Then, we have a rejected male, another admitted male, and a rejected female. We will enter these data as counts in each cell (Table 5.3).

Finishing the table and adding row and column totals gives results that certainly look discriminatory (Table 5.4). However, these are only eight cases out of thousands. Table 5.5 is the full table for all 4526 applicants.

Get Introductory Statistics and Analytics: A Resampling Perspective now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.