6 Categorical Variables

In the previous chapter, we looked at binomial (two-outcome) variables. In this chapter, we expand that discussion to multi-category variables and relationships between categorical variables. After completing this chapter, you should be able to

  • summarize categorical data in two-way tables
  • calculate conditional probabilities
  • perform Bayesian calculations
  • perform tests of independence
  • use the multiplication rule
  • explain Simpson’s paradox

6.1 Two-way Tables

We start with the data on UC Berkeley graduate admissions that were introduced in Chapter 3, looking first at a breakdown by gender.

Table 6.1 is a “two-way” table: it classifies subjects by their status on two variables, gender and admission status. It shows that the admission rate for men (1198/2691, about 44.5%) is considerably higher than the admission rate for women (557/1835, about 30.4%). More generally, tables like this are known as R × C (row-by-column) tables, or contingency tables, because you can read the counts for one variable contingent on the other variable taking a certain value.
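To make the structure concrete, here is a minimal sketch in plain Python (no third-party libraries) that assembles Table 6.1 from its four cell counts and adds the “All” margins. The names `counts` and `two_way_table` are hypothetical, chosen for illustration; this is not code from the book.

```python
# Cell counts from Table 6.1 (UC Berkeley applications),
# keyed by (admission status, gender).
counts = {
    ("Admitted", "Female"): 557,
    ("Admitted", "Male"): 1198,
    ("Rejected", "Female"): 1278,
    ("Rejected", "Male"): 1493,
}
rows = ["Admitted", "Rejected"]
cols = ["Female", "Male"]


def two_way_table(counts, rows, cols):
    """Return an R x C table (dict of dicts) with 'All' row and column margins."""
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    # Row margins: total for each admission status across both genders.
    for r in rows:
        table[r]["All"] = sum(table[r][c] for c in cols)
    # Column margins: total for each gender across admission statuses;
    # the 'All'/'All' cell is the grand total.
    table["All"] = {c: sum(table[r][c] for r in rows) for c in cols + ["All"]}
    return table


table = two_way_table(counts, rows, cols)
print(" " * 10 + "".join(f"{c:>8}" for c in cols + ["All"]))
for label in rows + ["All"]:
    print(f"{label:<10}" + "".join(f"{table[label][c]:>8}" for c in cols + ["All"]))
```

With raw data (one row per applicant) in pandas, `pd.crosstab(...)` with `margins=True` produces the same layout, including the “All” row and column.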

In Table 6.2, we see these data as a percentage table, which makes the difference in admission rates between women and men clearer.

Table 6.1 Applications to UC Berkeley departments.

           Female   Male    All
Admitted      557   1198   1755
Rejected     1278   1493   2771
All          1835   2691   4526
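A percentage table of this kind can be sketched by dividing each cell of Table 6.1 by its column total. We assume column percentages here (the percent of each gender admitted or rejected), since that is the comparison the text draws; the exact layout of Table 6.2 may differ. The function name `column_percentages` is hypothetical.

```python
# Cell counts from Table 6.1, stored per gender as (admitted, rejected).
counts = {"Female": (557, 1278), "Male": (1198, 1493)}


def column_percentages(counts):
    """Convert each column of counts to percentages of that column's total."""
    pct = {}
    for group, (admitted, rejected) in counts.items():
        total = admitted + rejected
        pct[group] = {
            "Admitted": 100 * admitted / total,
            "Rejected": 100 * rejected / total,
        }
    return pct


# The "All" column pools both genders before converting to percentages.
all_admitted = sum(a for a, _ in counts.values())
all_rejected = sum(r for _, r in counts.values())
pct = column_percentages({**counts, "All": (all_admitted, all_rejected)})

groups = ("Female", "Male", "All")
print(" " * 10 + "".join(f"{g:>8}" for g in groups))
for status in ("Admitted", "Rejected"):
    print(f"{status:<10}" + "".join(f"{pct[g][status]:>8.1f}" for g in groups))
```

With these counts, the admission rate is about 30.4% for women, 44.5% for men, and 38.8% overall; each column sums to 100%.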

Table 6.2 Applications to UC Berkeley departments.

Female Male All ...
