15 Inference on Association of Categorical Variables

15.1 Introduction

In Chapter 3, we saw some evidence suggesting that whether a driver was searched during a traffic stop in San Diego was related to the driver's race (Figure 15.1). Some empirical probabilities also indicated that this was the case for both male and female drivers (Problems 5.44 and 5.45).


Figure 15.1 Race, gender, and whether a driver was searched during traffic stops in San Diego in 2016.

Setting aside the limitations of the data, the association between police searches and race may simply be due to chance. Statistical inference allows us to assess further whether there is any association between being searched during a traffic stop in San Diego and the driver's race.
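To make the idea of an association arising "simply due to chance" concrete, the following is a minimal simulation sketch, not the method developed in this chapter. It uses made-up counts for two hypothetical groups, labeled A and B (these are not the San Diego records), and the function names are illustrative only. It repeatedly shuffles the group labels, which breaks any real link between group and search outcome, and records how often the shuffled data show a gap in search rates at least as large as the one observed.

```python
import random


def search_rate_gap(searched, groups):
    """Search rate of group 'A' minus search rate of group 'B'."""
    a = [s for s, g in zip(searched, groups) if g == "A"]
    b = [s for s, g in zip(searched, groups) if g == "B"]
    return sum(a) / len(a) - sum(b) / len(b)


def chance_gap_proportion(searched, groups, n_shuffles=2000, seed=0):
    """Share of random relabelings whose absolute gap is at least
    as large as the observed absolute gap."""
    rng = random.Random(seed)
    observed = abs(search_rate_gap(searched, groups))
    shuffled = list(groups)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # relabel drivers at random
        # Small tolerance guards against floating-point ties.
        if abs(search_rate_gap(searched, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / n_shuffles


# Hypothetical data, for illustration only: 1 = searched, 0 = not searched.
groups = ["A"] * 200 + ["B"] * 200
searched = [1] * 20 + [0] * 180 + [1] * 10 + [0] * 190

observed_gap = search_rate_gap(searched, groups)
proportion = chance_gap_proportion(searched, groups)
print(f"observed gap in search rates: {observed_gap:.3f}")
print(f"share of shuffles with a gap at least this large: {proportion:.3f}")
```

If a large share of the shuffles reproduce a gap as big as the observed one, chance alone is a plausible explanation; if very few do, the data point toward a genuine association.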

The regression methods presented in Chapters 13 and 14 rely on the response variable being numerical. These models can be adapted to scenarios in which the response variable is categorical. Not only will such models allow us to test for association between categorical variables, but they also allow the strength of the association to be assessed. However, regression models involving categorical response variables are more technical to implement and a bit harder to interpret than ordinary multiple linear regression. There is a rather simple method to test if there is an association between two categorical variables (without assessing ...
