The chi-square test of independence

The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other.

Let's take the following example to see whether the type of book a person prefers depends on the reader's gender:

                       Flavour
Gender    Romance   Suspense   Biography   Total
Men           100        120          60     280
Women         350        200          90     640
Total         450        320         150     920

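The test compares the observed counts with the counts we would expect if gender and book type were independent. For each cell, that expected count is the row total multiplied by the column total, divided by the grand total. The following is a minimal sketch of this arithmetic using NumPy only; the variable names (observed, expected, chi2_stat, and so on) are illustrative and do not come from the book:

```python
import numpy as np

# Observed counts from the table above
# (rows: Men, Women; columns: Romance, Suspense, Biography)
observed = np.array([[100, 120, 60],
                     [350, 200, 90]])

row_totals = observed.sum(axis=1)   # totals per gender
col_totals = observed.sum(axis=0)   # totals per book type
grand_total = observed.sum()        # all readers

# Expected count per cell under independence:
# row total * column total / grand total
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)
print(chi2_stat, dof)
```

The larger the statistic relative to its degrees of freedom, the less plausible it is that the two variables are independent.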
The chi-square test of independence can be performed using the chi2_contingency function in the stats module of the SciPy package:

>>> import numpy as np
>>> from scipy import stats
>>> men_women = np.array([[100, 120, 60],[350, 200, 90]])
>>> stats.chi2_contingency(men_women)
(28.362103174603167, 6.9382117170577439e-07, 2, array([[ 136.95652174,   97.39130435,   45.65217391],
 [ 313.04347826, 222.60869565, ...
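The four values returned are, in order, the chi-square statistic, the p-value, the degrees of freedom, and the array of expected frequencies. As a hedged sketch of how the result might be read, the snippet below unpacks them and compares the p-value with a significance level of 0.05. That threshold and the variable names are assumptions chosen for illustration, not something the text prescribes:

```python
import numpy as np
from scipy import stats

# Rows: Men, Women; columns: Romance, Suspense, Biography
men_women = np.array([[100, 120, 60],
                      [350, 200, 90]])

# Unpack the four returned values in order
chi2_stat, p_value, dof, expected = stats.chi2_contingency(men_women)

alpha = 0.05  # assumed significance level, chosen for illustration
reject_independence = p_value < alpha

print("chi-square statistic:", chi2_stat)
print("p-value:", p_value)
print("degrees of freedom:", dof)
print("reject independence at alpha = 0.05:", reject_independence)
```

With a p-value this far below 0.05, the data give strong evidence against independence, which suggests that book preference and gender are associated in this sample.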
