The chi-square test of independence

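The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other. It compares the counts observed in a contingency table with the counts that would be expected if the variables were unrelated.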
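For reference, the statistic behind the test is usually written as below. The symbols are notation introduced here rather than taken from the text: O_ij is the observed count in row i and column j, E_ij is the count expected under independence, N is the grand total, and r and c are the numbers of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}
```

The statistic is compared against a chi-square distribution with (r - 1)(c - 1) degrees of freedom.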

Let's take the following example to see whether the type of book a person prefers depends on the reader's gender:

                      Flavour
Gender    Romance   Suspense   Biography   Total
Men           100        120          60     280
Women         350        200          90     640
Total         450        320         150     920

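The chi-square test of independence can be performed using the chi2_contingency function in SciPy's stats module, which returns the test statistic, the p-value, the degrees of freedom, and the array of expected frequencies: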

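>>> import numpy as np
>>> from scipy import stats
>>> # Rows: men, women; columns: romance, suspense, biography
>>> men_women = np.array([[100, 120, 60], [350, 200, 90]])
>>> stats.chi2_contingency(men_women)
(28.362103174603167, 6.9382117170577439e-07, 2, array([[ 136.95652174,   97.39130435,   45.65217391],
 [ 313.04347826, 222.60869565, ...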
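To see where those numbers come from, the same calculation can be sketched by hand with NumPy. This is an illustrative sketch, not the book's own code: the variable names and the 0.05 significance level are assumptions made here, and stats.chi2.sf is used to turn the statistic into a p-value.

```python
import numpy as np
from scipy import stats

# Observed counts from the table: rows are men, women;
# columns are romance, suspense, biography.
observed = np.array([[100, 120, 60],
                     [350, 200, 90]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Counts expected in each cell if gender and book type were independent.
expected = row_totals * col_totals / grand_total

# Chi-square statistic: squared deviations scaled by the expected counts.
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution.
p_value = stats.chi2.sf(chi2, dof)

alpha = 0.05  # assumed significance level, not taken from the text
print(chi2, p_value, dof)
if p_value < alpha:
    print("Reject independence: preference appears to depend on gender.")
else:
    print("Fail to reject independence.")
```

With the table above, the p-value is far below 0.05, so the hypothesis that gender and book preference are independent would be rejected at that level. Note that chi2_contingency applies a continuity correction by default only when there is a single degree of freedom, so this manual version is expected to agree with it for a 2 x 3 table.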
