The chi-square test of independence
The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other or not.
Let's take the following example to see whether the kind of book a person prefers depends on the reader's gender:
Gender / Flavour | Romance | Suspense | Biography | Total |
---|---|---|---|---|
Men | 100 | 120 | 60 | 280 |
Women | 350 | 200 | 90 | 640 |
Total | 450 | 320 | 150 | 920 |
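To make the idea of independence concrete, the following minimal sketch computes by hand the counts that would be expected if gender and flavour were unrelated, and then the chi-square statistic, using the counts above. The variable names (`observed`, `expected`, `chi2_stat`) are illustrative choices, not taken from the book.

```python
import numpy as np

# Observed counts from the table: rows are Men, Women;
# columns are Romance, Suspense, Biography.
observed = np.array([[100, 120, 60],
                     [350, 200, 90]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1): 280, 640
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3): 450, 320, 150
grand_total = observed.sum()                      # 920

# Under independence, each expected count is
# (row total * column total) / grand total.
expected = row_totals * col_totals / grand_total

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected.round(2))
print(round(chi2_stat, 4), dof)
```

For example, the expected number of men preferring Romance is 280 × 450 / 920 ≈ 136.96, noticeably more than the 100 observed. Large gaps of this kind are what drive the statistic up.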
The chi-square test of independence can be performed using the chi2_contingency
function in SciPy's stats module:
>>> import numpy as np
>>> from scipy import stats
>>> men_women = np.array([[100, 120, 60], [350, 200, 90]])
>>> stats.chi2_contingency(men_women)
(28.362103174603167, 6.9382117170577439e-07, 2, array([[ 136.95652174, 97.39130435, 45.65217391], [ 313.04347826, 222.60869565, ...
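The values returned by the call above are, in order, the chi-square statistic, the p-value, the degrees of freedom, and the array of expected frequencies. As a hedged sketch of how the result might be read, the snippet below unpacks them and compares the p-value with a significance level of 0.05. That threshold is an assumption made for illustration and is not stated in the text.

```python
import numpy as np
from scipy import stats

# Rows: Men, Women; columns: Romance, Suspense, Biography.
men_women = np.array([[100, 120, 60],
                      [350, 200, 90]])

# Unpack the four values returned by chi2_contingency.
chi2, p_value, dof, expected = stats.chi2_contingency(men_women)

alpha = 0.05  # assumed significance level, chosen for illustration
reject_independence = p_value < alpha

print(f"chi2 = {chi2:.3f}, p = {p_value:.3g}, dof = {dof}")
if reject_independence:
    print("Reject independence: preference appears to depend on gender.")
else:
    print("No evidence against independence at this level.")
```

With a p-value of roughly 6.9e-07, far below 0.05, the data are hard to reconcile with independence, which suggests that the preferred flavour of book is associated with gender in this sample.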