March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will look at hypothesis testing, and also learn how to test the hypotheses using PySpark. Let's look at one particular type of hypothesis testing that is implemented in PySpark. This form of hypothesis testing is called Pearson's chi-square test. Chi-square tests how likely it is that the differences in the two datasets are there by chance.
For example, if we had a retail store without any footfall, and suddenly you get footfall, how likely is it that this is random, or is there even any statistically significant difference in the level of visitors that we are getting now in comparison to before? The reason why this is called the chi-square test is that the test itself references ...
Read now
Unlock full access