Chapter 3. Statistics
Applying the basic principles of statistics to data science provides vital
insight into our data. Statistics is a powerful tool. Used correctly, it
lets us make decisions with greater confidence. However, it is easy to
use statistics incorrectly. Anscombe’s quartet (Figure 3-1) illustrates the
risk: four distinct datasets can have nearly identical summary
statistics. In many cases, a simple plot of the data
can alert us right away to what is really going on with the data. In the
case of Anscombe’s quartet, we can instantly pick out these features: in the
upper-left panel, x and y appear to be linear, but noisy. In the upper-right
panel, we see that x and y form a peaked relationship that is nonlinear. In
the lower-left panel, x and y are precisely linear, except for one outlier. The
lower-right panel shows that y is statistically distributed for x = 8 and that there ...
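
To make the "nearly identical statistics" claim concrete, the following is a minimal sketch in Python that computes a few summary statistics for each of the four datasets. The data values are the standard published Anscombe values. The names QUARTET and summarize are our own illustrative choices, not from the text, and the choice of which statistics to report (means, sample variances, Pearson correlation, and the least-squares line) is an assumption about what "statistics" refers to here.

```python
from math import sqrt
from statistics import mean, variance

# Standard Anscombe's quartet values; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),   # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
        f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} "
        f"r={s['r']:.3f} fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

All four rows should agree to about two decimal places, even though the four panels look nothing alike when plotted. This is why we plot the data before trusting any single summary number.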