4 STATISTICS IS EASY!
less than 0.1%. Therefore, we conclude that the difference between the averages of the samples is
real. This is what statisticians call significant.
Let’s step back for a moment. What is the justification for shuffling the labels? The idea is
simply this: if the drug had no real effect, then the placebo would often give more improvement than
the drug. By shuffling the labels, we are simulating the situation in which some placebo
measurements replace some drug measurements. If the observed average difference of 13 would be
matched or even exceeded in many of these shufflings, then the drug might have no effect beyond
the placebo. That is, the observed difference could have occurred by chance.
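
The shuffling argument above can be sketched in a few lines of Python. This is only a minimal illustration, not the Diff2MeanSig.py code: the function name shuffle_test, its parameters, and the fixed seed are assumptions made for this sketch. The input values are the placebo and drug numbers of the fictitious variant described in the next paragraph.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def shuffle_test(drug, placebo, num_shuffles=10_000, seed=None):
    """Return the observed difference of means (drug minus placebo) and the
    fraction of label shuffles whose difference is at least that large.

    Hypothetical helper for illustration; one-sided, as in the text.
    """
    rng = random.Random(seed)
    observed = mean(drug) - mean(placebo)
    pooled = list(drug) + list(placebo)
    n_drug = len(drug)
    count = 0
    for _ in range(num_shuffles):
        # Shuffling the pooled values is equivalent to shuffling the labels.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_drug]) - mean(pooled[n_drug:])
        # A tiny tolerance keeps floating-point rounding from miscounting ties.
        if diff >= observed - 1e-9:
            count += 1
    return observed, count / num_shuffles


# Values of the fictitious variant: each drug value is a placebo value plus 13.
placebo = [56, 348, 162, 420, 440, 250, 389, 476, 288, 456]
drug = [69, 361, 175, 433, 453, 263, 402, 489, 301, 469]

observed, fraction = shuffle_test(drug, placebo, seed=0)
print(f"Observed difference of two means: {observed:.2f}")
print(f"Fraction of shuffles with a difference at least that large: {fraction:.4f}")
```

Because these values are spread so widely, a large share of the shuffles should match or exceed the observed difference of 13. The exact fraction depends on the seed and on the number of shuffles.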
To see that a similar average numerical advantage might lead to a different conclusion, consider
a fictitious variant of this example. Here we take a much greater variety of placebo values: 56 348
162 420 440 250 389 476 288 456 and simply add 13 more to get the drug values: 69 361 175 433
453 263 402 489 301 469. So the difference in the averages is 13, as it was in our original example.
In tabular form we get the following.
Figure 1.2: Difference between means.
Value  Label
56     P
348    P
162    P
420    P
440    P
250    P
389    P
476    P
288    P
456    P
69     D
361    D
175    D
433    D
453    D
263    D
402    D
489    D
301    D
469    D
This time, when we perform the 10,000 shufflings, in approximately 40% of them the difference
between the average of the D values and the average of the P values is greater than or equal to 13.
So, we would conclude that the drug may have no benefit: the difference of 13 could easily have
happened by chance.
Here is an example run of the Diff2MeanSig.py code, using the first data set from this example as
input:
Observed difference of two means: 12.97
7 out of 10,000 experiments had a difference of two means greater than or
equal to 12.97 .
The chance of getting a difference of two means greater than or equal to 12.97 is
0.0007.
In both the coin and drug cases so far, we’ve discussed statistical significance: could the
observed difference have happened by chance? However, this is not the same as importance, at
least not always. For example, if the drug raised the effect on the average by 0.03, we might not
