4 STATISTICS IS EASY!

less than 0.1%. Therefore, we conclude that the difference between the averages of the samples is real. This is what statisticians call significant.

Let’s step back for a moment. What is the justification for shuffling the labels? The idea is simply this: if the drug had no real effect, then the placebo would often give more improvement than the drug. By shuffling the labels, we are simulating the situation in which some placebo measurements replace some drug measurements. If the observed average difference of 13 would be matched or even exceeded in many of these shufflings, then the drug might have no effect beyond the placebo. That is, the observed difference could have occurred by chance.
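The relabeling step can be made concrete with a small sketch. The values below are hypothetical toy numbers chosen only for illustration, not measurements from the example, and the helper names shuffle_labels and mean_difference are likewise assumptions. The code shuffles the D and P labels once and recomputes the difference between the group averages.

```python
import random


def shuffle_labels(labels, rng):
    """Return a shuffled copy of the labels; the values stay where they are."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def mean_difference(values, labels):
    """Average of the D-labeled values minus average of the P-labeled values."""
    drug = [v for v, lab in zip(values, labels) if lab == "D"]
    placebo = [v for v, lab in zip(values, labels) if lab == "P"]
    return sum(drug) / len(drug) - sum(placebo) / len(placebo)


# Hypothetical toy measurements (an assumption, not data from the text).
values = [10, 12, 11, 20, 22, 21]
labels = ["P", "P", "P", "D", "D", "D"]

rng = random.Random(0)  # fixed seed so the run is repeatable
observed = mean_difference(values, labels)
new_labels = shuffle_labels(labels, rng)

print("observed difference:", observed)
print("labels after one shuffle:", new_labels)
print("difference after one shuffle:", mean_difference(values, new_labels))
```

Repeating the shuffle many times and counting how often the shuffled difference reaches the observed one gives the estimate of chance described above.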

To see that a similar average numerical advantage might lead to a different conclusion, consider a fictitious variant of this example. Here we take a much greater variety of placebo values: 56, 348, 162, 420, 440, 250, 389, 476, 288, 456, and simply add 13 to each to get the drug values: 69, 361, 175, 433, 453, 263, 402, 489, 301, 469. So the difference in the averages is 13, as it was in our original example.
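The two lists just given can be fed directly into the shuffling procedure. What follows is a minimal sketch and not the authors' own program: the function name shuffle_test, its parameters, and the fixed seed are assumptions made for illustration. It pools the values, reshuffles them 10,000 times, and reports how often the shuffled difference of means is at least as large as the observed one.

```python
import random

# Placebo and drug values of the fictitious variant, copied from the text.
placebo = [56, 348, 162, 420, 440, 250, 389, 476, 288, 456]
drug = [69, 361, 175, 433, 453, 263, 402, 489, 301, 469]


def mean(xs):
    return sum(xs) / len(xs)


def shuffle_test(drug, placebo, num_shuffles=10000, seed=1):
    """Return (observed difference, fraction of shuffles matching or exceeding it)."""
    rng = random.Random(seed)  # seed is an assumption, used for repeatability
    observed = mean(drug) - mean(placebo)
    pooled = drug + placebo  # new list, so the inputs are not modified
    count = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)
        # The first len(drug) shuffled values play the role of the drug group.
        new_drug = pooled[:len(drug)]
        new_placebo = pooled[len(drug):]
        if mean(new_drug) - mean(new_placebo) >= observed:
            count += 1
    return observed, count / num_shuffles


observed, fraction = shuffle_test(drug, placebo)
print(f"Observed difference of two means: {observed:.2f}")
print(f"Fraction of shuffles with a difference >= observed: {fraction:.4f}")
```

Because every drug value is a placebo value plus 13, both groups have the same spread, and that spread is large compared with 13, so a shuffled difference of 13 or more should not be rare.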

In tabular form we get the following.

Figure 1.2: Difference between means.

Value   Label
56      P
348     P
162     P
420     P
440     P
250     P
389     P
476     P
288     P
456     P
69      D
361     D
175     D
433     D
453     D
263     D
402     D
489     D
301     D
469     D

This time, when we perform the 10,000 shufflings, in approximately 40% of them the difference between the D values and the P values is greater than or equal to 13. So, we would conclude that the drug may have no benefit: the difference of 13 could easily have happened by chance.

Here is an example run of the Diff2MeanSig.py code, using the first data set from this example as input:

    Observed difference of two means: 12.97
    7 out of 10,000 experiments had a difference of two means greater than or equal to 12.97 .
    The chance of getting a difference of two means greater than or equal to 12.97 is 0.0007.

In both the coin and drug cases so far, we’ve discussed statistical significance: could the observed difference have happened by chance? However, this is not the same as importance, at least not always. For example, if the drug raised the effect on the average by 0.03, we might not
