102 STATISTICS IS EASY!

print "Observed F-statistic: %.2f" % observed_f_statistic

print count, "out of 10000 experiments had a F-statistic greater than or equal to

%.2f" % observed_f_statistic

print "Probability that chance alone gave us a F-statistic",

print "of %.2f or more" % observed_f_statistic, "is", (count / float(num_shuffles))

OneWayAnova.vals

>grp_a

45 44 34 33 45 46 34

>grp_b

34 34 50 49 48 39 45

>grp_c

24 34 23 25 36 28 33 29

OneWayAnovaConf.py

#!/usr/bin/python

######################################

# One-Way ANOVA Confidence Interval

# From: Statistics is Easy! By Dennis Shasha and Manda Wilson

#

#

# Uses shuffling & bootstrapping to get a 90% confidence interval for the f-statistic.

#

# Author: Manda Wilson

#

# Example of FASTA formatted input file:

# >grp_a

# 45 44 34 33 45 46 34

# >grp_b

# 34 34 50 49 48 39 45

# >grp_c

# 24 34 23 25 36 28 33 29

#

# Included in the code, but NOT in the pseudocode,

# is the bias-corrected confidence interval.

# See http://www.statisticsiseasy.com/BiasCorrectedConfInter.html

# for more information on bias-corrected cofidence intervals.

#

# Pseudocode:

#

# 1. Calculate f-statistic for the observed values (in our example it is 11.27).

# a. Initialize within sum of squares (wss), total sum (ts),

# total count (tc), and between sum of squares (bss) to 0

# b. For each group:

# i. Within group mean (wgm) = sum of group values / number of values in group

# ii. Store the number of values in group

# iii. Sum the following: for each value in the group

# I. Subtract the value from wgm

# II. Square the result of step (1biiiI)

APPENDIX B 103

# III. Add the result of step (1biiiII) to wss

# c. Total mean (tm) = ts / tc

# d. For each group:

# i. Subtract the wgm from tm

# ii. Square the result of step (1di)

# iii. Multiply the result of step (1dii) by

# the number of values in that group (stored in step (1bii))

# iv. Add the result of step (1diii) to bss

# e. Between degrees of freedom (bdf) = number of groups - 1

# f. Within degrees of freedome (wdf) = tc - number of groups

# g. Within group variance (wgv) = wss / wdf

# h. Between group variance (bgv) = bss / bdf

# i. f-statistic = wgv / bgv

#

# 2. Do the following 10,000 times:

# a. For each sample we have get a bootstrap sample:

# i. Create a new array of the same size as the original sample

# ii. Fill the array with randomly picked values from the original sample

(randomly picked with replacement)

# b. Calculate the f-statistic on the bootstrap samples, just as we did in step (1)

# with the original samples.

#

# 3. Sort the f-statistics computed in step (2).

#

# 4. Compute the size of each interval tail. If we want a 90% confidence interval,

then 1 - 0.9 yields the

# portion of the interval in the tails. We divide this by 2 to get the size of each

tail, in this case 0.05.

#

# 5. Compute the upper and lower bounds. To get the lower bound we multiply the tail

size by the number of

# bootstraps we ran and round this value up to the nearest integer (to make sure

this value maps

# to an actual boostrap we ran). In this case we multiple 0.05 by 10,000 and get

500, which rounds up to 500.

#

# To compute the upper bound we subtract the tail size from 1 and then multiply that

by the number of bootstraps

# we ran. Round the result of the last step down to the nearest integer. Again,

this is to ensure this value

# maps to an actual bootstrap we ran. We round the lower bound up and the upper

bound down to reduce the

# confidence interval size, so that we can still say we have as much confidence in

this result.

#

# 6. The bootstrap values at the lower bound and upper bound give us our confidence

interval.

#

######################################

import random

import math

import sys

Get *Statistics is Easy!* now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.