 102 STATISTICS IS EASY!
print "Observed F-statistic: %.2f" % observed_f_statistic
print count, "out of 10000 experiments had a F-statistic greater than or equal to
%.2f" % observed_f_statistic
print "Probability that chance alone gave us a F-statistic",
print "of %.2f or more" % observed_f_statistic, "is", (count / float(num_shuffles))
OneWayAnova.vals
>grp_a
45 44 34 33 45 46 34
>grp_b
34 34 50 49 48 39 45
>grp_c
24 34 23 25 36 28 33 29
OneWayAnovaConf.py
#!/usr/bin/python
######################################
# One-Way ANOVA Confidence Interval
# From: Statistics is Easy! By Dennis Shasha and Manda Wilson
#
#
# Uses shuffling & bootstrapping to get a 90% confidence interval for the f-statistic.
#
# Author: Manda Wilson
#
# Example of FASTA formatted input file:
# >grp_a
# 45 44 34 33 45 46 34
# >grp_b
# 34 34 50 49 48 39 45
# >grp_c
# 24 34 23 25 36 28 33 29
#
# Included in the code, but NOT in the pseudocode,
# is the bias-corrected confidence interval.
# See http://www.statisticsiseasy.com/BiasCorrectedConfInter.html
#
# Pseudocode:
#
# 1. Calculate f-statistic for the observed values (in our example it is 11.27).
# a. Initialize within sum of squares (wss), total sum (ts),
# total count (tc), and between sum of squares (bss) to 0
# b. For each group:
# i. Within group mean (wgm) = sum of group values / number of values in group
# ii. Store the number of values in group
# iii. Sum the following: for each value in the group
# I. Subtract the value from wgm
# II. Square the result of step (1biiiI) APPENDIX B 103
# III. Add the result of step (1biiiII) to wss
# c. Total mean (tm) = ts / tc
# d. For each group:
# i. Subtract the wgm from tm
# ii. Square the result of step (1di)
# iii. Multiply the result of step (1dii) by
# the number of values in that group (stored in step (1bii))
# iv. Add the result of step (1diii) to bss
# e. Between degrees of freedom (bdf) = number of groups - 1
# f. Within degrees of freedome (wdf) = tc - number of groups
# g. Within group variance (wgv) = wss / wdf
# h. Between group variance (bgv) = bss / bdf
# i. f-statistic = wgv / bgv
#
# 2. Do the following 10,000 times:
# a. For each sample we have get a bootstrap sample:
# i. Create a new array of the same size as the original sample
# ii. Fill the array with randomly picked values from the original sample
(randomly picked with replacement)
# b. Calculate the f-statistic on the bootstrap samples, just as we did in step (1)
# with the original samples.
#
# 3. Sort the f-statistics computed in step (2).
#
# 4. Compute the size of each interval tail. If we want a 90% confidence interval,
then 1 - 0.9 yields the
# portion of the interval in the tails. We divide this by 2 to get the size of each
tail, in this case 0.05.
#
# 5. Compute the upper and lower bounds. To get the lower bound we multiply the tail
size by the number of
# bootstraps we ran and round this value up to the nearest integer (to make sure
this value maps
# to an actual boostrap we ran). In this case we multiple 0.05 by 10,000 and get
500, which rounds up to 500.
#
# To compute the upper bound we subtract the tail size from 1 and then multiply that
by the number of bootstraps
# we ran. Round the result of the last step down to the nearest integer. Again,
this is to ensure this value
# maps to an actual bootstrap we ran. We round the lower bound up and the upper
bound down to reduce the
# confidence interval size, so that we can still say we have as much confidence in
this result.
#
# 6. The bootstrap values at the lower bound and upper bound give us our confidence
interval.
#
######################################
import random
import math
import sys

Get Statistics is Easy! now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.