10
Calculating Sample Sizes
Parameters and variables:
n = sample size
θ = population parameter about which the inference is to be made
T = test statistic, which is a function of the observed data, and whose probability distribution depends on the value of θ and n
T_c = critical value of the test statistic, being such that if the test statistic exceeds this value, the null is rejected
θ_0 = the value of the parameter that defines the boundary between acceptable and unacceptable conditions
ψ = prespecified parameters other than the one(s) for which an inference (hypothesis test) is to be made. For example, in a test about means, standard deviation, σ, is prespecified for purposes of power calculations.
Discussion:
In general, the probability of rejecting the null hypothesis can be expressed as:
Pr {|T| ≥ T_c | θ, n, ψ}.
We generally have chosen the critical value, T_c, so that:
sup Pr {|T| ≥ T_c | θ_0, n, ψ} = 1 − β.
Suppose we could choose another potential value of the parameter θ, say, θ_a, such that:
inf Pr {|T| ≥ T_c | θ_a, n, ψ} = α
for some specied value of α < 1 – β. Then, in theory, the two equations
could be used to solve for the sample size, n. There are some pragmatic
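As a simple illustration (not worked in the text), suppose the parameter is a normal mean with known standard deviation σ, H_0: θ < θ_0, and the test rejects when Z = √n(x̄ − θ_0)/σ ≥ z_c (one-sided for simplicity), where z_p denotes the standard normal p-th quantile. The two equations become

\[
\Pr\{Z \ge z_c \mid \theta_0\} = 1-\beta \;\Rightarrow\; z_c = -z_{1-\beta},
\qquad
\Pr\{Z \ge z_c \mid \theta_a\} = \alpha \;\Rightarrow\; z_c = z_{1-\alpha} + \frac{\sqrt{n}\,(\theta_a-\theta_0)}{\sigma},
\]

and equating the two expressions for z_c gives

\[
n = \left(\frac{(z_{1-\alpha}+z_{1-\beta})\,\sigma}{\theta_0-\theta_a}\right)^{2},
\]

which requires θ_a < θ_0, consistent with the requirement that α < 1 − β.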
There are some pragmatic issues associated with this methodology. In particular, it is often difficult for experimenters to specify the value of θ_a. It may, however, be easier for experimenters to determine a potential range of economically feasible sample sizes (or at least an upper bound). Thus, rather than calculating the solution to the two simultaneous equations, the value of n might be fixed, the critical value T_c determined using the first equation, and then the value of θ_a determined for a fixed value of α. The value of θ_a determined in this fashion may shock the experimenter into choosing a larger sample size, or it may seem adequate. The question of “How bad is too bad?” is very context sensitive, as is the question of “How large a sample size can we afford?”
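Continuing the known-σ illustration above (an assumption for concreteness, not the book's prescribed test), fixing n and α and solving for the alternative instead gives θ_a = θ_0 − (z_{1−α} + z_{1−β})σ/√n. A minimal sketch, with scipy assumed:

```python
from scipy import stats

def theta_a_for_fixed_n(n, theta0, sigma, alpha=0.05, beta=0.05):
    """Alternative theta_a at which the probability of rejecting H0 drops to alpha.

    Assumes a one-sided z-test of H0: theta < theta0 with known sigma, whose
    critical value is set so that the rejection probability at theta0 is 1 - beta.
    """
    z_alpha = stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(1 - beta)
    return theta0 - (z_alpha + z_beta) * sigma / n ** 0.5

# Hypothetical inputs: boundary theta0 = 100, sigma = 2, largest affordable n = 20.
print(theta_a_for_fixed_n(n=20, theta0=100.0, sigma=2.0))
```

With α = β = 0.05, the probability of passing does not fall to 5 percent until the true mean is roughly 0.74σ below the boundary; if that seems too forgiving, a larger sample size is needed.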
For the tests described in this work, as sample size increases, the power to reject the null hypothesis actually decreases. While this may seem counterintuitive, it is in fact the appropriate relationship between power and sample size. Larger sample sizes will result in more stringent tests. This is true in regard to conventional hypothesis tests. Figure 10.1 shows the power curves for a test of a single proportion (Test 1.1), with sample sizes n = 100 and n = 200. The hypotheses are:
H_0: P < 0.95
H_1: P ≥ 0.95.
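The power values quoted below can be reproduced, at least approximately, by direct calculation. The sketch below assumes that the critical value of Test 1.1 comes from the normal approximation (reject H_0 when p̂ ≥ p_0 − z_{1−β}·√(p_0(1 − p_0)/n)), that the rejection probability at an alternative is then evaluated with the exact binomial distribution, and that β = 0.05; the book's construction of Test 1.1 may differ in detail, so treat the output as illustrative.

```python
import math
from scipy import stats

def power_single_proportion(n, p0, p_alt, beta=0.05):
    """Probability of rejecting H0: P < p0 when the true proportion is p_alt."""
    z = stats.norm.ppf(1 - beta)
    p_hat_min = p0 - z * math.sqrt(p0 * (1 - p0) / n)  # rejection threshold for p_hat
    c = math.ceil(n * p_hat_min)                       # smallest count that rejects H0
    return stats.binom.sf(c - 1, n, p_alt)             # Pr{X >= c | p_alt}

# Power at P = 0.90 for the two sample sizes in Figure 10.1
for n in (100, 200):
    print(n, round(power_single_proportion(n, p0=0.95, p_alt=0.90), 4))
```

Under these assumptions the printed values land close to the 0.3209 and 0.1431 quoted next.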
Note, for example, if P = 0.90, with n = 100, the power to reject the null is
approximately 0.3209 (32.09 percent). With n = 200, the power at P = 0.90 is
0.1431 (14.31 percent). Thus, when the sample size doubled, the power was
reduced to less than half. Similarly, Figure 10.2 shows power curves for a test
of a single mean (Test 2.1) with n = 20 and n = 40. The hypotheses are
H_0: μ < 100
H_1: μ ≥ 100.
FIGURE 10.1
Power curves for tests of a single proportion (Test 1.1). (Power plotted against P_a, for n = 100 and n = 200.)
The graph in Figure 10.2 is in terms of:
δ = (μ_a − 100)/σ,
or the difference between the (hypothetical alternative) population mean and the null value of 100, in standard deviation units. At (μ_a − 100)/σ = −0.5, with n = 20, the power to reject the null is approximately 0.3048 (30.48 percent). With n = 40, the power is approximately 0.0719 (7.19 percent).
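If Test 2.1 is a one-sample t statistic whose critical value is set so that the rejection probability at μ = 100 equals 1 − β, the power at any value of (μ_a − 100)/σ follows from the noncentral t distribution. A minimal sketch, assuming β = 0.05 and that this is indeed the construction used (both assumptions), with scipy:

```python
from scipy import stats

def power_single_mean(n, delta_over_sigma, beta=0.05):
    """Probability of rejecting H0: mu < 100 when (mu_a - 100)/sigma = delta_over_sigma."""
    df = n - 1
    t_crit = stats.t.ppf(beta, df)        # Pr{T >= t_crit | mu = 100} = 1 - beta
    ncp = delta_over_sigma * n ** 0.5     # noncentrality parameter at mu_a
    return stats.nct.sf(t_crit, df, ncp)  # Pr{T >= t_crit | mu = mu_a}

# Power at (mu_a - 100)/sigma = -0.5 for the two sample sizes in Figure 10.2
for n in (20, 40):
    print(n, round(power_single_mean(n, -0.5), 4))
```

Under these assumptions the two values should come out close to the 0.3048 and 0.0719 quoted above.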
The reader is referred to Desu and Raghavarao (1990) for a more general
discussion of sample size calculation methods.
FIGURE 10.2
Power curves for tests of a single mean (Test 2.1). (Power plotted against (μ_a − 100)/σ, labeled Delta/Sigma, for n = 20 and n = 40.)