6 Equivalence and Noninferiority Tests
Test 2.2 Comparison of Two Means: Two Independent Samples, Fixed Δ Paradigm (One-Sided)
Parameters:
$\mu_1$ = population mean, “group” 1
$\mu_2$ = population mean, “group” 2
$\sigma_1$ = population standard deviation, “group” 1
$\sigma_2$ = population standard deviation, “group” 2
$\Delta_0$ = maximum allowable difference between $\mu_1$ and $\mu_2$
Hypotheses:
$H_0: \mu_1 < \mu_2 - \Delta_0$
$H_1: \mu_1 \ge \mu_2 - \Delta_0$
Data:
$\bar{X}_1$ = sample mean, “group” 1
$S_1$ = sample standard deviation, “group” 1
$n_1$ = sample size, “group” 1
$\bar{X}_2$ = sample mean, “group” 2
$S_2$ = sample standard deviation, “group” 2
$n_2$ = sample size, “group” 2
FIGURE 2.2
Test 2.1, JMP screen.
Critical value(s):
Reject $H_0$ if:
$$\bar{X}_1 - \bar{X}_2 + t_{1-\beta}\,SE \ge -\Delta_0,$$
where $t_{1-\beta}$ = 100(1 − β) percentile of a central t-distribution with $n_1 + n_2 - 2$ degrees of freedom and SE is the standard error for the difference of two means:
$$SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$
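The rejection rule can be sketched as a short Python function. This is a minimal sketch and not the book's software. The name `noninferiority_test` and its arguments are hypothetical. The percentile $t_{1-\beta}$ is passed in, for example from a table, rather than computed:

```python
import math


def noninferiority_test(xbar1, s1, n1, xbar2, s2, n2, delta0, t_crit):
    """Fixed-delta, one-sided test of H0: mu1 < mu2 - delta0.

    t_crit is the 100(1 - beta) percentile of the central t-distribution
    (n1 + n2 - 2 degrees of freedom in the text).
    Returns (SE, upper confidence limit, reject_H0).
    """
    # Standard error of the difference of two means (unpooled variances).
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # One-sided 100(1 - beta)% upper confidence limit for mu1 - mu2.
    upper = xbar1 - xbar2 + t_crit * se
    # Reject H0 when the upper limit reaches the lower acceptable bound -delta0.
    return se, upper, upper >= -delta0


se, upper, reject = noninferiority_test(96.0, 2.40, 12, 101.5, 2.20, 14,
                                        delta0=5.0, t_crit=1.711)
print(f"SE = {se:.3f}, upper limit = {upper:.3f}, reject H0: {reject}")
```

With the example data used later in this section, the function gives SE ≈ 0.9087 and an upper limit of about −3.945. The −3.94 quoted in the text comes from rounding SE to 0.909 first. Either way, $H_0$ is rejected.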
Discussion:
If $\mu_1 = \mu_2 - \Delta_0$ exactly, then we would expect that $\bar{X}_1 - \bar{X}_2 > -\Delta_0$ about as frequently as $\bar{X}_1 - \bar{X}_2 < -\Delta_0$. Since $\mu_1 = \mu_2 - \Delta_0$ would be minimally acceptable, we would want to avoid failing to conclude that $\mu_1 \ge \mu_2 - \Delta_0$ just because $\bar{X}_1 - \bar{X}_2 < -\Delta_0$. That is, we would want to conclude that $\mu_1 < \mu_2 - \Delta_0$ only when $\bar{X}_1$ was sufficiently less than $\bar{X}_2 - \Delta_0$. In other words, we are willing to believe that $\mu_1 \ge \mu_2 - \Delta_0$ (i.e., the alternate hypothesis) as long as
$$\bar{X}_1 - \bar{X}_2 \ge -\Delta_0 - t_{1-\beta}\,SE.$$
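Dividing through by SE shows that the rule is the same as comparing the standardized statistic $t = (\bar{X}_1 - \bar{X}_2 + \Delta_0)/SE$ with $-t_{1-\beta}$, the lower 100β percentile of the t-distribution. This is the statistic whose distribution is discussed next:

```latex
\begin{aligned}
\bar{X}_1 - \bar{X}_2 + t_{1-\beta}\,SE \ge -\Delta_0
  &\iff \bar{X}_1 - \bar{X}_2 + \Delta_0 \ge -t_{1-\beta}\,SE \\
  &\iff t = \frac{\bar{X}_1 - \bar{X}_2 + \Delta_0}{SE} \ge -t_{1-\beta}
  \qquad (SE > 0).
\end{aligned}
```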
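As in the case of the single mean, under various alternate hypotheses of the form $\mu_1 = \mu_2 - \Delta_a$, the test statistic $t = (\bar{X}_1 - \bar{X}_2 + \Delta_0)/SE$ has (approximately) a noncentral t-distribution with $n_1 + n_2 - 2$ degrees of freedom and noncentrality
$$\delta = \frac{\Delta_0 - \Delta_a}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}.$$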
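The form of δ follows from the mean and variance of the numerator of t when $\mu_1 = \mu_2 - \Delta_a$. The short derivation below treats the standard error as known, with σ in place of S:

```latex
\begin{aligned}
E[\bar{X}_1 - \bar{X}_2 + \Delta_0] &= \mu_1 - \mu_2 + \Delta_0 = \Delta_0 - \Delta_a, \\
\operatorname{Var}[\bar{X}_1 - \bar{X}_2] &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \\
\delta &= \frac{\Delta_0 - \Delta_a}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}.
\end{aligned}
```

Thus δ = 0 at the boundary $\Delta_a = \Delta_0$. It becomes negative as the mean of “group” 1 falls further below that of “group” 2.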
Welch (1947) provided an alternative calculation for the degrees of freedom of the two-sample t-test, for use when the variances of the two populations or systems are assumed not to be equal.
Let:
$$W_1 = \frac{(S_1^2/n_1)^2}{(n_1 - 1)\left(S_1^2/n_1 + S_2^2/n_2\right)^2}$$
and
$$W_2 = \frac{(S_2^2/n_2)^2}{(n_2 - 1)\left(S_1^2/n_1 + S_2^2/n_2\right)^2}.$$
Then the degrees of freedom for Welch's t-test are
$$df = \frac{1}{W_1 + W_2}.$$
For simplicity, the conventional $n_1 + n_2 - 2$ degrees of freedom will be used for the examples presented. In actual practice, Welch's formula is recommended.
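Welch's degrees of freedom are straightforward to compute directly. The sketch below is a hypothetical helper, `welch_df`, that is not from the book. It follows the $W_1$ and $W_2$ definitions given above:

```python
def welch_df(s1, n1, s2, n2):
    """Welch (1947) degrees of freedom, computed as df = 1 / (W1 + W2)."""
    v1 = s1 ** 2 / n1  # estimated variance of the "group" 1 mean
    v2 = s2 ** 2 / n2  # estimated variance of the "group" 2 mean
    total = v1 + v2
    w1 = v1 ** 2 / ((n1 - 1) * total ** 2)
    w2 = v2 ** 2 / ((n2 - 1) * total ** 2)
    return 1.0 / (w1 + w2)


# Example data from this section.
print(round(welch_df(2.40, 12, 2.20, 14), 2))
```

Consider the example data given later ($S_1 = 2.40$, $n_1 = 12$, $S_2 = 2.20$, $n_2 = 14$). For these, the helper gives about 22.6 degrees of freedom rather than the conventional 24. With equal sample variances and equal sample sizes n, it reduces to $2n - 2$.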
Power calculations are made as a function of the noncentrality parameter, and particularly as a function of the difference $\Delta_a = \mu_2 - \mu_1$.
Example:
Suppose we hypothesize that the mean of “group” 1 is no more than $\Delta_0 = 5.0$ units less than the mean of “group” 2. The data are
$\bar{X}_1 = 96.0$, $S_1 = 2.40$, $n_1 = 12$
$\bar{X}_2 = 101.5$, $S_2 = 2.20$, $n_2 = 14$
We choose 1 − β = 0.95, so $t_{1-\beta} = 1.711$ (12 + 14 − 2 = 24 degrees of freedom).
We compute
$$SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{2.40^2}{12} + \frac{2.20^2}{14}} = 0.909.$$
Comparing with the critical value $-\Delta_0 = -5.0$, we find
$$\bar{X}_1 - \bar{X}_2 + t_{1-\beta}\,SE = 96 - 101.5 + 1.711 \times 0.909 \approx -3.94 \ge -5.0.$$
Therefore, we reject the null hypothesis, $H_0$, in favor of the alternate, $H_1$.
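The tabled value $t_{1-\beta} = 1.711$ for 24 degrees of freedom can be checked without statistical tables. The sketch below uses only the Python standard library. It integrates the central t density with Simpson's rule and inverts the resulting CDF by bisection. The function names are hypothetical, and in practice a statistics package would normally supply this percentile:

```python
import math


def t_pdf(x, df):
    # Density of the central t-distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    # CDF via Simpson's rule on [0, |x|] (steps must be even),
    # using the symmetry of the density about 0.
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=0.0, hi=50.0):
    # Bisection for an upper percentile; valid for 0.5 < p < 1.
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_crit = t_quantile(0.95, 12 + 14 - 2)
print(round(t_crit, 3))
```

The computed percentile is about 1.7109, which agrees with the tabled 1.711 used in the example.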
If $\mu_1 = \mu_2 - 5.0$, then
$$t = \frac{\bar{X}_1 - \bar{X}_2 + 5.0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
has (approximately) a central t-distribution with $n_1 + n_2 - 2$ degrees of freedom.
Thus, the probability of rejecting the null hypothesis when $\mu_1 = \mu_2 - 5.0$ is
$$\Pr\{\bar{X}_1 - \bar{X}_2 + t_{1-\beta}\,SE \ge -5.0\} = \Pr\{t \ge -t_{1-\beta}\} = 1 - \beta.$$
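A quick simulation can check that the rejection probability at the boundary $\mu_1 = \mu_2 - 5.0$ is close to 1 − β = 0.95. The sketch below rests on assumed values that are not from the book. It uses normally distributed data and a common σ = 2.3 for both groups, a value chosen between $S_1$ and $S_2$. It also reuses the example's sample sizes and $t_{1-\beta} = 1.711$. The function name `rejection_rate` is hypothetical:

```python
import math
import random


def rejection_rate(mu1, mu2, sigma1, sigma2, n1, n2, delta0, t_crit,
                   reps=20000, seed=1):
    """Monte Carlo estimate of Pr{X1bar - X2bar + t_crit * SE >= -delta0}.

    Assumes normally distributed observations in both groups.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        x2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        m1, m2 = sum(x1) / n1, sum(x2) / n2
        # Sample variances with n - 1 divisors.
        v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
        v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
        se = math.sqrt(v1 / n1 + v2 / n2)
        if m1 - m2 + t_crit * se >= -delta0:
            hits += 1
    return hits / reps


# Boundary case mu1 = mu2 - 5.0 (sigma = 2.3 is an assumed value).
boundary = rejection_rate(96.5, 101.5, 2.3, 2.3, 12, 14, 5.0, 1.711)
# A case inside H0: mu1 = mu2 - 7.0.
worse = rejection_rate(94.5, 101.5, 2.3, 2.3, 12, 14, 5.0, 1.711, reps=5000)
print(boundary, worse)
```

The boundary rate should land near 0.95. The rate for $\mu_1 = \mu_2 - 7.0$ is much lower, which is the behavior the noncentral t-distribution describes.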
Under some specic alternate hypothesis, such as μ
1
= μ
2
Δ
a
, where
Δ
a
>5.0, then
t
XX
S
n
S
n
5.0
12
1
2
1
2
2
2
=
−+
+
has a noncentral t-distribution with n
1
+ n
2
– 2 degrees of freedom and non-
centrality parameter
nn
5.0
a
1
2
12
2
2
δ=
∆−
σ+σ
.
To calculate a power curve, we will make the simplifying assumptions that $\sigma_1 = \sigma_2 = \sigma$ and
$$n_1 = n_2 = n = \frac{n_1 + n_2}{2} = 13.$$
Thus, the noncentrality parameter simplies to
n 5.0
2
()
δ=
∆−
σ
.
Expressing $(\Delta_a - 5.0)/\sigma$ as a proportion (i.e., the difference in σ units) is usually easier than obtaining a reasonable estimate of σ. Thus, the power curve will be expressed as a function of
$$\gamma = \frac{\Delta_a - 5.0}{\sigma}$$
for γ = 0, . . . , 2.0 (σ units), so that $\delta = -\gamma\sqrt{n/2}$. Figure 2.3 shows the power curve for this test.
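The power curve can be computed with the standard library alone. The sketch represents the noncentral t as $T = (Z + \delta)/\sqrt{V/\nu}$, where Z is standard normal and V is chi-square with ν degrees of freedom. This gives $\Pr\{T \ge c\} = E[\Pr\{Z \ge c\sqrt{V/\nu} - \delta\}]$, which is integrated over V by Simpson's rule. Several assumptions apply. The function names are hypothetical. The sketch uses n = 13 from the simplifying assumption, 24 degrees of freedom and $t_{1-\beta} = 1.711$ from the example, and $\delta = -\gamma\sqrt{n/2}$, which follows from $t = (\bar{X}_1 - \bar{X}_2 + 5.0)/SE$:

```python
import math


def chi2_pdf(v, df):
    # Chi-square density; returns 0 at v = 0 (valid for df > 2).
    if v <= 0.0:
        return 0.0
    return math.exp((df / 2 - 1) * math.log(v) - v / 2
                    - (df / 2) * math.log(2) - math.lgamma(df / 2))


def upper_normal(a):
    # Pr{Z >= a} for a standard normal Z.
    return 0.5 * math.erfc(a / math.sqrt(2))


def noncentral_t_sf(c, df, delta, steps=2000):
    """Pr{T >= c} for a noncentral t with df degrees of freedom.

    Integrates chi2_pdf(v) * Pr{Z >= c * sqrt(v / df) - delta} over
    [0, df + 12 * sqrt(2 * df)] by Simpson's rule (steps must be even).
    """
    upper = df + 12 * math.sqrt(2 * df)
    h = upper / steps

    def g(v):
        return chi2_pdf(v, df) * upper_normal(c * math.sqrt(v / df) - delta)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


def power(gamma, n=13, df=24, t_crit=1.711):
    # Probability of rejecting H0 when (Delta_a - 5.0) / sigma = gamma,
    # assuming sigma1 = sigma2 and n1 = n2 = n, so delta = -gamma * sqrt(n / 2).
    return noncentral_t_sf(-t_crit, df, -gamma * math.sqrt(n / 2))


for g in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"gamma = {g:.1f}: power = {power(g):.3f}")
```

At γ = 0 the power equals 1 − β ≈ 0.95. It then falls toward 0 as the mean of “group” 1 drops further below $\mu_2 - 5.0$.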
Condence interval formulation:
The expression
XX
tS
E
121
−+
−β
is a one-sided 100(1 − β) percent upper condence limit for μ
1
μ
2
. From the
example, the upper 95 percent condence limit for μ
1
μ
2
is
XXtSE 96 101.5 1.711* 0.909
3.94
121
−+ =−
+≈
−β
.
Computational considerations:
For this test, one could use a confidence limit computed by various programs for the difference between two means. The upper confidence limit should be compared to the lower “acceptable” bound on the difference ($-\Delta_0$), and a one-sided 100(1 − β) percent [or a two-sided 100(1 − 2β) percent] limit should be computed, in concert with the two one-sided test (TOST) philosophy.
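The two-sided 100(1 − 2β) percent interval uses the same percentile $t_{1-\beta}$, so its upper end coincides with the one-sided 100(1 − β) percent upper limit. The sketch below shows this with the example data. `two_sided_interval` is a hypothetical helper, and $t_{1-\beta} = 1.711$ is taken from the example:

```python
import math


def two_sided_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Two-sided 100(1 - 2*beta)% interval for mu1 - mu2.

    t_crit is the 100(1 - beta) percentile, so the upper end equals the
    one-sided 100(1 - beta)% upper confidence limit.
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


lower, upper = two_sided_interval(96.0, 2.40, 12, 101.5, 2.20, 14, t_crit=1.711)
delta0 = 5.0
# Noninferiority: compare the upper end with the lower acceptable bound -delta0.
print(f"90% interval: ({lower:.3f}, {upper:.3f}); upper >= -delta0: {upper >= -delta0}")
```

The 90 percent interval is about (−7.055, −3.945). Its upper end clears −5.0, which matches the one-sided conclusion.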
FIGURE 2.3
Test 2.2, power curve for equivalence of two means. [Plot: Power (0 to 1) versus Delsig (−1.5 to 0.2).]