
Neural networks 251
Figure 6.8. This modification of the discriminant has the advantage that the
output signal saturates at values zero or one for large negative or positive in-
puts, respectively. However, Bishop (1995) suggests that there is also a good
statistical justification for using it. Suppose the two classes are normally
distributed with $\Sigma_1 = \Sigma_2 = I$. Then
$$
p(g \mid k) = \frac{1}{2\pi}\exp\!\left(-\frac{\|g-\mu_k\|^2}{2}\right),
$$
for $k = 1, 2$, and with Bayes' Theorem we have
$$
\Pr(1 \mid g) = \frac{p(g \mid 1)\Pr(1)}{p(g \mid 1)\Pr(1) + p(g \mid 2)\Pr(2)}
= \frac{1}{1 + p(g \mid 2)\Pr(2)/(p(g \mid 1)\Pr(1))}
= \frac{1}{1 + \exp\!\left(-\frac{1}{2}\left[\|g-\mu_2\|^2 - \|g-\mu_1\|^2\right]\right)\left(\Pr(2)/\Pr(1)\right)}.
$$
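As a quick numerical sanity check (not part of the text), the Bayes posterior above can be computed directly for two unit-covariance Gaussians in two dimensions; the means, priors, and test point below are purely illustrative values:

```python
import math

def gauss(g, mu):
    # Unit-covariance bivariate normal density: p(g | k) = (1/2pi) exp(-||g - mu||^2 / 2)
    d2 = sum((gi - mi) ** 2 for gi, mi in zip(g, mu))
    return math.exp(-d2 / 2) / (2 * math.pi)

# Illustrative values (not from the text)
mu1, mu2 = (1.0, 0.0), (-1.0, 0.0)
pr1, pr2 = 0.6, 0.4
g = (0.3, -0.2)

# Posterior from Bayes' Theorem
num = gauss(g, mu1) * pr1
post = num / (num + gauss(g, mu2) * pr2)

# Equivalent form: 1 / (1 + likelihood ratio * prior ratio)
ratio = gauss(g, mu2) * pr2 / (gauss(g, mu1) * pr1)
post2 = 1.0 / (1.0 + ratio)

assert abs(post - post2) < 1e-12
print(post)
```

Both forms agree, confirming that the normalization constant of the density cancels in the ratio.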
With the substitution $e^{-a} = \Pr(2)/\Pr(1)$ we get
$$
\Pr(1 \mid g) = \frac{1}{1 + \exp\!\left(-\frac{1}{2}\left[\|g-\mu_2\|^2 - \|g-\mu_1\|^2\right] - a\right)}.
$$
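Since $\|g-\mu_2\|^2 - \|g-\mu_1\|^2 = 2g\cdot(\mu_1-\mu_2) + \|\mu_2\|^2 - \|\mu_1\|^2$, the posterior is the logistic sigmoid of a function that is *linear* in $g$. This can be verified numerically; the means, priors, and test point below are illustrative values, not from the text:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values (not from the text)
mu1, mu2 = (1.0, 0.0), (-1.0, 0.0)
pr1, pr2 = 0.6, 0.4
a = math.log(pr1 / pr2)          # so that e^{-a} = Pr(2)/Pr(1)
g = (0.3, -0.2)

d2_1 = sum((gi - mi) ** 2 for gi, mi in zip(g, mu1))
d2_2 = sum((gi - mi) ** 2 for gi, mi in zip(g, mu2))

# Pr(1 | g) = 1 / (1 + exp(-(1/2)[||g - mu2||^2 - ||g - mu1||^2] - a))
post = 1.0 / (1.0 + math.exp(-0.5 * (d2_2 - d2_1) - a))

# The same posterior as a sigmoid of a linear function w.g + b
w = tuple(m1 - m2 for m1, m2 in zip(mu1, mu2))
b = 0.5 * (sum(m * m for m in mu2) - sum(m * m for m in mu1)) + a
lin = sum(wi * gi for wi, gi in zip(w, g)) + b
assert abs(post - sigmoid(lin)) < 1e-9
```

This is the statistical justification referred to above: with equal, identity covariance matrices, the Bayes posterior has exactly the form of a sigmoid applied to a linear discriminant.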