
276 Supervised Classification Part 1
classifiers have two disadvantages. They are designed for two-class problems
and their outputs, unlike those of feed-forward neural networks, do not model posterior
class membership probabilities in a natural way.
A common way to overcome the two-class restriction is to determine all
possible two-class results and then use a voting scheme to decide on the class
label (Wu et al., 2004). That is, for K classes, we train K(K − 1)/2 SVMs,
one on each of the possible pairs (i, j) ∈ K ⊗ K. For a new observation g and the
SVM for (i, j), let
$$
\mu_{ij}(g) = \Pr(\ell = i \mid \ell = i \text{ or } j,\ g) = \frac{\Pr(\ell = i \mid g)}{\Pr(\ell = i \text{ or } j \mid g)}. \tag{6.73}
$$
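The pairwise scheme can be illustrated with a small Python sketch. It is an idealization, not an SVM implementation: it assumes the posterior probabilities Pr(ℓ = k | g) are already known (the values below are hypothetical) and that each pairwise classifier returns exactly µ_ij(g). Because the classes are mutually exclusive, the denominator in Equation (6.73) is taken to be Pr(ℓ = i | g) + Pr(ℓ = j | g). The function names are illustrative only.

```python
from itertools import combinations


def pairwise_probabilities(posteriors):
    """Return {(i, j): mu_ij} for all pairs i < j, following Eq. (6.73).

    Assumes posteriors[k] = Pr(l = k | g) and that no pair sums to zero.
    """
    K = len(posteriors)
    mu = {}
    for i, j in combinations(range(K), 2):
        # Mutually exclusive classes: Pr(l = i or j | g) = p_i + p_j.
        mu[(i, j)] = posteriors[i] / (posteriors[i] + posteriors[j])
    return mu


def vote(mu, K):
    """Majority vote over the K(K-1)/2 pairwise decisions."""
    votes = [0] * K
    for (i, j), m in mu.items():
        # The pair (i, j) votes for i if mu_ij > 1/2, otherwise for j.
        votes[i if m > 0.5 else j] += 1
    return votes.index(max(votes)), votes


posteriors = [0.2, 0.5, 0.3]           # hypothetical Pr(l = k | g), K = 3
mu = pairwise_probabilities(posteriors)
label, votes = vote(mu, len(posteriors))
print(len(mu), label, votes)           # 3 pairwise results, winning label, vote counts
```

For K = 3 there are K(K − 1)/2 = 3 pairwise results, and in this idealized setting the vote selects the class with the largest posterior probability. A trained SVM does not supply µ_ij(g) directly, so in practice each pairwise decision would come from the sign of the corresponding SVM output.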
The last equality ...