
Algorithms that minimize the simple cost functions 333
Equating the likelihood and prior terms in this expression to give $E(C) = 0$
and taking $p_k \approx 1/\tilde{K}$, where $\tilde{K}$ is some a priori expected number of clusters,
then gives
$$\alpha_E \approx -\frac{m}{2\log(1/\tilde{K})}. \tag{8.22}$$
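As a quick numerical illustration of Equation (8.22), the following minimal Python sketch evaluates $\alpha_E$ for a few values of $\tilde{K}$. The function name `alpha_e` and its argument names are hypothetical, chosen only for this example, and $m$ is assumed here to be the number of observations (pixels). Note that $-m/(2\log(1/\tilde{K})) = m/(2\log\tilde{K})$, so the estimate is positive for $\tilde{K} > 1$ and undefined for $\tilde{K} = 1$.

```python
import math


def alpha_e(m: int, k_expected: float) -> float:
    """Evaluate alpha_E ~ -m / (2 log(1 / K~)) as in Equation (8.22).

    m          -- number of observations (assumption: pixel count)
    k_expected -- a priori expected number of clusters K~ (must exceed 1)
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if k_expected <= 1:
        # log(1/K~) is zero at K~ = 1, so the expression is undefined there
        raise ValueError("k_expected must be greater than 1")
    return -m / (2.0 * math.log(1.0 / k_expected))


if __name__ == "__main__":
    m = 10_000  # illustrative value, not taken from the text
    for k in (2, 8, 32):
        print(f"K~ = {k:3d}  ->  alpha_E ~ {alpha_e(m, k):.1f}")
```

For fixed $m$, a larger expected number of clusters $\tilde{K}$ gives a smaller $\alpha_E$ under this approximation.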
The parameter $\sigma^2$ in Equation (8.21) (the within-cluster variance) can be
estimated from the data; see below.
The extended K-means (EKM) algorithm is then as follows. First an initial
configuration $U$ with a very large number of clusters $K$ is chosen (for single-band
data this might conveniently be the $K \le 256$ gray values that an image
with 8-bit quantization can possibly have) and initial values
$$\hat{\mu}_k = \frac{1}{m_k} \sum_{\nu=1}^{m} u_{k\nu} \ldots$$