
2.3 THE SUM OF ERROR SQUARES CLASSIFIER
The goal in this section remains the same: to estimate the vector of parameters, $w$, in the extended $\mathbb{R}^{l+1}$ space, of a linear classifier (hyperplane),
\[
w^T x = 0,
\]
where x is the (augmented-by-1) feature vector. However, in this section the assumption of linear
separability is not required. The method, also known as least squares (LS), estimates the best linear
classifier, where the term “best” corresponds to the w that minimizes the cost:
\[
J(w) = \sum_{i=1}^{N} \left(y_i - w^T x_i\right)^2, \tag{2.2}
\]
where $y_i$ is the known class label of $x_i$, $i = 1, 2, \ldots, N$, and $N$ is the number of training points.
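As a minimal numerical sketch of minimizing the cost in Eq. (2.2), the snippet below fits $w$ by least squares on toy two-class data. The data, the $\pm 1$ label convention, and the use of NumPy's `lstsq` are assumptions for illustration, not part of the text:

```python
import numpy as np

# Toy two-class training set (made-up data; labels y_i in {-1, +1} is an
# assumed convention, not prescribed by the text).
rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(loc=1.0, size=(20, 2)),
                   rng.normal(loc=-1.0, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Augment each feature vector by 1, so w lives in the extended R^{l+1} space.
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# Least-squares estimate: the w minimizing sum_i (y_i - w^T x_i)^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def J(w, X, y):
    """Sum-of-error-squares cost of Eq. (2.2)."""
    r = y - X @ w
    return float(r @ r)

print(w, J(w, X, y))
```

Any other choice of $w$, e.g. $w = 0$, yields a cost at least as large as the least-squares estimate.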
Define
\[
X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix},
\]
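With the training vectors stacked row-wise into $X$ this way, the cost of Eq. (2.2) can be written compactly as $\|y - Xw\|^2$, a standard least-squares problem whose minimizer satisfies the normal equations $X^T X w = X^T y$. A small sketch (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical tiny training set: N = 4 augmented vectors x_i in R^{l+1}, l = 2.
# Rows of X are the x_i^T, matching the definition above.
X = np.array([[ 1.0,  2.0, 1.0],
              [ 2.0,  0.5, 1.0],
              [-1.0, -1.5, 1.0],
              [-2.0, -0.5, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # class labels y_i

# Minimize ||y - X w||^2 via the pseudo-inverse of X.
w = np.linalg.pinv(X) @ y

# The minimizer satisfies the normal equations X^T X w = X^T y.
print(np.allclose(X.T @ X @ w, X.T @ y))  # → True
```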