
SPARSE PRINCIPAL COMPONENTS ANALYSIS 211
coder is based on a $p \times m$ matrix of weights $W$ with $m < p$; it is used to
create $m$ linear combinations of the input vector $x$. Each such linear combination
is passed through a nonlinear function $\sigma$, with the sigmoid function
$\sigma(t) = 1/(1 + e^{-t})$ being one typical choice, as represented in Figure 8.3 via
the vector function $h(x) = \sigma(W^T x)$. The output layer is then modeled as
$W h(x) = W \sigma(W^T x)$.³
Given input vectors $x_i$ for $i = 1, \ldots, N$, the weight
matrix $W$ is then estimated by solving the (nonconvex) optimization problem

$$
\underset{W \in \mathbb{R}^{p \times m}}{\text{minimize}} \; \left\{ \frac{1}{2} \sum_{i=1}^{N} \| x_i - W h(x_i) \|^2 \right\}. \tag{8.20}
$$
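As a rough illustration of criterion (8.20), the sketch below evaluates the auto-encoder objective for a given weight matrix in plain Python. The function names (`sigmoid`, `hidden`, `reconstruct`, `objective`) and the toy values of `W` and `X` are assumptions made for this example and do not come from the text. The sketch only evaluates the criterion; it does not attempt the nonconvex minimization itself.

```python
import math


def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def hidden(W, x):
    """Hidden layer h(x) = sigma(W^T x).

    W is a p x m matrix stored as a list of p rows; x has length p.
    """
    p, m = len(W), len(W[0])
    return [sigmoid(sum(W[j][k] * x[j] for j in range(p))) for k in range(m)]


def reconstruct(W, x):
    """Output layer W h(x), a vector of length p."""
    h = hidden(W, x)
    return [sum(row[k] * h[k] for k in range(len(h))) for row in W]


def objective(W, X):
    """Criterion (8.20): half the sum of squared reconstruction errors."""
    total = 0.0
    for x in X:
        x_hat = reconstruct(W, x)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 0.5 * total


# Hypothetical toy data: p = 3 inputs, m = 2 hidden units, N = 2 samples.
W = [[0.5, -0.2],
     [0.1, 0.4],
     [-0.3, 0.2]]
X = [[1.0, 0.0, -1.0],
     [0.5, 2.0, 1.0]]

print(objective(W, X))
```

A numerical optimizer (for example, gradient descent started from several initial values of `W`, since the problem is nonconvex) could then be applied to `objective`; that step is left out of this sketch.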
If we restrict σ to be the identity function, then