
Regression and Neural Networks
minimizes the mean square error in the predicted values of Y. In other words, if that β is applied to each training sample to predict the value of the dependent variable, and an error is computed by subtracting the predicted value from the actual value in the sample, then the sum of squares of those errors will be minimized relative to any other possible β. Note that this is exactly the same criterion used in gradient-descent learning of neural network weights.
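The minimization property can be checked numerically. The sketch below (a minimal illustration, not from the text; the data, variable names, and the use of NumPy are my assumptions) fits β by least squares and verifies that perturbing it only increases the sum of squared errors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))              # training samples (rows) by predictors
true_beta = np.array([2.0, -1.0, 0.5])     # hypothetical "true" coefficients
Y = A @ true_beta + 0.1 * rng.normal(size=100)  # dependent variable with noise

# Least-squares estimate of beta
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

def sse(b):
    """Sum of squared errors: actual minus predicted, squared and summed."""
    resid = Y - A @ b
    return resid @ resid

# Any other candidate beta gives a larger (or equal) sum of squared errors
assert sse(beta) <= sse(beta + 0.01)
assert sse(beta) <= sse(true_beta)
```

The same quantity, the sum of squared prediction errors, is what gradient descent drives down when training a network; regression simply reaches the minimum in closed form.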
Only the poorest programs would compute β by explicitly inverting A'A in the above formula. That involves far more computation than is needed. Good programs will use th ...
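The text breaks off before naming the preferred method, so the following is only a sketch of the general point, with common alternatives of my own choosing: explicitly forming the inverse of A'A is wasteful and numerically fragile compared with solving the system directly or using a dedicated least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 4))
Y = A @ np.array([1.0, 2.0, -0.5, 0.25]) + 0.05 * rng.normal(size=200)

# Naive: explicitly invert A'A, as in the textbook formula (avoid in practice)
beta_inv = np.linalg.inv(A.T @ A) @ (A.T @ Y)

# Better: solve the normal equations as a linear system, no explicit inverse
beta_solve = np.linalg.solve(A.T @ A, A.T @ Y)

# Better still: a dedicated least-squares solver working on A directly
beta_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)

# All three agree on well-conditioned data; they differ in cost and stability
assert np.allclose(beta_inv, beta_solve)
assert np.allclose(beta_solve, beta_lstsq)
```

On ill-conditioned predictor matrices the three approaches diverge: the explicit inverse loses accuracy first, which is one reason production code prefers solvers that never form (A'A)⁻¹.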