
176
Chapter 10
last column in that table is the square root of the test set's mean
square error. As such, it can be considered a sort of average error.
The first thing to notice is that for the three-layer network,
there is a magic number of hidden neurons needed. Performance takes
a leap when the number of hidden neurons goes from three to four.
This is not unusual when dealing with training data having strong
features. Smaller networks simply do not have the theoretical
capability needed to separate the problem space. More importantly,
observe that there appears to be a floor that cannot be breached by
adding more neurons. In fact, the test-se ...