October 2017
Beginner to intermediate
270 pages
7h
English
From the perceptron algorithm onward, every neural architecture has had a means of optimizing its internal parameters by comparing the model output with the ground truth. The common approach was to take the derivative of the error of the (then simple) model function and iteratively step towards its minimum.
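As a minimal sketch of that iterative procedure, assume a one-parameter model `y = w * x` and a mean squared error; the function name `fit_weight`, the learning rate, and the toy data are illustrative choices, not taken from the book.

```python
def fit_weight(xs, ys, lr=0.1, steps=200):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient, towards the minimum of the error.
        w -= lr * grad
    return w


# Toy data generated by y = 2x (hypothetical example values).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = fit_weight(xs, ys)
```

With this data the fitted weight should approach 2, the slope used to generate `ys`.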
For complex multilayer networks, there is an additional difficulty: the output layer's output is the result of a long chain of function compositions, where each layer's output is wrapped by the next layer's transfer function. So, the derivative of the output involves the derivative of an exceedingly complex function. In this case, the backpropagation ...
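The chain of compositions described here can be made concrete with a small sketch. It assumes a network of two scalar layers with sigmoid transfer functions, `y = sigmoid(w2 * sigmoid(w1 * x))`; all names and values are hypothetical. The derivative of the output with respect to the first weight is the product of one local derivative per link in the chain, and a finite-difference estimate serves as a check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Two composed layers: the second wraps the output of the first."""
    h = sigmoid(w1 * x)   # first layer output
    y = sigmoid(w2 * h)   # output layer, applied to h
    return h, y


def grad_w1(x, w1, w2):
    """dy/dw1 via the chain rule, one factor per link in the composition."""
    h, y = forward(x, w1, w2)
    dy_dz2 = y * (1.0 - y)    # sigmoid derivative at the output layer
    dz2_dh = w2               # pre-activation z2 = w2 * h
    dh_dz1 = h * (1.0 - h)    # sigmoid derivative at the first layer
    dz1_dw1 = x               # pre-activation z1 = w1 * x
    return dy_dz2 * dz2_dh * dh_dz1 * dz1_dw1


def numeric_grad_w1(x, w1, w2, eps=1e-6):
    """Central finite-difference estimate of dy/dw1, used only as a check."""
    _, y_plus = forward(x, w1 + eps, w2)
    _, y_minus = forward(x, w1 - eps, w2)
    return (y_plus - y_minus) / (2.0 * eps)


analytic = grad_w1(0.5, 0.3, -0.7)
numeric = numeric_grad_w1(0.5, 0.3, -0.7)
```

The two values should agree closely. Each extra layer adds further factors to the product, which is why the derivative of a deep network's output grows complicated so quickly.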