B. Backpropagation
In this appendix, we use the formal neural network notation from Appendix A to dive into the partial-derivative calculus behind the backpropagation method introduced in Chapter 8.
Let’s begin by defining some additional notation to help us along. Backpropagation works backwards, so the notation is based on the final layer (denoted L), and the earlier layers are indexed relative to it (L−1, L−2, ..., L−n). The weights, biases, and function outputs are subscripted appropriately with this same layer notation. Recall from Equations 7.1 and 7.2 that the layer activation a_L is calculated by multiplying the preceding layer’s activation a_{L−1} by the weights w_L and adding the bias b_L to produce z_L = w_L · a_{L−1} + b_L, and then passing z_L through an activation function σ to produce a_L = σ(z_L).
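To make this notation concrete, here is a minimal NumPy sketch of the final-layer forward pass and the corresponding partial derivatives, assuming a sigmoid activation and a quadratic cost; the specific weight, bias, input, and target values are invented purely for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through the final layer L (Equations 7.1 and 7.2):
#   z_L = w_L · a_{L-1} + b_L,   a_L = sigma(z_L)
a_prev = np.array([0.5, 0.8])      # a_{L-1}: activations from layer L-1 (made up)
w_L = np.array([[0.2, -0.4]])      # w_L: weights into layer L (made up, shape 1x2)
b_L = np.array([0.1])              # b_L: bias for layer L (made up)

z_L = w_L @ a_prev + b_L           # weighted input z_L
a_L = sigmoid(z_L)                 # layer activation a_L

# Backward pass: chain rule, assuming a quadratic cost C = 0.5 * (a_L - y)^2
y = np.array([1.0])                # target output (illustrative)
dC_da = a_L - y                    # dC/da_L
da_dz = a_L * (1 - a_L)            # da_L/dz_L: derivative of the sigmoid
delta_L = dC_da * da_dz            # the layer-L "error" term

dC_dw = np.outer(delta_L, a_prev)  # dC/dw_L = delta_L · a_{L-1}^T
dC_db = delta_L                    # dC/db_L = delta_L

The two gradient lines at the end are the payoff: each weight's partial derivative scales the shared error term delta_L by the activation that fed into it, which is exactly the pattern backpropagation repeats layer by layer as it works backwards from L.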