B. Backpropagation

In this appendix, we use the formal neural network notation from Appendix A to dive into the partial-derivative calculus behind the backpropagation method introduced in Chapter 8.

Let’s begin by defining some additional notation to help us along. Backpropagation works backwards, so the notation is based on the final layer (denoted $L$), and the earlier layers are annotated with respect to it ($L-1$, $L-2$, $\ldots$, $L-n$). The weights, biases, and outputs from functions are subscripted appropriately with this same notation. Recall from Equations 7.1 and 7.2 that the layer activation $a_L$ is calculated by multiplying the preceding layer’s activation $a_{L-1}$ by the weight $w_L$ and adding the bias $b_L$ to produce $z_L$, and then passing $z_L$ through an activation function:

$$z_L = w_L \cdot a_{L-1} + b_L$$
$$a_L = \sigma(z_L)$$
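To make the chain-rule bookkeeping concrete, here is a minimal NumPy sketch of that forward pass and of the gradients backpropagation produces for the final layer's weight and bias. This is not from the book: the quadratic cost, the sigmoid activation $\sigma$, and all numeric values are assumptions chosen purely for illustration.

```python
import numpy as np

def sigma(z):
    """Sigmoid activation function (assumed choice of activation)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    """Derivative of the sigmoid, needed for the chain rule."""
    s = sigma(z)
    return s * (1.0 - s)

# Forward pass for the final layer: z_L = w_L * a_{L-1} + b_L, a_L = sigma(z_L)
a_prev = 0.5          # activation a_{L-1} from the preceding layer (hypothetical)
w_L, b_L = 0.8, 0.1   # weight and bias of layer L (hypothetical)
y = 1.0               # target output (hypothetical)

z_L = w_L * a_prev + b_L
a_L = sigma(z_L)

# Assumed quadratic cost for a single example: C = (a_L - y)^2
C = (a_L - y) ** 2

# Backward pass, working from the cost toward the weights:
# dC/dw_L = dC/da_L * da_L/dz_L * dz_L/dw_L
dC_da = 2.0 * (a_L - y)        # dC/da_L
da_dz = sigma_prime(z_L)       # da_L/dz_L
dz_dw = a_prev                 # dz_L/dw_L
dz_db = 1.0                    # dz_L/db_L

dC_dw = dC_da * da_dz * dz_dw  # gradient of the cost w.r.t. the weight
dC_db = dC_da * da_dz * dz_db  # gradient of the cost w.r.t. the bias

print(f"dC/dw_L = {dC_dw:.4f}, dC/db_L = {dC_db:.4f}")
```

Each factor in the product corresponds to one link in the chain rule, and backpropagation simply repeats this pattern layer by layer, reusing the partial derivatives already computed for layer $L$ when it moves on to layer $L-1$.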
