Appendix A. Deep Dives

In this section, we dive deep into a few technical areas that are important to understand for completeness but are not essential.

Matrix Chain Rule

First up is an explanation of why we can substitute $W^T$ for $\frac{\partial ν}{\partial u} (X)$ in the chain rule expression from Chapter 1.

Remember that L is literally:

σ (X W_{11}) + σ (X W_{12}) + σ (X W_{21}) + σ (X W_{22}) + σ (X W_{31}) + σ (X W_{32})

where this is shorthand for the fact that:

σ (X W_{11}) = σ (x_{11} \times w_{11} + x_{12} \times w_{21} + x_{13} \times w_{31})

σ (X W_{12}) = σ (x_{11} \times w_{12} + x_{12} \times w_{22} + x_{13} \times w_{32})

and so on. Let’s zoom in on just one of these expressions. What would it look like if we took the partial derivative of, say, $σ (X W_{11})$ with respect to every element of $X$ (which is ultimately what we’ll want to do with all six components of $L$)?
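To make the notation concrete, here is a minimal sketch in plain Python. It assumes $σ$ is the sigmoid function and uses made-up values for a 3×3 $X$ and a 3×2 $W$ (the shapes implied by the indices above). The helper names `sigmoid`, `matmul`, and `L` are illustrative only and are not taken from the book's code.

```python
import math

def sigmoid(u):
    # Assumption: sigma is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-u))

def matmul(X, W):
    # Plain-Python matrix product: entry (i, j) is row i of X dotted with column j of W.
    return [
        [sum(X[i][k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
        for i in range(len(X))
    ]

def L(X, W):
    # L is the sum of sigma applied to each of the six entries of XW.
    return sum(sigmoid(v) for row in matmul(X, W) for v in row)

# Hypothetical example values, chosen only for illustration.
X = [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6],
     [-0.7, 0.8, 0.9]]
W = [[0.2, -0.1],
     [0.3, 0.4],
     [-0.5, 0.6]]

XW = matmul(X, W)

# XW_11 written out by hand, matching the expansion in the text.
xw_11_by_hand = X[0][0] * W[0][0] + X[0][1] * W[1][0] + X[0][2] * W[2][0]

loss = L(X, W)
```

Here `XW[0][0]` and `xw_11_by_hand` are the same quantity, which is all the shorthand $X W_{11}$ means.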

Well, since:

σ (X W_{11}) = σ (x_{11} \times w_{11} + x_{12} \times w_{21} + x_{13} \times w_{31})

it isn’t too hard to see that the partial derivative of this with respect to $x_{11}$, via a very simple application of the chain rule, is:

\frac{\partial σ}{\partial u} (X W_{11}) \times w_{11}

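Since the only thing that $x_{11}$ is multiplied by in the $X W_{11}$ expression is $w_{11}$, the partial derivative of the inner sum with respect to $x_{11}$ is just $w_{11}$. The same reasoning gives $w_{21}$ and $w_{31}$ for $x_{12}$ and $x_{13}$, and the partial derivative with respect to every other element of $X$ is 0, because those elements do not appear in $X W_{11}$ at all.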
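One way to convince yourself of this result is a quick numerical check. The sketch below assumes $σ$ is the sigmoid function and uses made-up values for the first row of $X$ and the first column of $W$. It compares the chain rule expression with a central finite difference in $x_{11}$. The names `f`, `analytic`, and `numeric` are illustrative only.

```python
import math

def sigmoid(u):
    # Assumption: sigma is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_deriv(u):
    # Derivative of the sigmoid: sigma(u) * (1 - sigma(u)).
    s = sigmoid(u)
    return s * (1.0 - s)

# Hypothetical example values.
x_row = [0.1, -0.2, 0.3]   # x_11, x_12, x_13
w_col = [0.2, 0.3, -0.5]   # w_11, w_21, w_31

def f(row):
    # sigma(XW_11) as a function of the first row of X.
    return sigmoid(sum(x * w for x, w in zip(row, w_col)))

xw_11 = sum(x * w for x, w in zip(x_row, w_col))

# Chain rule result from the text: dsigma/du evaluated at XW_11, times w_11.
analytic = sigmoid_deriv(xw_11) * w_col[0]

# Central finite difference: nudge only x_11 up and down by eps.
eps = 1e-6
up = [x_row[0] + eps] + x_row[1:]
down = [x_row[0] - eps] + x_row[1:]
numeric = (f(up) - f(down)) / (2 * eps)
```

The two values should agree to many decimal places, which is what the chain rule expression predicts.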

So, computing the partial derivative of $σ (X W_{11})$ with respect to all of the elements of $X$ gives us the following overall expression for $\frac{\partial σ (X W_{11})}{\partial X}$:

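\frac{\partial σ (X W_{11})}{\partial X} = [\begin{matrix} \frac{\partial σ}{\partial u} (X W_{11}) \times w_{11} & \frac{\partial σ}{\partial u} (X W_{11}) \times w_{21} & \frac{\partial σ}{\partial u} (X W_{11}) \times w_{31} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}]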
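The same kind of check can be run over every element of $X$ at once. The sketch below assumes $σ$ is the sigmoid function and that $X$ is 3×3 and $W$ is 3×2, as the indices above imply. The values are made up for illustration. Because $X W_{11}$ involves only the first row of $X$, the analytic gradient has $\frac{\partial σ}{\partial u} (X W_{11})$ times $w_{11}$, $w_{21}$, and $w_{31}$ in its first row and zeros in the other rows. The code compares that with finite differences.

```python
import math

def sigmoid(u):
    # Assumption: sigma is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-u))

def sigma_xw_11(X, W):
    # sigma(XW_11): first row of X dotted with first column of W, passed through sigma.
    return sigmoid(sum(X[0][k] * W[k][0] for k in range(3)))

# Hypothetical example values.
X = [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6],
     [-0.7, 0.8, 0.9]]
W = [[0.2, -0.1],
     [0.3, 0.4],
     [-0.5, 0.6]]

# dsigma/du evaluated at XW_11.
s = sigma_xw_11(X, W)
dsigma = s * (1.0 - s)

# Analytic gradient: first row scaled by the first column of W, zeros elsewhere.
analytic = [
    [dsigma * W[k][0] if i == 0 else 0.0 for k in range(3)]
    for i in range(3)
]

# Numerical gradient: central difference in each element of X, one at a time.
eps = 1e-6
numeric = []
for i in range(3):
    row = []
    for k in range(3):
        up = [r[:] for r in X]
        down = [r[:] for r in X]
        up[i][k] += eps
        down[i][k] -= eps
        row.append((sigma_xw_11(up, W) - sigma_xw_11(down, W)) / (2 * eps))
    numeric.append(row)

max_err = max(
    abs(analytic[i][k] - numeric[i][k]) for i in range(3) for k in range(3)
)
```

A small `max_err` indicates the two gradients match entry by entry, including the zero rows.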


Deep Learning from Scratch by Seth Weidman
