Deep Learning from Scratch

by Seth Weidman
September 2019
Intermediate to advanced
250 pages
6h 58m
English
O'Reilly Media, Inc.

Appendix A. Deep Dives

In this section, we dive deep into a few technical areas that are important to understand for completeness but are not essential.

Matrix Chain Rule

First up is an explanation of why we can substitute $W^T$ for $\frac{\partial \nu}{\partial u}(X)$ in the chain rule expression from Chapter 1.

Remember that L is literally:

$$\sigma(XW_{11}) + \sigma(XW_{12}) + \sigma(XW_{21}) + \sigma(XW_{22}) + \sigma(XW_{31}) + \sigma(XW_{32})$$

where this is shorthand for the fact that:

$$\sigma(XW_{11}) = \sigma(x_{11} \times w_{11} + x_{12} \times w_{21} + x_{13} \times w_{31})$$
$$\sigma(XW_{12}) = \sigma(x_{11} \times w_{12} + x_{12} \times w_{22} + x_{13} \times w_{32})$$

and so on. Let’s zoom in on just one of these expressions. What would it look like if we took the partial derivative of, say, $\sigma(XW_{11})$ with respect to every element of $X$ (which is ultimately what we’ll want to do with all six components of $L$)?
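To make the shorthand concrete, here is a minimal sketch in plain Python. The numeric values of `X` and `W`, the helper `matmul`, and the choice of the logistic sigmoid for σ are assumptions made for illustration. They are not taken from the text.

```python
import math

def sigmoid(u):
    """Logistic sigmoid, assumed here as the concrete choice of sigma."""
    return 1.0 / (1.0 + math.exp(-u))

def matmul(X, W):
    """Plain-Python matrix product: entry (i, j) is sum_k X[i][k] * W[k][j]."""
    return [[sum(X[i][k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))]
            for i in range(len(X))]

# Illustrative values only (assumptions): X is 3x3, W is 3x2.
X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9]]
W = [[0.5, -0.4],
     [0.3, 0.2],
     [-0.1, 0.6]]

XW = matmul(X, W)  # 3x2 matrix with entries XW_11 ... XW_32

# L is sigma applied to each of the six entries of XW, then summed.
L = sum(sigmoid(entry) for row in XW for entry in row)

# XW_11 written out term by term (Python indices are 0-based).
xw_11 = X[0][0] * W[0][0] + X[0][1] * W[1][0] + X[0][2] * W[2][0]
```

Comparing `XW[0][0]` with `xw_11` confirms that the top-left entry of the product is exactly the three-term sum written out above.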

Well, since:

$$\sigma(XW_{11}) = \sigma(x_{11} \times w_{11} + x_{12} \times w_{21} + x_{13} \times w_{31})$$

it isn’t too hard to see that the partial derivative of this with respect to $x_{11}$, via a very simple application of the chain rule, is:

$$\frac{\partial \sigma}{\partial u}(XW_{11}) \times w_{11}$$

Since the only term in the $XW_{11}$ expression that contains $x_{11}$ is $x_{11} \times w_{11}$, every other term in the sum contributes 0 to this partial derivative.
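This single partial derivative can be sanity-checked with a central finite difference. The sketch below assumes σ is the logistic sigmoid and uses made-up values for the first row of X and the first column of W. The names `f`, `analytic`, and `numeric` are hypothetical helpers for this check.

```python
import math

def sigmoid(u):
    """Logistic sigmoid, assumed here as the concrete choice of sigma."""
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_deriv(u):
    """Derivative of the logistic sigmoid with respect to its input."""
    s = sigmoid(u)
    return s * (1.0 - s)

# Illustrative values (assumptions): first row of X, first column of W.
x_row = [0.1, 0.2, 0.3]    # x11, x12, x13
w_col = [0.5, 0.3, -0.1]   # w11, w21, w31

def f(x11):
    """sigma(XW_11) as a function of x11 alone, other inputs held fixed."""
    return sigmoid(x11 * w_col[0] + x_row[1] * w_col[1] + x_row[2] * w_col[2])

# Chain rule result: sigma'(XW_11) * w11.
xw_11 = sum(x * w for x, w in zip(x_row, w_col))
analytic = sigmoid_deriv(xw_11) * w_col[0]

# Central difference approximation of the same partial derivative.
eps = 1e-6
numeric = (f(x_row[0] + eps) - f(x_row[0] - eps)) / (2 * eps)
```

The two values should agree to many decimal places, which is what the chain rule predicts.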

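So, computing the partial derivative of $\sigma(XW_{11})$ with respect to all of the elements of $X$ gives us the following overall expression for $\frac{\partial \sigma(XW_{11})}{\partial X}$:

$$\frac{\partial \sigma(XW_{11})}{\partial X} = \begin{bmatrix} \frac{\partial \sigma}{\partial u}(XW_{11}) \times w_{11} & \frac{\partial \sigma}{\partial u}(XW_{11}) \times w_{21} & \frac{\partial \sigma}{\partial u}(XW_{11}) \times w_{31} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The second and third rows are zero because $XW_{11}$ depends only on the first row of $X$.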
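The partial derivatives of $\sigma(XW_{11})$ with respect to every element of $X$ can also be estimated numerically. The sketch below is one way to do that, assuming σ is the logistic sigmoid and using made-up values for `X` and `W`. The helpers `sigma_xw11` and `numerical_grad` are hypothetical names introduced for this example.

```python
import math

def sigmoid(u):
    """Logistic sigmoid, assumed here as the concrete choice of sigma."""
    return 1.0 / (1.0 + math.exp(-u))

def sigma_xw11(X, W):
    """sigma(XW_11): uses only the first row of X and first column of W."""
    return sigmoid(sum(X[0][k] * W[k][0] for k in range(3)))

def numerical_grad(X, W, eps=1e-6):
    """Central-difference estimate of d sigma(XW_11) / d X[i][j] for all i, j."""
    grad = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            grad[i][j] = (sigma_xw11(X_plus, W) - sigma_xw11(X_minus, W)) / (2 * eps)
    return grad

# Illustrative values only (assumptions): X is 3x3, W is 3x2.
X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9]]
W = [[0.5, -0.4],
     [0.3, 0.2],
     [-0.1, 0.6]]

grad = numerical_grad(X, W)

# Analytic first row: sigma'(XW_11) times w11, w21, w31 respectively.
s = sigma_xw11(X, W)
expected_first_row = [s * (1.0 - s) * W[k][0] for k in range(3)]
```

In this sketch the first row of `grad` should match `expected_first_row`, and the remaining rows should be zero, since the second and third rows of `X` never enter $XW_{11}$.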
ISBN: 9781492041405