Chapter 2. Fundamentals

In Chapter 1, I described the major conceptual building block for understanding deep learning: nested, continuous, differentiable functions. I showed how to represent these functions as computational graphs, with each node in a graph representing a single, simple function. In particular, I demonstrated that this representation makes it easy to calculate the derivative of the output of the nested function with respect to its input: we take the derivatives of all the constituent functions, evaluate each derivative at the input its function received, and multiply the results together. By the chain rule, this product is the correct derivative of the nested function. I confirmed that this works with some simple examples, using functions that took NumPy `ndarray`s as inputs and produced `ndarray`s as outputs.
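As a minimal sketch of the procedure described above (not the Chapter 1 code itself), the following composes two elementwise functions, approximates each constituent derivative with a central finite difference, evaluates each one at the input its function received, and multiplies the results as the chain rule prescribes. The helper names `square`, `sigmoid`, `deriv`, and `chain_deriv_2` are hypothetical, chosen for illustration.

```python
import numpy as np
from typing import Callable

# Hypothetical alias: an "array function" maps an ndarray to an ndarray of the same shape.
ArrayFunction = Callable[[np.ndarray], np.ndarray]


def square(x: np.ndarray) -> np.ndarray:
    # Elementwise x**2
    return np.power(x, 2)


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Elementwise logistic function 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))


def deriv(func: ArrayFunction, input_: np.ndarray, delta: float = 1e-5) -> np.ndarray:
    # Central finite-difference approximation of func's elementwise derivative
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)


def chain_deriv_2(f1: ArrayFunction, f2: ArrayFunction, x: np.ndarray) -> np.ndarray:
    # Derivative of the nested function f2(f1(x)) via the chain rule
    df1_dx = deriv(f1, x)   # inner derivative, evaluated at x
    u = f1(x)               # the input that f2 actually receives
    df2_du = deriv(f2, u)   # outer derivative, evaluated at f1(x)
    return df2_du * df1_dx  # multiply the constituent derivatives together


x = np.arange(-3, 3, 0.5)
chain = chain_deriv_2(square, sigmoid, x)

# Closed-form check: d/dx sigmoid(x**2) = sigmoid(x**2) * (1 - sigmoid(x**2)) * 2x
s = sigmoid(square(x))
exact = s * (1 - s) * 2 * x
print(np.allclose(chain, exact, atol=1e-6))  # True
```

Because the derivative of sigmoid(u) is sigmoid(u)(1 - sigmoid(u)) and the derivative of x² is 2x, the chain-rule result can be checked against their closed-form product; the two agree up to finite-difference error.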

I showed that this method of computing derivatives works even when the function takes in multiple `ndarray`s as inputs and combines them via a matrix multiplication operation, which, unlike the other operations we saw, changes the shape of its inputs. Specifically, if one input to this operation—call the input X—is a B × N `ndarray`, and another input to this operation, W, is an N × M `ndarray`, then its output P is a B × M `ndarray`. While it isn’t clear what the derivative of such an operation would be, I showed that when a matrix multiplication ν(X, W) is included as a “constituent operation” in a nested ...
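The shape rule for ν(X, W) can be checked directly in NumPy. This sketch assumes arbitrary sizes B = 4, N = 3, M = 2 and random inputs; `matmul_forward` and `sum_of_matmul` are hypothetical names. As one concrete illustration (an assumed example, not necessarily the nested function used in Chapter 1), it also reduces P to a scalar L by summing and compares the gradient formula ∂L/∂X = ones(B, M) · Wᵀ against finite differences.

```python
import numpy as np

B, N, M = 4, 3, 2  # hypothetical sizes for illustration
rng = np.random.default_rng(0)
X = rng.standard_normal((B, N))
W = rng.standard_normal((N, M))


def matmul_forward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # nu(X, W) = X @ W; the inner dimensions must match (N == N)
    assert X.shape[1] == W.shape[0]
    return np.dot(X, W)


P = matmul_forward(X, W)
print(P.shape)  # (4, 2), i.e. B x M


def sum_of_matmul(X: np.ndarray, W: np.ndarray) -> float:
    # Assumed scalar output for this illustration: L = sum of all entries of X @ W
    return float(np.sum(matmul_forward(X, W)))


# Analytic gradient of L with respect to X: each entry (i, j) equals sum_k W[j, k]
grad_X = np.ones((B, M)) @ W.T  # shape (B, N), same as X

# Finite-difference check, perturbing one entry of X at a time
eps = 1e-6
num_grad = np.zeros_like(X)
for i in range(B):
    for j in range(N):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, j] += eps
        X_minus[i, j] -= eps
        num_grad[i, j] = (sum_of_matmul(X_plus, W) - sum_of_matmul(X_minus, W)) / (2 * eps)
```

Note that the gradient has the same B × N shape as X, even though the matrix multiplication itself produced a B × M output.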
