Deep Learning from Scratch

Errata for Deep Learning from Scratch

Submit your own errata for this product.

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version | Location | Description | Submitted by | Date submitted
Figure 1.1

The function in figure 1.1 is not ReLU but Leaky ReLU.

Jaap van der Does  Oct 16, 2019 
1. Foundation, The Chain Rule, first formula

Is the formula "df2(x)/du = df2(f1(x))/du * df1(x)/du" correct? I think it should be "df1f2(x)/du = df2(f1(x))/du * df1(x)/du".

Hiroki Nishimoto  Oct 23, 2019 
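For readers checking this themselves, the chain rule for a composite f2(f1(x)) can be verified numerically. The sketch below uses hypothetical stand-in functions (np.square and np.sin), not the book's examples:

```python
import numpy as np

# Hypothetical stand-ins for the book's f1 and f2.
def f1(x):
    return np.square(x)

def f2(x):
    return np.sin(x)

def deriv(f, x, delta=1e-6):
    # Central-difference numerical derivative.
    return (f(x + delta) - f(x - delta)) / (2 * delta)

x = 1.5
# Chain rule for the composite h(x) = f2(f1(x)):
#   dh/dx = f2'(f1(x)) * f1'(x)
analytic = np.cos(f1(x)) * 2 * x
numeric = deriv(lambda a: f2(f1(a)), x)
print(np.isclose(analytic, numeric, atol=1e-4))  # True
```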
Chap 1
Figure 1.18

Shouldn't the symbol in the second blue box be a \sigma and not a \delta?

Venkatesh-Prasad Ranganath  Nov 10, 2019 
Chapter 1
Figure 1.1

The figure says ReLU function, but instead plots Leaky ReLU

Tamirlan Seidakhmetov  Jan 05, 2020 
Chapter 1
"The Fun Part: The Backward Pass" -> "Code" section

Here "then increasing x11 by 0.001 should increase L by 0.01 × 0.2489", 0.01 should be changed to 0.001

Tamirlan Seidakhmetov  Jan 07, 2020 
Chapter 2
Linear Regression: The Code

Inside the forward_linear_regression function, the loss formula is incorrect.
Wrong: loss = np.mean(np.power(y_batch - P), 2)
Correct: loss = np.mean(np.power(y_batch - P, 2))

Tamirlan Seidakhmetov  Jan 14, 2020 
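For reference, `np.power(y_batch - P, 2)` is the syntactically valid NumPy form (square elementwise, then average); the variant with the closing parenthesis moved raises a TypeError, since `np.power` requires two arguments. A quick sketch with made-up values:

```python
import numpy as np

y_batch = np.array([1.0, 2.0, 3.0])
P = np.array([1.1, 1.9, 3.2])

# Square elementwise, then average: mean squared error.
loss = np.mean(np.power(y_batch - P, 2))
print(round(float(loss), 4))  # 0.02

# Misplaced parenthesis: np.power called with one argument fails.
try:
    np.mean(np.power(y_batch - P), 2)
except TypeError:
    print("np.power requires two arguments")
```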
Printed Page p. 84
2nd to last paragraph

default activation is said to be "Linear", but in the code snippet it is actually "Sigmoid". So in the code snippet on p.91, the linear_regression neural network would need an explicit assignment of the activation to Linear(), otherwise Sigmoid() would be used.

Anonymous  Oct 27, 2020 
ePub Page (ePub does not give page number)
1. Foundations, "The Fun Part: The Backward Pass", Code: Now let's verify that everything worked

How can we verify L is correct when W is not given? W is assigned random numbers, but we don't know what they are.

Luke  Feb 06, 2022 
ePub Page Chapter 1, Nested Functions, Code
Code sample for chain_length_2

Code sample has errors.
First line should be: from typing import Callable, List
Second line should be: from numpy import ndarray
Last line should be: return f2(f1(a))

Ellery Chan  Apr 22, 2022 
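Putting the submitted fixes together, a sketch of what the corrected code sample would look like (the Array_Function and Chain aliases are assumed to match the book's conventions):

```python
from typing import Callable, List

from numpy import ndarray

# A function taking in and returning an ndarray, per the book's conventions.
Array_Function = Callable[[ndarray], ndarray]

# A chain is a list of such functions.
Chain = List[Array_Function]

def chain_length_2(chain: Chain, a: ndarray) -> ndarray:
    """Evaluates two functions in a row, in a "Chain"."""
    assert len(chain) == 2, "Length of input 'chain' should be 2"
    f1 = chain[0]
    f2 = chain[1]
    return f2(f1(a))  # not f2(f1(x)): 'x' is undefined here
```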
Printed Page pages 32 and 33
Bottom diagram of page 32 and first diagram on page 33

This is in the Italian translation of the First Edition.

In the first bracket of the diagram at the bottom of page 32, at row 2 and column 2, the first weight’s subscript should be 12 and not 11 (ie w_12 instead of w_11 (here ‘_’ denotes subscript)).

The same issue occurs in the same bracket in the diagram at the top of page 33.

Anonymous  Sep 19, 2023 
ePub Page
John Cochrane, [Investments] Notes 2006

John Cochrane, [Investments] Notes 2006 => hyperlink is broken

Gökçe Aydos  Sep 22, 2023 
ePub Page Appendix - Matrix Chain Rule
"it isn’t too hard to see that the partial derivative of this with respect to x1"

> it isn’t too hard to see that the partial derivative of this with respect to `x_1`

`x_1` => `x_11`.

Gökçe Aydos  Sep 28, 2023 
Printed Page 10
return statement in def chain_length_2() function

In the chain_length_2() function, the return statement is f2(f1(x)), but x is undefined. The return statement should be f2(f1(a)), where a is the input to the function.

Anonymous  Oct 08, 2019 
PDF Page 10
Figure 1-7

The use of f1 f2 to indicate the composite f2(f1(x)) is confusing and non-standard. As it currently stands, the notation implies multiplication rather than composition (and if composition, in the wrong order).

There is a standard way of doing what the author wants: use a semicolon. David Spivak at MIT uses this notation in an applied category theory course.

"f1 ; f2" means first apply f1, then apply f2 to the result.

Bradford Fournier-Eaton  Nov 01, 2021 
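A tiny sketch making the two readings concrete (f1 and f2 here are hypothetical, not from the book):

```python
# Hypothetical functions to illustrate composition order.
def f1(x):
    return x + 1

def f2(x):
    return x * 10

# "f1 ; f2" (apply f1 first, then f2) corresponds to f2(f1(x)):
print(f2(f1(3)))  # 40

# The usual mathematical composition (f2 o f1)(x) is the same thing:
compose = lambda g, f: (lambda x: g(f(x)))
print(compose(f2, f1)(3))  # 40
```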
PDF Page 11
the Math formula

1) Page 11, the math formula: from a math standpoint this formula is incorrect.

The LHS is not the derivative of f2 with respect to x; it is the derivative of the composite function (f2 ∘ f1) with respect to x. I am sure that's what the author meant, but that's not what's written.

The author should probably denote this composite function by some letter, e.g. h, so that the LHS can be written as dh/du. Then the formula would be OK.

2) Figure 1.7 on the previous page is also incorrect. The composite function is not f1 . f2; it is f2 . f1 (read as "f2 after f1"), or as usually written in math terms, (f2 ∘ f1).

3) The same error appears in figure 1.8 (page 12), and the same "f1 . f2" appears on the first line of page 11.

4) Also, the terms "nested functions" and "composite functions" are used interchangeably, which is very confusing. In programming, "nested functions" usually means a function defined inside another function's body, which is not what is meant here.

So this whole sequence of pages starting at page 9 needs serious editing; it is full of confusions.

Peter Petrov  Mar 26, 2021 
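One way to state the corrected formula along the lines the submitter suggests, denoting the composite by h:

```latex
h(x) = (f_2 \circ f_1)(x) = f_2\bigl(f_1(x)\bigr),
\qquad
\frac{dh}{du} = \frac{df_2}{du}\bigl(f_1(x)\bigr) \cdot \frac{df_1}{du}(x)
```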
PDF Page 11
Chain Rule Equation

As others have stated, the chain rule equation is incorrect: the LHS should not be the derivative of one particular function of the two (the author writes it as f_{2}); it should be the derivative of the composite.

Bradford Fournier-Eaton  Nov 01, 2021 
PDF Page 13
In the function : chain_deriv_2

# df1/dx
f1_of_x = f1(input_range)

In the line above, f1_of_x is the function f1 applied to the input data, but the comment suggests it is calculating a derivative, which is not true.

Pradeep Kumar  Oct 10, 2020 
PDF Page 13
In the function : chain_deriv_2

Nowhere is it mentioned what plot_chain does. No code for it is given in the chapter, nor is it clear what it does, yet the function is used throughout the first chapter.

Also, in the type annotations, ndarray should be either np.ndarray or numpy.ndarray.

Pradeep Kumar  Oct 10, 2020 
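A sketch of chain_deriv_2 with the comments corrected (the deriv helper here is a generic central-difference approximation, assumed to behave like the book's):

```python
from typing import Callable, List

import numpy as np
from numpy import ndarray

Array_Function = Callable[[ndarray], ndarray]
Chain = List[Array_Function]

def deriv(func: Array_Function, input_: ndarray, delta: float = 0.001) -> ndarray:
    # Central-difference numerical derivative.
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

def chain_deriv_2(chain: Chain, input_range: ndarray) -> ndarray:
    """Chain rule: (f2(f1(x)))' = f2'(f1(x)) * f1'(x)."""
    assert len(chain) == 2
    f1, f2 = chain

    # f1(x): the forward value, *not* a derivative
    f1_of_x = f1(input_range)

    # df1/dx, evaluated at x
    df1dx = deriv(f1, input_range)

    # df2/du, evaluated at u = f1(x)
    df2du = deriv(f2, f1_of_x)

    # Multiply the derivatives pointwise
    return df2du * df1dx
```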
Printed Page 25
Last paragraph

The text reads "...the gradient of X with respect to X." but it should read "...the gradient of N with respect to X." A gradient is a property of a function, not a vector.

Jason Gastelum  Dec 25, 2020 
Printed Page 28
Chapter 1

"we compute quantities on the forward pass (here, just N)"
contradicts what was defined as "forward pass" on p. 16 in the chain_deriv_3() method.
According to that definition, the forward pass comprises
N = np.dot(X, W)
as well as
S = sigma(N)

Anonymous  Apr 29, 2020 
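For concreteness, a minimal sketch of the forward pass as defined in chain_deriv_3-style code, with hypothetical values:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[1.0, 2.0, 3.0]])       # hypothetical 1x3 input
W = np.array([[0.1], [0.2], [0.3]])   # hypothetical 3x1 weights

# Forward pass: both of these quantities are computed going forward.
N = np.dot(X, W)   # matrix multiplication
S = sigmoid(N)     # nonlinearity applied to N
```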
PDF Page 58, Code
line number 15

In the backward pass, (if I am not wrong) we essentially want to find by how much the output changes when the input is changed by some amount.
In the code on page 58, with 'L' being the output of the neural network and 'X' being the input layer, the formula written for dLdX (dLdX = np.dot(dSdN, dNdX)) looks wrong, as the dot product of dSdN and dNdX gives dSdX (dS/dX), whereas we want to find dLdX (dL/dX).
The correct formula for dLdX should be np.dot(dLdN, dNdX). In the same code, dLdN is computed but never used. There must be a printing mistake, but it causes a big error.
To support this: on pages 47-48, in the matrix_function_backward_1() function, we return np.dot(dSdN, dNdX), which gives dSdX (dS/dX), and that is exactly the change in output due to a change in input for that particular problem.
I have just started with deep learning, so I might be wrong here! Correct me if it's a mistake. Thank you, and a great book!

Prathamesh Waghmare  Sep 14, 2023 
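Whether or not the page-58 code is wrong, the general principle can be checked with a self-contained finite-difference sketch (the loss and shapes below are hypothetical, not the book's): the gradient reaching X must chain all the way from L, not stop at S.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
X = np.random.randn(1, 3)
W = np.random.randn(3, 2)

def L_func(X):
    S = sigmoid(np.dot(X, W))
    return np.sum(S ** 2)        # a hypothetical loss on top of S

# Backward pass: dL/dX must include dL/dS, not just dS/dN.
N = np.dot(X, W)
S = sigmoid(N)
dLdS = 2 * S                     # dL/dS for this loss
dSdN = S * (1 - S)               # dS/dN (sigmoid derivative)
dLdN = dLdS * dSdN               # chain through S
dLdX = np.dot(dLdN, W.T)         # dN/dX contributes W^T

# Finite-difference check of dLdX[0, 0]
eps = 1e-6
Xp = X.copy(); Xp[0, 0] += eps
numeric = (L_func(Xp) - L_func(X)) / eps
print(np.isclose(numeric, dLdX[0, 0], atol=1e-4))  # True
```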
Printed Page 64
Table 2-1, Derivative table for neural network

The partial derivative dLdP = -(forward_info[y] - forward_info[p]) should be -2 * (forward_info[y] - forward_info[p]), just like the explanation on page 51.

Anonymous  Oct 25, 2019 
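For a mean-squared-error loss L = mean((y − P)^2), the exact derivative is dL/dP = −2(y − P)/n (the 1/n factor comes from the mean; the submitter's point concerns the factor of 2). A quick numerical check with made-up values:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
P = np.array([1.5, 1.5, 3.5])
n = y.size

# L = mean((y - P)^2)  =>  dL/dP = -2 * (y - P) / n
dLdP = -2 * (y - P) / n

# Finite-difference check on the first element of P
eps = 1e-6
P2 = P.copy(); P2[0] += eps
numeric = (np.mean((y - P2) ** 2) - np.mean((y - P) ** 2)) / eps
print(np.isclose(numeric, dLdP[0], atol=1e-4))  # True
```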
Printed Page 65
Paragraph "The overall loss gradient"

I believe that in the Jupyter Notebook on GitHub, in "loss_gradients", the values assigned to loss_gradients['B1'] and loss_gradients['W2'] are erroneously summed across axis=0 twice: once in the original assignment for dLdB1 and dLdB2, and then again in the assignment to loss_gradients. This makes e.g. the loss gradient for B1 not a vector with 13 elements but a scalar, so that gradient descent updates all elements of B1 with the same gradient value, which I think is not correct. The effect on the outcome seems minor, but the graph printed on p. 67 looks somewhat different.

Anonymous  Oct 27, 2020 
Printed Page 65

In the source code, there is a bug while calculating the gradients with respect to W2:

Incorrect: dLdW2 = np.dot(…, dLdP)
Correct: dLdW2 = np.dot(…, dLdM2)

Eugen Grosu  Jan 03, 2021 
Printed Page 66

The figure 2-13 is obviously the same as figure 2-6. There is no difference in the fit when comparing them.

James Svacha  Jul 12, 2020 
Printed, PDF Page 88
section heading

Heading is the same as the chapter title *and* the book title.

"In DLfS refer to the section DLfS in the chapter DLfS" might be a little confusing.

Anonymous  Feb 29, 2020 
Printed Page 91
NeuralNetwork class invocations in the code

The NeuralNetwork class, when used on page 91, is given a learning_rate parameter --- there's no learning_rate in the __init__ function for that class, and no methods in the class use the learning_rate. This is not surprising, as the learning-rate is something the Optimizer class (introduced on the following pages) cares about.

David Mankins  Sep 13, 2023 
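As the submitter notes, the learning rate more naturally belongs to the optimizer. A hypothetical sketch of that design (this SGD class is illustrative, not the book's exact implementation):

```python
import numpy as np

class SGD:
    """Minimal illustrative optimizer: it, not the network, owns the learning rate."""
    def __init__(self, lr: float = 0.01):
        self.lr = lr

    def step(self, params, grads):
        # In-place gradient-descent update of each parameter array.
        for p, g in zip(params, grads):
            p -= self.lr * g

optim = SGD(lr=0.1)
params = [np.array([1.0, 2.0])]
grads = [np.array([0.5, 0.5])]
optim.step(params, grads)
print(params[0])  # [0.95 1.95]
```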
PDF Page 94
__init__ method of class Trainer

The __init__ method is missing self.optim = optim before the setattr line.

Rodrigo Stevaux  Oct 07, 2020 
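A sketch of what the corrected __init__ would look like with the missing assignment restored (the class is shown in minimal form; the setattr call mirrors the book's pattern of giving the optimizer a reference to the network):

```python
class Trainer:
    """Minimal sketch; only the relevant parts of __init__ are shown."""
    def __init__(self, net, optim):
        self.net = net
        self.optim = optim                     # the missing line
        setattr(self.optim, 'net', self.net)   # as in the book's __init__
```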
ePub Page 99
1st paragraph

In the Lincoln library, required to run the code for chapter 4, 'lincoln.utils.np_utils' does not contain the function 'exp_ratios'.

Steven Kaminsky  Jan 14, 2020 
PDF Page 166
The code for auto differentiation

The book's auto-differentiation code needs to replace self.grad with backward_grad so that it calculates derivatives correctly.

Otherwise, try:
a = NumberWithGrad(2)
b = a * 4
c = b + 5

This will give a.grad = 20, which in fact should be 12.

Nanyu  Sep 21, 2023
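To illustrate why propagating an explicit backward_grad (and accumulating into .grad with +=) matters, here is a minimal hypothetical scalar-autodiff sketch, not the book's exact class; when a variable is reused, the contributions from each use must add up:

```python
class NumberWithGrad:
    """Minimal sketch of scalar reverse-mode autodiff with an explicit backward_grad."""
    def __init__(self, num, creators=(), op=''):
        self.num = num
        self.grad = 0
        self.creators = creators
        self.op = op

    def __mul__(self, other):
        other = other if isinstance(other, NumberWithGrad) else NumberWithGrad(other)
        return NumberWithGrad(self.num * other.num, (self, other), 'mul')

    def __add__(self, other):
        other = other if isinstance(other, NumberWithGrad) else NumberWithGrad(other)
        return NumberWithGrad(self.num + other.num, (self, other), 'add')

    def backward(self, backward_grad=1):
        self.grad += backward_grad  # accumulate: reused variables sum contributions
        if self.op == 'add':
            for c in self.creators:
                c.backward(backward_grad)
        elif self.op == 'mul':
            a, b = self.creators
            a.backward(backward_grad * b.num)
            b.backward(backward_grad * a.num)

a = NumberWithGrad(3)
d = a * 4 + a * 2   # a is used twice; d = 6a, so dd/da = 6
d.backward()
print(a.grad)  # 6
```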