Errata
This errata list records errors, and their corrections, found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version  Location  Description  Submitted by  Date submitted 

web Figure 1.1 
The function in Figure 1.1 is not ReLU but Leaky ReLU. 
Jaap van der Does  Oct 16, 2019  
1. Foundations, The Chain Rule, first formula  Is the formula "df2(x)/du = df2(f1(x))/du * df1(x)/du" correct? I think it should be "df1f2(x)/du = df2(f1(x))/du * df1(x)/du". 
Hiroki Nishimoto  Oct 23, 2019  
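The corrected form (derivative of the composite equals the outer derivative evaluated at the inner function, times the inner derivative) can be sanity-checked numerically. A minimal sketch, using f1(x) = x² and f2(u) = sin(u) as illustrative choices, not functions from the book:

```python
import math

def f1(x):
    return x ** 2          # inner function (illustrative choice)

def f2(u):
    return math.sin(u)     # outer function (illustrative choice)

def deriv(f, x, h=1e-6):
    """Central-difference numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5
# derivative of the composite f2(f1(x)) ...
composite = deriv(lambda v: f2(f1(v)), x)
# ... should equal f2'(f1(x)) * f1'(x), per the chain rule
chained = deriv(f2, f1(x)) * deriv(f1, x)
```

The two quantities agree to within the finite-difference error, which is the content of the submitter's suggested correction.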
Chapter 1 Figure 1.18 
Shouldn't the symbol in the second blue box be a \sigma rather than a \delta? 
VenkateshPrasad Ranganath  Nov 10, 2019  
Chapter 1 Figure 1.1 
The figure says ReLU function, but instead plots Leaky ReLU 
Tamirlan Seidakhmetov  Jan 05, 2020  
Chapter 1, "The Fun Part: The Backward Pass" > "Code" section 
In "then increasing x11 by 0.001 should increase L by 0.01 × 0.2489", 0.01 should be changed to 0.001. 
Tamirlan Seidakhmetov  Jan 07, 2020  
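The arithmetic behind this correction is easy to check on a toy function (f(x) = x² here is an illustrative stand-in, not the book's L): a 0.001 increase in the input changes the output by roughly 0.001 times the derivative, not 0.01 times it.

```python
def f(x):
    return x ** 2   # toy stand-in for the book's loss; illustrative only

x = 3.0
analytic_deriv = 2 * x                      # f'(3) = 6
delta = 0.001
actual_change = f(x + delta) - f(x)         # 0.006001
predicted_change = delta * analytic_deriv   # 0.001 * 6 = 0.006, not 0.01 * 6
```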
Chapter 2 Linear Regression: The Code 
Inside forward_linear_regression function, loss formula is incorrect. 
Tamirlan Seidakhmetov  Jan 14, 2020  
Printed  Page 84, 2nd-to-last paragraph 
The default activation is said to be "Linear", but in the code snippet it is actually "Sigmoid". So in the code snippet on p. 91, the linear_regression neural network would need an explicit assignment of the activation to Linear(); otherwise Sigmoid() would be used. 
Anonymous  Oct 27, 2020 
ePub  (no page number given)  1. Foundations, "The Fun Part: The Backward Pass", Code: "Now let's verify that everything worked" 
How can we verify L is correct when W is not given? W is assigned random numbers, but we don't know what they are. 
Luke  Feb 06, 2022 
ePub  Chapter 1, Nested Functions, Code: code sample for chain_length_2 
Code sample has errors. 
Ellery Chan  Apr 22, 2022 
Printed  Pages 32 and 33, bottom diagram of page 32 and first diagram on page 33 
This is in the Italian translation of the First Edition. 
Anonymous  Sep 19, 2023 
ePub  https://learning.oreilly.com/library/view/deeplearningfrom/9781492041405/ch01.html, "John Cochrane, [Investments] Notes 2006" 
The hyperlink for "John Cochrane, [Investments] Notes 2006" is broken. 
Gökçe Aydos  Sep 22, 2023 
ePub  Appendix, Matrix Chain Rule, "it isn’t too hard to see that the partial derivative of this with respect to x1" 
> it isn’t too hard to see that the partial derivative of this with respect to `x_1` 
Gökçe Aydos  Sep 28, 2023 
Printed  Page 10 return statement in def chain_length_2() function 
In the chain_length_2() function, the return statement is f2(f1(x)), but x is undefined. The return statement should be f2(f1(a)), where a is the input to the function. 
Anonymous  Oct 08, 2019 
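A minimal sketch of the corrected function (the type hints are assumptions, not necessarily the book's exact signature):

```python
from typing import Callable, List

def chain_length_2(chain: List[Callable[[float], float]], a: float) -> float:
    """Evaluate a "chain" of two functions on the input a."""
    assert len(chain) == 2, "chain must contain exactly two functions"
    f1, f2 = chain
    return f2(f1(a))   # corrected: uses the parameter a, not an undefined x
```

With the parameter a threaded through, the function no longer depends on any name outside its own scope.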
Page 10 Figure 1-7 
The use of f1 f2 to indicate the composite f2(f1(x)) is confusing and nonstandard. If the author wanted to pipe the functions sequentially to create the composite, there is a standard notation for doing so; otherwise this should simply be noted. 
Bradford Fournier-Eaton  Nov 01, 2021  
Page 11 The math formula 
Peter Petrov  Mar 26, 2021  
Page 11 Chain Rule Equation 
As others have stated, the chain rule as printed is incorrect: it is written as the derivative of one particular function (the author notes it as f_{2}) rather than the derivative of the composite. 
Bradford Fournier-Eaton  Nov 01, 2021  
Page 13 In the function chain_deriv_2 
# df1/dx 
Pradeep Kumar  Oct 10, 2020  
Page 13 In the function chain_deriv_2 
Nowhere is it mentioned what plot_chain does. No code for it is given in the chapter for reference, nor is it clear what it does, yet this function is used throughout the first chapter. 
Pradeep Kumar  Oct 10, 2020  
Printed  Page 25 Last paragraph 
The text reads "...the gradient of X with respect to X." but it should read "...the gradient of N with respect to X." A gradient is a property of a function, not a vector. 
Jason Gastelum  Dec 25, 2020 
Printed  Page 28 Chapter 1 
"we compute quantities on the forward pass (here, just N)" 
Anonymous  Apr 29, 2020 
Page 58, Code line number 15 
In the backward pass, (if I am not wrong) we essentially want to find how much the output changes when the input is changed by some amount. 
Prathamesh Waghmare  Sep 14, 2023  
Printed  Page 64 Table 2-1, derivative table for the neural network 
The partial derivative dLdP = (forward_info[y] - forward_info[p]) should be 2 * (forward_info[y] - forward_info[p]), just like the explanation on page 51. 
Anonymous  Oct 25, 2019 
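The factor of 2 the submitter points to follows from L = (y - p)². A numerical sketch with toy data (the sign below reflects differentiating with respect to the prediction p, which is an assumption about the convention; the factor of 2 is the point being checked):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=5)   # toy targets
p = rng.normal(size=5)   # toy predictions

def loss(p):
    return np.mean((y - p) ** 2)

# analytic gradient with the factor of 2 the erratum calls for
analytic = -2.0 * (y - p) / y.size

# central-difference numerical gradient, one component at a time
h = 1e-6
numeric = np.zeros_like(p)
for i in range(p.size):
    step = np.zeros_like(p)
    step[i] = h
    numeric[i] = (loss(p + step) - loss(p - step)) / (2 * h)
```

Dropping the 2 makes the analytic gradient disagree with the numerical one by exactly a factor of two.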
Printed  Page 65 Paragraph "The overall loss gradient" 
I believe that in the Jupyter Notebook on GitHub, in "loss_gradients", the values assigned to loss_gradients['B1'] and loss_gradients['W2'] are erroneously summed across axis=0 twice: once in the original assignments to dLdB1 and dLdB2, and again in the assignment to loss_gradients. This makes e.g. the loss gradient for B1 not a vector with 13 elements but a scalar, so that gradient descent updates all elements of B1 with the same gradient value, which I think is not correct. The effect on the outcome seems minor, but the graph printed on p. 67 looks somewhat different. 
Anonymous  Oct 27, 2020 
Printed  Page 65 2nd 

Eugen Grosu  Jan 03, 2021 
Printed  Page 66 Bottom 
Figure 2-13 is obviously the same as Figure 2-6; there is no difference in the fit when comparing them. 
James Svacha  Jul 12, 2020 
Printed, PDF  Page 88 section heading 
Heading is the same as the chapter title *and* the book title. 
Anonymous  Feb 29, 2020 
Printed  Page 91 NeuralNetwork class invocations in the code 
The NeuralNetwork class, when used on page 91, is given a learning_rate parameter, but there is no learning_rate in the __init__ function for that class, and no methods in the class use the learning_rate. This is not surprising, as the learning rate is something the Optimizer class (introduced on the following pages) cares about. 
David Mankins  Sep 13, 2023 
Page 94 __init__ method of class Trainer 
The __init__ method is missing self.optim = optim before the setattr line. 
Rodrigo Stevaux  Oct 07, 2020  
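A sketch of the fix, with empty placeholder classes standing in for the book's NeuralNetwork and Optimizer (assumptions made only so the snippet is self-contained):

```python
class NeuralNetwork:        # placeholder for the book's class
    pass

class Optimizer:            # placeholder for the book's class
    pass

class Trainer:
    def __init__(self, net: NeuralNetwork, optim: Optimizer) -> None:
        self.net = net
        self.optim = optim                     # the line the erratum says is missing
        setattr(self.optim, 'net', self.net)   # give the optimizer a handle on the network
```

Without the self.optim = optim assignment, the setattr line would have nothing to operate on.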
ePub  Page 99 1st paragraph 
In the Lincoln library, required to run the code for chapter 4, 'lincoln.utils.np_utils' does not contain the function 'exp_ratios'. 
Steven Kaminsky  Jan 14, 2020 
Page 166 The code for auto differentiation 
The book's auto differentiation code needs to replace self.grad with backward_grad so that it calculates the derivative correctly. 
Nanyu  Sep 21, 2023 
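The distinction matters when a value feeds into more than one operation. A minimal sketch of a NumberWithGrad-style class (names and structure are assumptions modeled loosely on the book's approach, not its exact code): it accumulates the incoming gradient into self.grad but propagates backward_grad, not self.grad, to its parents.

```python
class NumberWithGrad:
    def __init__(self, num, depends_on=None, creation_op=''):
        self.num = num
        self.grad = None
        self.depends_on = depends_on or []
        self.creation_op = creation_op

    def __add__(self, other):
        return NumberWithGrad(self.num + other.num, [self, other], 'add')

    def __mul__(self, other):
        return NumberWithGrad(self.num * other.num, [self, other], 'mul')

    def backward(self, backward_grad=None):
        if backward_grad is None:
            backward_grad = 1
        # accumulate the incoming gradient ...
        self.grad = backward_grad if self.grad is None else self.grad + backward_grad
        # ... but propagate backward_grad, not self.grad, to the parents
        if self.creation_op == 'add':
            for parent in self.depends_on:
                parent.backward(backward_grad)
        elif self.creation_op == 'mul':
            self.depends_on[0].backward(backward_grad * self.depends_on[1].num)
            self.depends_on[1].backward(backward_grad * self.depends_on[0].num)
```

For a = 3, the derivative of a * a is 2a = 6; passing self.grad instead of backward_grad would double-count nodes that appear more than once in the graph.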