
Errata for Deep Learning from Scratch


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version	Location	Description	Submitted by	Date submitted
	web Figure 1.1	The function in figure 1.1 is not ReLU but Leaky ReLU.	Jaap van der Does	Oct 16, 2019
	1. Foundation, The Chain Rule, first formula	Is the formula "df2(x)/du = df2(f1(x))/du * df1(x)/du" correct? I think it should be "df1f2(x)/du = df2(f1(x))/du * df1(x)/du".	Hiroki Nishimoto	Oct 23, 2019
	Chap 1 Figure 1.18	Shouldn't the symbol in the second blue box be \sigma rather than \delta?	Venkatesh-Prasad Ranganath	Nov 10, 2019
	Chapter 1 Figure 1.1	The figure says ReLU function, but instead plots Leaky ReLU	Tamirlan Seidakhmetov	Jan 05, 2020
	Chapter 1 "The Fun Part: The Backward Pass" -> "Code" section	In "then increasing x11 by 0.001 should increase L by 0.01 × 0.2489", 0.01 should be changed to 0.001.	Tamirlan Seidakhmetov	Jan 07, 2020
	Chapter 2 Linear Regression: The Code	Inside forward_linear_regression function, loss formula is incorrect. Wrong: loss = np.mean(np.power(y_batch - P, 2)) Correct: loss = np.mean(np.power(y_batch - P), 2)	Tamirlan Seidakhmetov	Jan 14, 2020
Printed	Page p. 84 2nd to last paragraph	The default activation is said to be "Linear", but in the code snippet it is actually "Sigmoid". So in the code snippet on p. 91, the linear_regression neural network would need an explicit assignment of the activation to Linear(); otherwise Sigmoid() would be used.	Anonymous	Oct 27, 2020
ePub	Page (ePub does not give page numbers); 1. Foundations, "The Fun Part: The Backward Pass", Code: "Now let's verify that everything worked"	How can we verify that L is correct when W is not given? W is assigned random numbers, but we don't know what they are.	Luke	Feb 06, 2022
ePub	Page Chapter 1, Nested Functions, Code: code sample for chain_length_2	The code sample has errors. The first line should be: from typing import Callable, List; the second line should be: from numpy import ndarray; the last line should be: return f2(f1(a))	Ellery Chan	Apr 22, 2022
Printed	Page pages 32 and 33 Bottom diagram of page 32 and first diagram on page 33	This is in the Italian translation of the First Edition. In the first bracket of the diagram at the bottom of page 32, at row 2 and column 2, the first weight’s subscript should be 12 and not 11 (i.e., w_12 instead of w_11, where ‘_’ denotes a subscript). The same issue occurs in the same bracket in the diagram at the top of page 33.	Anonymous	Sep 19, 2023
ePub	Page https://learning.oreilly.com/library/view/deep-learning-from/9781492041405/ch01.html John Cochrane, [Investments] Notes 2006	John Cochrane, [Investments] Notes 2006 => hyperlink is broken	Gökçe Aydos	Sep 22, 2023
ePub	Page Appendix - Matrix Chain Rule "it isn’t too hard to see that the partial derivative of this with respect to x1"	In "it isn’t too hard to see that the partial derivative of this with respect to `x_1`", `x_1` should be `x_11`.	Gökçe Aydos	Sep 28, 2023
Printed	Page 10 return statement in def chain_length_2() function	In the chain_length_2() function, the return statement is f2(f1(x)), but x is undefined. The return statement should be f2(f1(a)), where a is the input to the function.	Anonymous	Oct 08, 2019
PDF	Page 10 Figure 1-7	The use of f1 f2 to indicate the composite f2(f1(x)) is confusing and non-standard. If the author wanted to pipe the functions sequentially to create the composite above then there is a standard way of doing this. Otherwise it should be simply noted. As it currently stands the notation does not imply composition (and if so, incorrectly) but rather multiplication. Solution: There is a standard way of doing what the author wants. David Spivak at MIT uses this for an applied category theory course in https://mitpress.mit.edu/books/category-theory-sciences Use a semicolon. f1 ; f2 so this implies first apply f1 then apply f2 to the result.	Bradford Fournier-Eaton	Nov 01, 2021
PDF	Page 11 the Math formula	1) Page 11, the math formula: from a mathematical standpoint this formula is incorrect. The LHS is not the derivative of f2 w.r.t. x; it is the derivative of the composite function "(f2 ∘ f1)" w.r.t. x. I am sure that's what the author meant, but that's not what's written. So the author should probably denote this composite function by some letter, e.g. h, and then the LHS can be written as dh/du. Then the formula would be OK. 2) Also, figure 1.7 on the previous page is incorrect. This composite function is not f1 . f2; it is f2 . f1 (read as "f2 after f1"), usually written in math as (f2 ∘ f1). 3) The same error appears in figure 1.8 (page 12) and in "f1 . f2" on the first line of page 11. 4) Also, the terms nested functions and composite functions are used interchangeably, which is very confusing. In programming, the term nested functions usually has a different meaning: a function defined inside another function's body. That is not the meaning here. So this whole sequence of pages starting at page 9 needs serious editing; it's full of confusions.	Peter Petrov	Mar 26, 2021
PDF	Page 11 Chain Rule Equation	As others have stated, the chain rule is written incorrectly: it should give the derivative of the composite, not the derivative of one of the two functions (which the author writes as f_{2}).	Bradford Fournier-Eaton	Nov 01, 2021
PDF	Page 13 In the function : chain_deriv_2	In the lines "# df1/dx" and "f1_of_x = f1(input_range)", f1_of_x applies the function f1 to the input data, but the comment suggests it is calculating a derivative, which is not true.	Pradeep Kumar	Oct 10, 2020
PDF	Page 13 In the function : chain_deriv_2	Nowhere is it explained what plot_chain does. No code for it is given in that chapter for reference, and it is not clear what it does, yet the function is used throughout the first chapter. Also, in the type annotations, ndarray should be either np.ndarray or numpy.ndarray.	Pradeep Kumar	Oct 10, 2020
Printed	Page 25 Last paragraph	The text reads "...the gradient of X with respect to X." but it should read "...the gradient of N with respect to X." A gradient is a property of a function, not a vector.	Jason Gastelum	Dec 25, 2020
Printed	Page 28 Chapter 1	"we compute quantities on the forward pass (here, just N)" contradicts the definition of "forward pass" given on p. 16 in the chain_deriv_3() method. According to that definition, the forward pass comprises N = np.dot(X, W) as well as S = sigma(N).	Anonymous	Apr 29, 2020
PDF	Page 58, Code line number 15	In the backward pass, (if I am not wrong) we essentially want to find how much the output changes when the input changes by some amount. In the code on page 58, with 'L' being the output of the neural network and 'X' the input layer, the formula written for dLdX (dLdX = np.dot(dSdN, dNdX)) looks wrong, as the dot product of dSdN and dNdX gives dSdX (dS/dX), whereas we want dLdX (dL/dX). The correct formula for dLdX should be np.dot(dLdN, dNdX). In the same code, dLdN is computed but never used. This must be a printing mistake, but it is a significant error. To support this: on pages 47 and 48, the matrix_function_backward_1() function returns np.dot(dSdN, dNdX), which gives dSdX (dS/dX), and that is exactly the change in output due to a change in input for that particular problem. I have just started with deep learning, so I might be wrong here! Correct me if it's a mistake. Thank you, and a great book!	Prathamesh Waghmare	Sep 14, 2023
Printed	Page 64 Table 2-1 Derivative table for neural network	The partial derivative dLdP = -(forward_info[y] - forward_info[p]) should be -2 * (forward_info[y] - forward_info[p]), just like the explanation on page 51.	Anonymous	Oct 25, 2019
Printed	Page 65 Paragraph "The overall loss gradient"	I believe that in the Jupyter Notebook on GitHub, in "loss_gradients", the values assigned to loss_gradients['B1'] and loss_gradients['W2'] are erroneously summed across axis=0 twice: once in the original assignment for dLdB1 and dLdB2, and then again in the assignment to loss_gradients. This makes, e.g., the loss gradient for B1 not a vector with 13 elements but a scalar, so gradient descent updates all elements of B1 with the same gradient value, which I think is not correct. The effect on the outcome seems minor, but the graph printed on p. 67 looks somewhat different.	Anonymous	Oct 27, 2020
Printed	Page 65 2nd	In the source code, there is a bug while calculating the gradients with respect to W2: https://github.com/SethHWeidman/DLFS_code/blob/master/02_fundamentals/Code.ipynb Incorrect: dLdW2 = np.dot(dM2dW2, dLdP) Correct: dLdW2 = np.dot(dM2dW2, dLdM2)	Eugen Grosu	Jan 03, 2021
Printed	Page 66 Bottom	The figure 2-13 is obviously the same as figure 2-6. There is no difference in the fit when comparing them.	James Svacha	Jul 12, 2020
Printed, PDF	Page 88 section heading	Heading is the same as the chapter title and the book title. "In DLfS refer to the section DLfS in the chapter DLfS" might be a little confusing.	Anonymous	Feb 29, 2020
Printed	Page 91 NeuralNetwork class invocations in the code	The NeuralNetwork class, when used on page 91, is given a learning_rate parameter --- there's no learning_rate in the __init__ function for that class, and no methods in the class use the learning_rate. This is not surprising, as the learning-rate is something the Optimizer class (introduced on the following pages) cares about.	David Mankins	Sep 13, 2023
PDF	Page 94 __init__ method of class Trainer	The __init__ method is missing self.optim = optim before the setattr line.	Rodrigo Stevaux	Oct 07, 2020
ePub	Page 99 1st paragraph	In the Lincoln library, required to run the code for chapter 4, 'lincoln.utils.np_utils' does not contain the function 'exp_ratios'.	Steven Kaminsky	Jan 14, 2020
PDF	Page 166 The code for auto differentiation	The autodifferentiation code in the book needs to replace self.grad with backward_grad so that it calculates derivatives correctly. Otherwise, try: a = NumberWithGrad(2); b = a * 4; c = b + 5; d = b * 2; e = d + c; e.backward(). This gives a.grad = 20, when in fact it should be 12.	Nanyu	Sep 21, 2023
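Two entries report that Figure 1.1 is labeled ReLU but plots a Leaky ReLU. The minimal NumPy sketch below makes the difference concrete. The slope alpha = 0.01 for negative inputs is an assumed, commonly used value; it is not taken from the book's figure.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU: negative inputs become 0, non-negative inputs pass through unchanged
    return np.maximum(0, x)

def leaky_relu(x: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    # Leaky ReLU: negative inputs are scaled by a small slope alpha instead of zeroed
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu_out = relu(x)          # flat at 0 for every negative input
leaky_out = leaky_relu(x)   # small negative values for negative inputs
```

A plot that slopes slightly below zero on the left, rather than lying flat at zero, is therefore a Leaky ReLU.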
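Several entries (the Chapter 1 chain rule formula and the page 11 equation) point out that the chain rule gives the derivative of the composite f2(f1(x)), not of f2 alone. The sketch below checks this numerically. It uses a central-difference helper; the choices f1(x) = x**2 and f2(u) = sin(u) are arbitrary, for illustration only.

```python
import numpy as np

def deriv(func, x: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # Central finite-difference approximation of func'(x)
    return (func(x + delta) - func(x - delta)) / (2 * delta)

def f1(x: np.ndarray) -> np.ndarray:
    return x ** 2       # inner function, applied first

def f2(u: np.ndarray) -> np.ndarray:
    return np.sin(u)    # outer function, applied to f1's output

def h(x: np.ndarray) -> np.ndarray:
    return f2(f1(x))    # the composite f2 ∘ f1

x = np.array([0.5, 1.0, 1.5])
lhs = deriv(h, x)                        # derivative of the composite
rhs = deriv(f2, f1(x)) * deriv(f1, x)    # f2'(f1(x)) * f1'(x)
f2_alone = deriv(f2, x)                  # derivative of f2 by itself: a different quantity
```

The left and right sides agree with each other and with the analytic result 2x·cos(x²). The derivative of f2 on its own does not match them.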
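Three entries can be illustrated together:

- The 0.001 step size in "The Fun Part: The Backward Pass".
- How to verify L when W is random.
- The dLdX formula on page 58.

The sketch below takes L = sum(sigmoid(X · W)), a setup like the one these entries describe. It fixes a random seed so X and W are reproducible, computes dL/dX through dLdN, and checks that increasing x11 by 0.001 changes L by roughly 0.001 times the matching gradient entry. The seed and array shapes are arbitrary choices, not the book's values.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def forward_sum(X: np.ndarray, W: np.ndarray) -> float:
    # L = sum of sigmoid(N), where N = X . W
    return float(np.sum(sigmoid(np.dot(X, W))))

def backward_sum(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    N = np.dot(X, W)
    S = sigmoid(N)
    dLdS = np.ones_like(S)     # L is a plain sum of S
    dSdN = S * (1 - S)         # derivative of the sigmoid at N
    dLdN = dLdS * dSdN         # chain rule: dL/dN
    dNdX = W.T                 # N = X . W, so dN/dX contributes W transposed
    return np.dot(dLdN, dNdX)  # dL/dX (uses dLdN, not dSdN)

rng = np.random.default_rng(0)     # fixed seed: same X and W on every run
X = rng.standard_normal((3, 3))
W = rng.standard_normal((3, 2))
dLdX = backward_sum(X, W)

# Increase x11 (X[0, 0]) by 0.001 and compare the actual change in L
# with the first-order prediction 0.001 * dL/dx11
step = 0.001
X_bumped = X.copy()
X_bumped[0, 0] += step
actual_change = forward_sum(X_bumped, W) - forward_sum(X, W)
predicted_change = step * dLdX[0, 0]
```

Here dL/dS is all ones, so dLdN and dSdN are numerically equal and both formulas print the same numbers. Using dLdN is what stays correct once L is anything other than a plain sum.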
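The two squared-error entries (the Chapter 2 loss code and Table 2-1) can be checked with a small sketch. It computes the mean squared error with np.power(y - p, 2); np.power takes the exponent as its second argument. It then compares the elementwise derivative -2 * (y - p) against finite differences of the summed squared error. The arrays are made up for illustration. If the loss is averaged over N examples, the gradient also picks up a 1/N factor.

```python
import numpy as np

def mse_loss(y: np.ndarray, p: np.ndarray) -> float:
    # Mean squared error; np.power(base, exponent) needs both arguments
    return float(np.mean(np.power(y - p, 2)))

def squared_error_grad(y: np.ndarray, p: np.ndarray) -> np.ndarray:
    # d/dp of (y - p)**2, element by element: -2 * (y - p)
    return -2 * (y - p)

y = np.array([[1.0], [2.0], [3.0]])
p = np.array([[1.5], [1.0], [2.0]])

loss = mse_loss(y, p)
grad = squared_error_grad(y, p)

# Central finite-difference check on the summed squared error
step = 1e-5
numeric = np.zeros_like(p)
for i in range(p.shape[0]):
    p_hi, p_lo = p.copy(), p.copy()
    p_hi[i, 0] += step
    p_lo[i, 0] -= step
    numeric[i, 0] = (np.sum((y - p_hi) ** 2) - np.sum((y - p_lo) ** 2)) / (2 * step)
```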
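Two entries (Ellery Chan's ePub report and the page 10 report) describe the same fixes to chain_length_2:

- Add the typing and numpy imports.
- Return f2(f1(a)) instead of f2(f1(x)).

Applying them gives a sketch like the following. The type aliases are defined here so it runs on its own, and the two functions in the usage line are arbitrary.

```python
from typing import Callable, List

import numpy as np
from numpy import ndarray

# A function that takes an ndarray and returns an ndarray
Array_Function = Callable[[ndarray], ndarray]
# A chain is a list of such functions, applied in order
Chain = List[Array_Function]

def chain_length_2(chain: Chain, a: ndarray) -> ndarray:
    """Evaluate two functions in a row: f2(f1(a))."""
    assert len(chain) == 2, "Length of input 'chain' should be 2"
    f1 = chain[0]
    f2 = chain[1]
    # Use the parameter 'a'; 'x' is not defined inside this function
    return f2(f1(a))

result = chain_length_2([lambda v: v + 1.0, np.square], np.array([1.0, 2.0]))
```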
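The Figure 1-7 and page 11 entries concern how composition and the chain rule are written. One erratum suggests naming the composite h; doing so, and keeping the d/du notation used in these entries, the chain rule can be written as:

```latex
h(x) = (f_2 \circ f_1)(x) = f_2\big(f_1(x)\big)

\frac{dh}{du}(x) = \frac{df_2}{du}\big(f_1(x)\big) \times \frac{df_1}{du}(x)
```

Here f_2 ∘ f_1 reads "f2 after f1". The semicolon notation f_1 ; f_2 proposed in one entry denotes the same composite, written in the order the functions are applied.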
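The page 13 entry on chain_deriv_2 notes that the comment "# df1/dx" sits above a line that computes f1's value, not its derivative. Below is a sketch in the spirit of that function, with comments that match what each line computes. It is not a copy of the book's listing; the deriv helper and type aliases are defined here so the snippet is self-contained.

```python
from typing import Callable, List

import numpy as np
from numpy import ndarray

Array_Function = Callable[[ndarray], ndarray]
Chain = List[Array_Function]

def deriv(func: Array_Function, input_: ndarray, delta: float = 0.001) -> ndarray:
    # Central finite-difference approximation of func'(input_)
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

def chain_deriv_2(chain: Chain, input_range: ndarray) -> ndarray:
    """Derivative of f2(f1(x)) via the chain rule: f2'(f1(x)) * f1'(x)."""
    assert len(chain) == 2, "This function requires a chain of length 2"
    assert input_range.ndim == 1, "input_range should be a 1D array"
    f1, f2 = chain
    # f1(x): the value of f1 (not a derivative), used as the input to f2
    f1_of_x = f1(input_range)
    # df1/du, evaluated at x
    df1dx = deriv(f1, input_range)
    # df2/du, evaluated at f1(x)
    df2du = deriv(f2, f1_of_x)
    return df1dx * df2du

x = np.linspace(0.5, 2.0, 4)
slopes = chain_deriv_2([np.square, np.sin], x)
```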
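The page 65 entry on loss_gradients says that summing a bias gradient over axis=0 twice turns a 13-element vector into a scalar. The sketch below shows the shape effect with stand-in arrays. It assumes a bias of shape (1, 13) and a batch of 4, both chosen for illustration, and an upstream gradient that is simply a range of numbers rather than values from the notebook.

```python
import numpy as np

batch_size, hidden_size = 4, 13
# Stand-in upstream gradient dL/dN1, one row per example in the batch
dLdN1 = np.arange(batch_size * hidden_size, dtype=float).reshape(batch_size, hidden_size)
# dN1/dB1 for a bias assumed to have shape (1, hidden_size)
dN1dB1 = np.ones((1, hidden_size))

dLdB1 = (dLdN1 * dN1dB1).sum(axis=0)   # sum over the batch: one value per hidden unit
summed_twice = dLdB1.sum(axis=0)       # a second sum collapses everything to a scalar

B1 = np.zeros((1, hidden_size))
lr = 0.001
update_once = B1 - lr * dLdB1          # each bias element gets its own update
update_twice = B1 - lr * summed_twice  # every bias element gets the same update
```

Broadcasting hides the problem: subtracting a scalar from B1 raises no error, but every bias element then moves by the same amount.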
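The page 94 entry reports that Trainer.__init__ needs self.optim = optim before the setattr line. Without it, the setattr call raises an AttributeError. Below is a minimal sketch of the fixed constructor. It uses hypothetical stand-in Optimizer and NeuralNetwork classes, not the lincoln library's own, so it runs on its own. The stand-ins also reflect the page 91 entry: the learning rate belongs to the optimizer, not the network.

```python
class Optimizer:
    """Hypothetical stand-in: the learning rate lives on the optimizer."""
    def __init__(self, lr: float = 0.01) -> None:
        self.lr = lr
        self.net = None  # filled in by Trainer

class NeuralNetwork:
    """Hypothetical stand-in with no learning_rate parameter."""
    def __init__(self, layers=None) -> None:
        self.layers = layers or []

class Trainer:
    def __init__(self, net: NeuralNetwork, optim: Optimizer) -> None:
        self.net = net
        self.optim = optim                    # must be set before the setattr below
        setattr(self.optim, "net", self.net)  # give the optimizer access to the network

trainer = Trainer(NeuralNetwork(), Optimizer(lr=0.01))
```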
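The page 166 autodifferentiation entry can be reproduced with a compact, self-contained NumberWithGrad-style class. This is a sketch, not the book's exact listing. backward still accumulates the incoming gradient into self.grad, but it passes only backward_grad (not the accumulated self.grad) on to its inputs, which is the change the erratum proposes. With that change, the example from the entry gives a.grad == 12, since e = d + c = 2b + (b + 5) = 3b + 5 = 12a + 5.

```python
class NumberWithGrad:
    def __init__(self, num, depends_on=None, creation_op=""):
        self.num = num
        self.grad = None
        self.depends_on = depends_on or []  # the nodes this one was computed from
        self.creation_op = creation_op      # "add", "mul", or "" for a leaf

    def __add__(self, other):
        other = ensure_number(other)
        return NumberWithGrad(self.num + other.num, [self, other], "add")

    def __mul__(self, other):
        other = ensure_number(other)
        return NumberWithGrad(self.num * other.num, [self, other], "mul")

    def backward(self, backward_grad=None):
        if backward_grad is None:
            backward_grad = 1  # seed: the output's derivative w.r.t. itself
        # Accumulate contributions arriving along every path
        self.grad = backward_grad if self.grad is None else self.grad + backward_grad
        # Propagate only this path's gradient, so earlier paths are not counted twice
        if self.creation_op == "add":
            self.depends_on[0].backward(backward_grad)
            self.depends_on[1].backward(backward_grad)
        elif self.creation_op == "mul":
            left, right = self.depends_on
            left.backward(right.num * backward_grad)
            right.backward(left.num * backward_grad)

def ensure_number(value):
    # Wrap plain numbers so they can take part in the graph
    return value if isinstance(value, NumberWithGrad) else NumberWithGrad(value)

a = NumberWithGrad(2)
b = a * 4
c = b + 5
d = b * 2
e = d + c
e.backward()
```

This recursive approach revisits shared nodes once per path, which is correct but can be slow on large graphs. Processing nodes in reverse topological order is the usual alternative.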