Errata

Deep Learning for Coders with fastai and PyTorch

The errata list contains errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Color key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version | Location | Description | Submitted by | Date submitted | Date corrected
Printed
Page 513
1st paragraph

The computation of bwd for the Lin(LayerFunction) class refers, in its second line, to self.inp and self.out.

class Lin(LayerFunction):
    def __init__(self, w, b): self.w,self.b = w,b

    def forward(self, inp): return inp@self.w + self.b

    def bwd(self, out, inp):
        inp.g = out.g @ self.w.t()
        self.w.g = self.inp.t() @ self.out.g
        self.b.g = out.g.sum(0)

Should bwd be changed to the following?

def bwd(self, out, inp):
    inp.g = out.g @ self.w.t()
    self.w.g = inp.t() @ out.g # <--- inp instead of self.inp; ditto out instead of self.out
    self.b.g = out.g.sum(0)
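
For intuition, here is a quick, self-contained shape check of the corrected line (the batch and layer sizes below are made up for illustration and are not from the book):

import torch

inp   = torch.randn(4, 3)   # hypothetical batch of 4 inputs with 3 features
w     = torch.randn(3, 2)   # hypothetical weights of a 3 -> 2 linear layer
out_g = torch.randn(4, 2)   # gradient of the loss with respect to the output

w_g = inp.t() @ out_g       # shape (3, 2), matching w, so the corrected line is consistent
b_g = out_g.sum(0)          # shape (2,), matching b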

Kaushik Sinha  Jun 28, 2021  Nov 05, 2021
Printed
Page 464
3rd paragraph

Note: fastai has changed, so this should be
head = create_head(512*2, 2, ps=0.5) rather than
head = create_head(512*4, 2, ps=0.5), and the surrounding text should be revised accordingly.

Note from the Author or Editor:
Indeed, this should be changed as suggested.

Conwyn Flavell  Apr 15, 2021  May 07, 2021
PDF
Page 149
middle paragraph

While this chapter discusses how to discriminate '3' and '7', this paragraph talks about '8'.
I guess '8' here should be '3'.

Also, the syntax of the function for the probability of being the number 8 is strange.

def pr_eight(x, w) = (x*w).sum()

It should be something like:

def pr_three(x, w): return (x*w).sum()

Note from the Author or Editor:
For consistency with the previous page, I guess we can use 3s here, so replace all 8 by 3 (I counted five instances on this page) and pr_eight by pr_three.

HIDEMOTO NAKADA  Mar 19, 2021  May 07, 2021
PDF
Page 198
2nd paragraph, last line

"...would then look like something like Figure 5-3" should be "would then look something like Figure 5-3"

Note from the Author or Editor:
This should be fixed as suggested

ZHANG Hongyuan  Mar 03, 2021  May 07, 2021
Printed
Page 510
the last paragraph of the column

Here, SymPy has taken the derivative of x**2 for us!

'x' should be 'sx',

Here, SymPy has taken the derivative of sx**2 for us!

Note from the Author or Editor:
Yes, the sentence should be fixed as proposed.
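
For reference, a minimal SymPy snippet that reproduces the corrected sentence (sketched from the erratum, not copied verbatim from the book):

from sympy import symbols, diff

sx, sy = symbols('sx sy')
diff(sx**2, sx)   # returns 2*sx: SymPy has taken the derivative of sx**2 for us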

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 521
3rd paragraph

"To do the dot product of our weight matrix (2 by number of activations) with the
activations (batch size by activations by rows by cols), we use a custom einsum":

the activation does not include the batch size.

Note from the Author or Editor:
Replace "(batch size by activations by rows by cols)" by "(batch size by rows by cols)"

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 524
2nd code snippet

x.shape

I cannot understand why we check the shape of x here.
I guess it should be

act.shape

Note from the Author or Editor:
p521 replace

x.shape
torch.Size([1, 3, 224, 224])

by

act.shape
torch.Size([512, 7, 7])

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 532
last line

"Before we do, we’ll call a hook, if it’s defined. "

'Before' should be 'After', according to the code above.

# I assume that by 'do' the authors mean 'call forward'.

Note from the Author or Editor:
replace the sentence "Before we do, we'll call a hook, if it's defined" by "After, we call a hook, if it's defined".
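
A minimal sketch of the pattern the corrected sentence describes (a simplified module, not the book's exact class):

class Module:
    def __init__(self): self.hook = None
    def forward(self, *args): raise NotImplementedError
    def __call__(self, *args):
        res = self.forward(*args)
        # after computing forward, we call a hook, if it's defined
        if self.hook is not None: self.hook(res, *args)
        return res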

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 428
-2nd paragraph

"except in camel_case."

'camel_case' should be 'snake_case'

Note from the Author or Editor:
Indeed, replace 'camel_case' with 'snake_case' in the second to last paragraph.

HIDEMOTO NAKADA  Feb 16, 2021  May 07, 2021
Printed
Page 335
in the middle list

xxunk
Indicates the next word is unknown


I understand this special token means that this word is unknown, not the next word.

Note from the Author or Editor:
Yes, replace "Indicates the next word is unknown" by "Indicates this word is unknown"

HIDEMOTO NAKADA  Feb 16, 2021  May 07, 2021
Printed
Page 200
last paragraph

"So, we want to transform our numbers between 0 and 1 to instead be between negative infinity and infinity. "

The logarithm does not give us infinity for 1; it should be 'negative infinity and 0'.

Note from the Author or Editor:
Indeed, "negative infinity and infinity" should be replaced by "negative infinity and 0"

HIDEMOTO NAKADA  Feb 13, 2021  May 07, 2021
Printed
Page 173
3rd paragraph

"To decide if an output represents a 3 or a 7, we can just check whether it's greater than 0."

I understand that the threshold here is 0.5, so 0 should be 0.5.

Also, the code snippet just below this paragraph
----
(preds>0.0).float() == train_y[:4]
----
0.0 should be 0.5

Note from the Author or Editor:
Yes, in the paragraph mentioned, 0 should be replaced with 0.5 and in the code snippet immediately below, 0.0 should be 0.5.

HIDEMOTO NAKADA  Feb 08, 2021  May 07, 2021
Printed
Page 147
in the middle

tensor([1,2,3]) + tensor([1,1,1])

This does not make sense as an example.

I guess it should be

tensor([1,2,3]) + tensor(1)

Note from the Author or Editor:
Yes, this should be `tensor([1,2,3]) + tensor(1)`
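
For reference, the suggested expression broadcasts the scalar tensor across the vector:

from torch import tensor

tensor([1,2,3]) + tensor(1)   # tensor([2, 3, 4]): the scalar is broadcast to each element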

HIDEMOTO NAKADA  Feb 08, 2021  May 07, 2021
Printed
Page 509
5th paragraph

"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the output (in inp.g)."

the last 'the gradients of the loss with respect to the output (in inp.g)' is doubtful.
'output' should be 'input', or 'output of the previous layer'?

Note from the Author or Editor:
Replace
"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the output (in inp.g)."
by
"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the input (in inp.g)."

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 505
l.3

"the scale of our activations will go from 1 to 0.1, and after 100 layers"

The following code uses 50, not 100.

Note from the Author or Editor:
Replace 100 by 50
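
A minimal sketch of the kind of experiment the passage refers to (the shapes and scale factor are assumptions for illustration): each multiplication by a 100 x 100 matrix with entries scaled by 0.01 shrinks the activations by roughly a factor of 10.

import torch

x = torch.randn(200, 100)
for i in range(50):
    x = x @ (torch.randn(100, 100) * 0.01)   # each layer scales the activations by about 0.1
x.std()   # essentially zero after 50 layers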

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 503
l.2

torch.einsum('bi,ij,bj->b', a, b, c)

The character 'b' is used both as a matrix name and as an index character, which is quite confusing.

Note from the Author or Editor:
Replace "torch.einsum('bi,ij,bj->b', a, b, c)" by "torch.einsum('bi,ij,bj->b', x, y, z)" and later on "torch.einsum('bik,bkj->b', a, b)" by "torch.einsum('bik,bkj->b', x, y)"

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 502
rules of Einstein summation



"2. Each index can appear at most twice in any term."

What do you mean by 'term' here? If you count the right-hand side, an index can appear more than twice.

In https://ajcr.net/Basic-guide-to-einsum/ we can see an example like 'i,i->i'

"3. Each term must contain identical nonrepeated indices. "

This is also doubtful according to the blog above.

Note from the Author or Editor:
Replace the rules by:

1. Repeated indices on the left side are implicitly summed over if they are not on the right side.
2. Each index can appear at most twice on the left side.
3. The unrepeated indices on the left side must appear on the right side.
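
A small example (not from the book) that follows all three corrected rules:

import torch

x, y = torch.randn(5, 3), torch.randn(3, 4)
# 'k' appears twice on the left side and not on the right, so it is summed over;
# 'i' and 'j' each appear once on the left side, so they must appear on the right.
torch.einsum('ik,kj->ij', x, y)   # matrix multiplication, result shape (5, 4)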

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 501
-2

"Scale (1d tensor): (1) 256 x 256"

It is a 2d tensor, not 1d, I guess.

Note from the Author or Editor:
In the last paragraph of the page, replace 1d in "Scaled (1d tensor): (1) 256 x 256" by 2d.

HIDEMOTO NAKADA  Jan 16, 2021  Dec 18, 2020
Printed
Page 487
last list

CancelFitException and CancelBatchException

The label and explanation do not match.

CancelFitException and CancelBatchException
should be exchanged.

Note from the Author or Editor:
In p487-488, switch the labels CancelFitException and CancelBatchException.
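
For context, a hedged sketch of how these exceptions are typically used from a callback (the callback itself is hypothetical, not from the book):

import torch
from fastai.vision.all import Callback, CancelBatchException

class SkipNonFiniteBatch(Callback):
    def before_batch(self):
        # raising CancelBatchException skips the rest of this batch and moves on to the next one
        if not torch.isfinite(self.xb[0]).all(): raise CancelBatchException()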

HIDEMOTO NAKADA  Jan 16, 2021  Dec 18, 2020
Printed
Page 465
just above 'x = self.emb_drop(x)'

"You can pass `emb_drop` to `__init__` to change this value:"

I could not find an init parameter emb_drop for TabularModel:
https://github.com/fastai/fastai/blob/master/fastai/tabular/model.py

This should be "You can pass `embd_p` to `__init__` to change the dropout probability:" ?

Note from the Author or Editor:
Replace `emb_drop` by `embed_p` on p467, in the sentence "you can pass emb_drop to __init__ to change this value"
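
For example, a hedged call showing where this parameter goes (the sizes are made-up values, not from the book):

from fastai.tabular.model import TabularModel

model = TabularModel(emb_szs=[(10, 5)], n_cont=3, out_sz=2, layers=[200, 100], embed_p=0.5)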

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 447
l.3

"where conv is the function from the previous chapter that adds a second convolution, then a ReLU, then a batchnorm layer"

The conv in p.437 uses batchnorm *before* ReLU.

Note from the Author or Editor:
On p447 replace "where conv is the function from the previous chapter that adds a second convolution, then a ReLU, then a batchnorm layer" by "where conv is the function from the previous chapter that adds a second convolution, then a batchnorm layer, then a ReLU".

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 433
1st paragraph

"The percentage of nonzero weights is getting much better",

'nonzero' should be 'near zero'

Note from the Author or Editor:
Replace nonzero by near-zero

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 401
Questionnaire 33.

"Why do we scale the weights with dropout?"
I understand weights are not scaled. 'activations' are scaled.

Note from the Author or Editor:
In q33 replace "weights" by "activations"
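
A minimal sketch of what 'scaling the activations' means under (inverted) dropout, assuming dropout probability p:

import torch

p = 0.5
x = torch.randn(8, 16)                     # a batch of activations (made-up shape)
mask = (torch.rand(x.shape) > p).float()   # drop each activation with probability p
x = x * mask / (1 - p)                     # surviving activations are scaled up so the expected value is unchanged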

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 399
last itemize

There are "Embedding dropout" and "Input dropout" are listed.

What is the difference between them?

Note from the Author or Editor:
p 399 replace the first two items of the last list with
- Embedding dropout (inside the embedding layer, drops some random lines of embeddings)
- Input dropout (applied after the embedding layer)

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 341
2nd paragraph

"We then cut this stream into a certain number of batches (which is our batch size)."

'batch size' usually means 'size of each batch', not 'number of batches'.

This is really confusing. Am I missing something?


Note from the Author or Editor:
In the mentioned sentence, replace "number of batches" by "number of chunks of contiguous text"
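
As a worked example of the corrected wording (numbers chosen purely for illustration): with a stream of 90,000 tokens and a batch size of 64, the stream is cut into 64 contiguous chunks of roughly 1,406 tokens each, and every mini-batch then reads the next few tokens from the same position in each of those 64 chunks.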

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 272
3rd paragraph

The term 'embedding matrices' used here means something different from the one used in p.268. I believe you are talking about the output of the embedding layer here.

It looks like 'embedding' without 'matrices' would be better here.

Note from the Author or Editor:
In that paragraph replace "embedding matrices" by "embeddings"

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 255
last line of the 2nd paragraph

'The Last Skywalker'

I don't know if this is some kind of joke, but the Star Wars movie was
'The Last Jedi' or 'The Rise of Skywalker'.

Note from the Author or Editor:
Replace "The Last Skylwalker" by "The Rise of Skylwalker"
Replace "last_skywalker" in the next code examples by "rise_skywalker" (two instances)

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 143
l.1

"If you’ve done numeric programming in PyTorch before, you may recognize these as being similar to NumPy arrays."

PyTorch should be NumPy.

Note from the Author or Editor:
Yes, replace PyTorch by NumPy in this sentence.

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 28
-5

" and passing x[0].isupper(), which evaluates to True if the first letter is uppercase (i.e., it’s a cat)."

The code is not passing 'x[0].isupper()'; instead, it passes a function, 'is_cat'.

Note from the Author or Editor:
replace 'x[0].isupper()', by 'is_cat' in this sentence.
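
For reference, the labeling function in question is along these lines (a sketch based on the pet-breeds naming convention, where cat filenames start with an uppercase letter):

def is_cat(x): return x[0].isupper()   # True when the first letter of the filename is uppercase, i.e. a cat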

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
PDF
Page 149
The last two paragraphs

All the vector symbols X and W should be in lowercase in order to be consistent with the function above the second to last paragraph.

Note from the Author or Editor:
In the last two paragraphs of p149, replace all (code-formatted) 'W' by 'w' and 'X' by 'x'

ZHANG Hongyuan  Dec 18, 2020  May 07, 2021
Printed
Page 136
3rd paragraph

In the 3rd line of the 3rd paragraph, it should be "for a total of 784 pixels", not "768 pixels" (28 × 28 = 784). Change 768 to 784.

Note from the Author or Editor:
Please replace 768 by 784 as advised.

Mohammed Maheer  Oct 13, 2020  Dec 18, 2020
Printed
Page 364
Last code section

The class SiameseImage is derived from Tuple (note the capital T). The native Python datatype is written all in lower case 'tuple' but I think what's meant here is fastuple, the extended tuple in fastcore. I wonder if Tuple was renamed fastuple at some point.

Note from the Author or Editor:
Please replace all instances of "Tuple" (with the capital, code-formatted) by "fastuple" (I counted two on p364, one in the code of the class SiameseImage and one in the paragraph before, plus one on p365).

Nils Brünggel  Oct 13, 2020  Dec 18, 2020
Printed
Page 447
first paragraph

"What if we intitialized gamma to zero for every one of those final batchnorm layers?"
Shouldn't beta also be intitialised to zero? Only then would the residual mapping be guaranteed to be zero at intitialisation. This is what seems to be done e.g. in https://arxiv.org/pdf/1901.09321.pdf
Thanks for clearing it up!

Note from the Author or Editor:
Add a precision:
"What if we initialized gamma to zero for every one of those final batchnorm layers? Since beta is already initialized to zero, our conv(x) for those..."
with beta code-formatted
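
A minimal PyTorch sketch of the initialization being discussed (not the book's code):

import torch.nn as nn

bn = nn.BatchNorm2d(64)     # 64 channels, chosen arbitrarily
nn.init.zeros_(bn.weight)   # gamma, zeroed as the text suggests
# bn.bias (beta) is already initialized to zero by default, as the added precision notes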

vallotton  Oct 05, 2020  Dec 18, 2020
Printed
Page 427
second paragraph

"We'll use the same as one as earlier..." should read:
"We'll use the same one as earlier..."

Note from the Author or Editor:
Please make that modification.

pascal vallotton  Oct 04, 2020  Dec 18, 2020
Printed
Page 423
bottom figures

What is shown in these figures is not the red, green, and blue channels of the original image. Instead, you show the same image three times using a different colormap. Indeed, the three images show exactly the same pattern of intensity, which they shouldn't (for example, if the channels were really shown, the green grass should appear more prominently in the green channel than in the red channel). Or am I missing something? Thanks for a great book though!

Note from the Author or Editor:
Add a parenthesis:
The first axis contains the channels red, green, and blue (here highlighted with the corresponding color maps):

pascal vallotton  Oct 04, 2020  Dec 18, 2020
Printed
Page 410
first paragraph

[channels_in, features_out, rows, columns] should read:
[features_out, channels_in, rows, columns]
Indeed, a few lines later, the shape of the kernel is shown to be [4,1,3,3], which corresponds to my suggested correction, but not to the original print.
Or am I missing something?

Note from the Author or Editor:
Please make the modification proposed by the reader.
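
A quick check of the corrected ordering (using a 1-channel input and 4 output features, to match the [4,1,3,3] shape quoted above):

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
conv.weight.shape   # torch.Size([4, 1, 3, 3]): [features_out, channels_in, rows, columns]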

vallotton  Oct 03, 2020  Dec 18, 2020
Printed
Page 28
5th paragraph

We have to tell fastai how to get labels from the filenames, which we do by calling from_name_func (which means that the *filenames* can be extracted using a function applied to the filename) .... That should say labels instead of filenames


Note from the Author or Editor:
Replace filenames by labels in the parenthesis:
(which means that the labels can be extracted using a function applied to the filename)

John O'Reilly  Sep 13, 2020  Dec 18, 2020
Printed
Page 201
last paragraph (not the Sylvain says section)

modification should be multiplication in the following:
"Computer scientists love using logarithms, because it means that modification, which can create really really large and really really small numbers, can be replaced by addition"

Peter Butterfill  Sep 08, 2020  Sep 18, 2020
Printed
Page 147
last paragraph

"we'll get back 1,010 absolute values" should read
"we'll get back 1,010x28x28 absolute values" -
(valid_3_tens-mean3).abs().shape -> torch.Size([1010, 28, 28])

Note from the Author or Editor:
change to "get back 1,010 matrices of absolute values"

Peter Butterfill  Sep 08, 2020  Sep 18, 2020
Printed
Page 16
1st paragraph

There are two folders containing different versions of the notebooks. The full folder contains the exact notebooks used to create the book you're reading now, with all the prose and outputs. The stripped version has the same headings and code cells, but all outputs and prose have been removed.

The folders are no longer called "full" and "stripped". The repository root at https://github.com/fastai/fastbook contains the "full" version of the book.

The clean folder (https://github.com/fastai/fastbook/tree/master/clean) is the stripped version.


Andrew Nakamura  Aug 24, 2020  Sep 18, 2020