Errata

Errata for Reinforcement Learning

The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction appears in the "Date corrected" column.

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted by	Date submitted	Date corrected
	Page ? Chapter 2, Running the Experiment, 4th paragraph	The asymptote of the optimal action is given as 1/n + (1-e)/n = (2-e)/n. Perhaps I am missing something, but I think the asymptotic behavior should be e/n + (1-e) = (e(1-n) + n)/n, which I got from Algorithm 2-1: for an e-proportion of the time, we pick the optimum solution with a probability of 1/n; during the remaining (1-e) proportion of the time, we always pick the optimum solution (in the long-running asymptote). These two expressions, (2-e)/n and (e(1-n) + n)/n, give very different results. For example, if n = 10, the given equation yields 1/5 as e -> 0 (perfect exploitation), while my equation yields 1, a perfect selection of the optimal solution, irrespective of the number of possible actions. Also, a more minor detail: in line 7 of Algorithm 2-1, I think the denominator should be N(a) (lowercase a, not capital A), to denote that we are dividing by the number of times the action was taken, not the total number of actions taken so far. Note from the Author or Editor: Page 30, Algorithm 2-1: In step 7, replace "N(A)" with "N(a)" (lowercase a). Page 32, last paragraph, which begins with "Looking toward the end of the experiment...": the equation near the bottom should be replaced. The text should read: "The asymptote of the optimal action is e/n + (1-e), where ..."	Kenji Oman	Dec 22, 2020	Jan 13, 2023
	Page 48 Figure 2-8	The "Right" and "Left" labels in Figure 2-8 need to be reversed. Note from the Author or Editor: Page 48, Figure 2-8. The "Right" and "Left" labels on all four images are the wrong way around. "Left" should be on the top, "Right" should be on the bottom.	Andrew	Mar 29, 2021	Jan 13, 2023
	Page Prediction Error Chapter 1 -> Fundamental Concepts in Reinforcement Learning -> The First RL Algorithm -> Prediction error	The sentence “Knowledge of the previous state and the prediction error helps alter the weights. Multiplying these together, the result is δx(s)=[0,1]. Adding this to the current weights yields w=[1,0].” I think the result of this formula `δx(s)` should be [0,-1] instead of [0,1] since the prior sentence says, “The value of Equation 1-2, δ, is equal to −1”. Considering the state x(s) = [0,1], multiplying δx(s) would yield [0,-1]. Then it would make sense that adding [0,-1] to the prior weights w = [1,1] to yield the new weights w = [1,0]. Note from the Author or Editor: Page 15, the sentence that currently reads "Multiplying these together, the result is δx(s)=[0,1]." Should be: "Multiplying these together, the result is δx(s)=[0,-1]." Note the minus sign at the end.	Nhan Tran	Dec 28, 2022	Jan 13, 2023
Printed	Page 29 Equation 2-3	Equation 2-3 should be: "r = r + α (r'-r)" -- note the ' should be on the first r.	Phil Winder	Jan 02, 2023	Jan 13, 2023
	Page 54 Algorithm 2-4	The algorithm exits the loop when DELTA is less than or equal to theta, but DELTA is always calculated as: DELTA <- max(DELTA, (anything)) DELTA will never get smaller than its initial value. If that initial value is greater than theta, the algorithm will never exit its loop. Note from the Author or Editor: Page 54, Algorithm 2-4: 1. From the end of step 2, remove ", ∇ ← 0" 2. Insert a new step between 3 and 4 - let's call it 3a so that the references in the text remain correct. Insert: "3.a ∇ ← 0" and indent to align with the word "loop" on line 4.	Patrick Doyle	Jun 04, 2022	Jan 13, 2023
	Page 62, 63, 64 (page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line	In the Q-Learning formula the argmax should be just max. Note from the Author or Editor: (page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line Replace "argmax" with "max"	Manuel	Mar 15, 2021	Jan 13, 2023
	Page 65 Algorithm 3-2	Step 6 states: Choose a from s using pi, breaking ties randomly Since this is in a loop, the value of "a" updated at the end of the loop will be obliterated by choosing a new value for "a". Note from the Author or Editor: Page 65, Algorithm 3-2: 1. Change step 4 to say "s, a ← Initialize s from the environment and choose a using π" 2. Remove step 6 entirely 3. Update all subsequent numbers to be contiguous	Patrick Doyle	Jun 05, 2022	Jan 13, 2023
	Page 123 Algorithm 5-1, step 7	The ln is missing when calculating the gradient of π. It should be: θ ← θ + αγ^tG∇lnπ(a ∣ s, θ) Note from the Author or Editor: Page 123, Algorithm 5-1: Add "ln" to step 7, so that the equation reads: "θ ← θ + αγ^tG∇lnπ(a ∣ s, θ)" Page 125, Algorithm 5-2: Add "ln" to step 9, so that the equation reads: "θ ← θ + αγ^tδ∇lnπ(a ∣ s, θ)"	Anonymous	Aug 08, 2022	Jan 13, 2023
	Page 129 Algorithm 5-3	1. Variable t isn't being updated after each step. 2. At step 6. there's no need to break ties randomly, since we aren't dealing with a deterministic action-value function, but with a stochastic policy that outputs probabilities. 3. At step 8. V(s, θ) should have been V(s, w) (weights "w" belong to the critic model V, while weights "θ" belong to the actor model π, as denoted in step 1.). Similar errors appear at page 134 (Algorithm 5-4) at the corresponding steps (6 and 8). Note from the Author or Editor: On page 129, Algorithm 5-3: 1. Add a 13th step to update t: "t <- t + 1", indent to align with line 12 2. Step 6: Remove ", breaking ties randomly" from the text 3. Step 8: change "V(s, θ)" to "V(s, w)" On page 134, Algorithm 5-4: 1. Add a 15th step to update t: "t <- t + 1", indent to align with line 12 2. Step 6: Remove ", breaking ties randomly" from the text 3. Step 8: at the end of the line change "V(s, θ)" to "V(s, w)"	Anonymous	Aug 09, 2022	Jan 13, 2023
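
The corrected asymptote from the Chapter 2 erratum, e/n + (1-e), can be checked numerically. The sketch below is not taken from the book: the function names, the simulation, and the choice of action 0 as the optimal action are assumptions made for illustration. It models an e-greedy agent whose value estimates have already converged, so the greedy choice is always the optimal action.

```python
import random


def asymptote_corrected(epsilon, n):
    """Corrected asymptote: explore with probability epsilon (hitting the
    optimal action 1/n of the time), otherwise exploit the optimal action."""
    return epsilon / n + (1 - epsilon)


def asymptote_original(epsilon, n):
    """Expression originally printed, kept here only for comparison."""
    return (2 - epsilon) / n


def simulate(epsilon, n, steps=100_000, seed=0):
    """Fraction of steps on which a converged e-greedy agent picks the
    optimal action. Action 0 is assumed to be optimal (hypothetical setup)."""
    rng = random.Random(seed)
    optimal = 0
    hits = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: uniform over all n actions
        else:
            action = optimal           # exploit: greedy choice is optimal
        hits += action == optimal
    return hits / steps


if __name__ == "__main__":
    eps, n = 0.1, 10
    print("corrected:", asymptote_corrected(eps, n))
    print("original: ", asymptote_original(eps, n))
    print("simulated:", simulate(eps, n))
```

With e = 0.1 and n = 10 the simulated fraction should land near the corrected value rather than the originally printed one, which matches the submitter's argument.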
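
The Algorithm 2-4 erratum is about where the convergence tracker is reset. The sketch below assumes a value-iteration-style sweep, which the erratum itself does not confirm. The `value_iteration` function, the `transitions` layout, and the three-state toy problem are all hypothetical. The point it illustrates is the one in the author's note: the tracker is set to zero inside the outer loop, at the start of every sweep, so it can shrink below theta.

```python
def value_iteration(transitions, gamma=0.9, theta=1e-6, max_sweeps=10_000):
    """Sweep over states until the largest value change is at most theta.

    transitions[state][action] is a list of (probability, next_state, reward)
    tuples; a state with no actions is treated as terminal. (Assumed layout.)
    """
    values = {state: 0.0 for state in transitions}
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0  # reset every sweep; resetting only once would never shrink
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            old = values[state]
            values[state] = max(
                sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(old - values[state]))
        if delta <= theta:
            return values, sweep
    raise RuntimeError("did not converge within max_sweeps")


# Hypothetical toy problem: A -> B -> end, with reward 1 on the final step.
toy = {
    "A": {"go": [(1.0, "B", 0.0)]},
    "B": {"go": [(1.0, "end", 1.0)]},
    "end": {},
}

if __name__ == "__main__":
    values, sweeps = value_iteration(toy)
    print(values, sweeps)
```

If `delta = 0.0` were moved above the outer loop, as in the uncorrected algorithm, `delta` would stay at its largest observed value and the loop would only stop when it hit `max_sweeps`.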