Errata for Reinforcement Learning


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version Location Description Submitted By Date submitted Date corrected
Page ?
Chapter 2, Running the Experiment, 4th paragraph

The asymptote of the optimal action is given as:

1/n + (1-e)/n = (2-e)/n

Perhaps I am missing something, but I think the asymptotic behavior should be:

e/n + (1-e) = (e(1-n) + n)/n

which I got from algorithm 2-1:

During an e proportion of the time, we pick the optimum solution with a probability of 1/n.

During the remaining (1-e) proportion of the time, we always pick the optimum solution (in the long-running asymptote).

These two equations [ (2-e)/n or (e(1-n) +n)/n ] will give very different results. For example, if n = 10, the result for e -> 0 (perfect exploitation) for the given equation is 1/5 while my equation always achieves 1, a perfect selection of the optimal solution, irrespective of the number of actions possible.

Also, a more minor detail: in line 7 of Algorithm 2-1, I think the denominator should be N(a) (lowercase a, not capital A), to denote that we are dividing by the number of times that particular action was taken, not the total number of actions taken so far.

Note from the Author or Editor:
Page 30, Algorithm 2-1:

In step 7, replace "N(A)" with "N(a)" -- lowercase a.

Page 32, last paragraph which begins with: "Looking toward the end of the experiment..."

The equation near the bottom should be replaced. The text should read: "The asymptote of the optimal action is e/n + (1-e), where ..."
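
As a quick numerical check of the corrected asymptote (a minimal sketch, not the book's code; the agent is assumed to have already identified the optimal action, so only the e-greedy selection step matters):

    import random

    def asymptotic_optimal_rate(epsilon, n_actions, trials=100_000):
        # Once learning has converged, the greedy choice is the optimal action
        # (index 0 here); exploration still picks uniformly among all actions.
        optimal, hits = 0, 0
        for _ in range(trials):
            if random.random() < epsilon:
                action = random.randrange(n_actions)  # explore
            else:
                action = optimal                      # exploit
            hits += action == optimal
        return hits / trials

    eps, n = 0.1, 10
    print(asymptotic_optimal_rate(eps, n))  # roughly 0.91
    print(eps / n + (1 - eps))              # 0.91, the corrected asymptote e/n + (1-e)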

Kenji Oman  Dec 22, 2020  Jan 13, 2023
Page 48
Figure 2-8

The "Right" and "Left" labels in Figure 2-8 need to be reveresed.

Note from the Author or Editor:
Page 48, Figure 2-8.

The "Right" and "Left" labels on all of the four images are the wrong way around. "Left" should be on the top, "Right" should be on the bottom.

Andrew  Mar 29, 2021  Jan 13, 2023
Page Prediction Error
Chapter 1 -> Fundamental Concepts in Reinforcement Learning -> The First RL Algorithm -> Prediction error

The sentence “Knowledge of the previous state and the prediction error helps alter the weights. Multiplying these together, the result is δx(s)=[0,1]. Adding this to the current weights yields w=[1,0].”

I think the result of this formula `δx(s)` should be [0,-1] instead of [0,1], since the prior sentence says, “The value of Equation 1-2, δ, is equal to −1”. Considering the state x(s) = [0,1], multiplying gives δx(s) = [0,-1]. Adding [0,-1] to the prior weights w = [1,1] then yields the new weights w = [1,0].

Note from the Author or Editor:
Page 15, the sentence that currently reads "Multiplying these together, the result is δx(s)=[0,1]."

Should be: "Multiplying these together, the result is δx(s)=[0,-1]."

Note the minus sign at the end.
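
A short NumPy check of the corrected sentence (illustrative variable names only, not the book's code):

    import numpy as np

    delta = -1.0               # prediction error from Equation 1-2
    x_s = np.array([0, 1])     # features of the previous state
    w = np.array([1, 1])       # current weights

    update = delta * x_s       # [0., -1.], matching the corrected text
    w = w + update             # [1., 0.]
    print(update, w)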

Nhan Tran  Dec 28, 2022  Jan 13, 2023
Printed
Page 29
Equation 2-3

Equation 2-3 should be: "r = r + α (r'-r)" -- note that the prime (') belongs on the first r inside the parentheses.
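
A minimal sketch of the corrected update, assuming r is the running estimate and r' is the newly observed reward (names are illustrative, not the book's code):

    def update_estimate(r, r_new, alpha=0.1):
        # Corrected Equation 2-3: r <- r + alpha * (r' - r)
        return r + alpha * (r_new - r)

    estimate = 0.0
    for reward in [1.0, 0.0, 1.0, 1.0]:
        estimate = update_estimate(estimate, reward)
    print(estimate)  # moves a fraction alpha toward each new reward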

Phil Winder  Jan 02, 2023  Jan 13, 2023
Page 54
Algorithm 2-4

The algorithm exits the loop when DELTA is less than or equal to theta, but DELTA is always calculated as:

DELTA <- max(DELTA, (anything))

Since DELTA is never reset, it can only grow. Once any sweep pushes it above theta, the algorithm will never exit its loop.

Note from the Author or Editor:
Page 54, Algorithm 2-4:

1. From the end of step 2, remove ", Δ ← 0"
2. Insert a new step between 3 and 4 - let's call it 3a so that the references in the text remain correct. Insert: "3a. Δ ← 0" and indent to align with the word "loop" on line 4.
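
The effect of the correction, sketched as a generic iterative policy-evaluation loop in Python (an illustration of the fix, not the book's code):

    import numpy as np

    def evaluate_policy(P, R, gamma=0.9, theta=1e-6):
        # P[s] holds the transition probabilities under the fixed policy,
        # R[s] the expected reward for state s.
        V = np.zeros(len(R))
        while True:
            delta = 0.0                       # corrected step 3a: reset inside the loop
            for s in range(len(R)):
                v = V[s]
                V[s] = R[s] + gamma * P[s] @ V
                delta = max(delta, abs(v - V[s]))
            if delta < theta:                 # the exit test can now actually trigger
                return V

    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    R = np.array([0.0, 1.0])
    print(evaluate_policy(P, R))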

Patrick Doyle  Jun 04, 2022  Jan 13, 2023
Page 62, 63, 64
(page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line

In the Q-Learning formula the argmax should be just max.

Note from the Author or Editor:
(page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line

Replace "argmax" with "max"

Manuel  Mar 15, 2021  Jan 13, 2023
Page 65
Algorithm 3-2

Step 6 states:

Choose a from s using pi, breaking ties randomly

Since this is in a loop, the value of "a" updated at the end of the loop will be obliterated by choosing a new value for "a".

Note from the Author or Editor:
Page 65, Algorithm 3-2:

1. Change step 4 to say "s, a ← Initialize s from the environment and choose a using π"
2. Remove step 6 entirely
3. Update all subsequent numbers to be contiguous
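
A sketch of the corrected control flow (a generic SARSA episode with a toy environment; the names and the environment are stand-ins, not the book's API):

    import random

    def epsilon_greedy(Q, s, epsilon):
        if random.random() < epsilon:
            return random.randrange(len(Q[s]))
        return max(range(len(Q[s])), key=lambda i: Q[s][i])

    def sarsa_episode(Q, env, epsilon=0.1, alpha=0.1, gamma=0.9):
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon)   # corrected step 4: choose a once, before the loop
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(Q, s_next, epsilon)
            target = r + gamma * Q[s_next][a_next] * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s_next, a_next           # the carried-over a is no longer overwritten at the top of the loop
        return Q

    class ToyChain:
        # Minimal 3-state chain used only to exercise the loop.
        def reset(self):
            self.s = 0
            return self.s
        def step(self, a):
            self.s += 1 if a == 1 else 0
            done = self.s >= 2
            return self.s, float(done), done

    Q = [[0.0, 0.0] for _ in range(3)]
    print(sarsa_episode(Q, ToyChain()))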

Patrick Doyle  Jun 05, 2022  Jan 13, 2023
Page 123
Algorithm 5-1, step 7

The ln is missing when calculating the gradient of π; it should be:
θ ← θ + αγ^tG∇lnπ(a ∣ s, θ)

Note from the Author or Editor:
Page 123, Algorithm 5-1:

Add "ln" to step 7, so that the equation reads as: "θ ← θ + αγ^tG∇lnπ(a ∣ s, θ)"

Page 125, Algorithm 5-2:

Add "ln" to step 9, so that the equation reads as: "θ ← θ + αγ^t????∇lnπ(a ∣ s, θ)"

Anonymous  Aug 08, 2022  Jan 13, 2023
Page 129
Algorithm 5-3

1. Variable t isn't being updated after each step.
2. At step 6, there's no need to break ties randomly, since we aren't dealing with a deterministic action-value function but with a stochastic policy that outputs probabilities.
3. At step 8, V(s, θ) should be V(s, w) (the weights "w" belong to the critic model V, while the weights "θ" belong to the actor model π, as denoted in step 1).

Similar errors appear at page 134 (Algorithm 5-4) at the corresponding steps (6 and 8).

Note from the Author or Editor:
On page 129, Algorithm 5-3:
1. Add a 13th step to update t: "t <- t + 1", indent to align with line 12
2. Step 6: Remove ", breaking ties randomly" from the text
3. Step 8: change "V(s, θ)" to "V(s, w)"

On page 134, Algorithm 5-4:
1. Add a 15th step to update t: "t <- t + 1", indent to align with line 12
2. Step 6: Remove ", breaking ties randomly" from the text
3. Step 8: at the end of the line change "V(s, θ)" to "V(s, w)"
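
A condensed sketch of where the corrections land in the inner loop (env, actor, and critic are stand-ins with assumed interfaces, not the book's code):

    def run_episode(env, actor, critic, alpha_w=0.1, alpha_theta=0.01, gamma=0.99):
        s = env.reset()
        t, done = 0, False
        while not done:
            a = actor.sample(s)                    # step 6: pi is stochastic, no tie-breaking needed
            s_next, r, done = env.step(a)
            td_target = r + gamma * critic.value(s_next) * (not done)
            delta = td_target - critic.value(s)    # step 8: the TD error uses V(s, w), the critic's weights
            critic.update(s, alpha_w * delta)      # w belongs to the critic V
            actor.update(s, a, alpha_theta * gamma**t * delta)  # theta belongs to the actor pi
            s = s_next
            t += 1                                 # the added step: advance t after every step
        return actor, critic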

Anonymous  Aug 09, 2022  Jan 13, 2023