Errata
The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
| Version | Location | Description | Submitted by | Date submitted |
|---|---|---|---|---|
|  | Page 52, "Generating your First text" code section | While trying to run the example using the model microsoft/Phi-3-mini-4k-instruct, I encountered the following error during execution: | Pablo Garrido | May 31, 2025 |
|  | Page 33, before the code "from transformers import AutoModelForCausalLM, AutoTokenizer" | The code will raise an error if you skip the required installation step at the beginning (see the installation sketch after this table). | 凌星寒 | Jun 21, 2025 |
| Printed | Page 41 and 44, Figures 2-4 and 2-5 | Figures 2-4 and 2-5 show the same token ID for different tokens: | Pablo Francisco Pérez Hidalgo | Aug 11, 2025 |
| Printed | Page 49, last two bullet points | On page 49, the last two bullet points explaining the insights/observations about the GPT-2 tokenizer are exactly the same. Either the same point was reprinted twice by mistake, or it replaced another valid insight/observation. Please check and fix it. | Harshit Dawar | May 27, 2025 |
| Printed | Page 51, last paragraph for GPT-4 (bullet points differentiating the GPT-4 and GPT-2 tokenizers) | In the observations about the GPT-4 tokenizer on page 51, the third point states that the word "tokens" is represented by a single token; however, in the GPT-4 tokenizer output shown just above on the same page, it is "_tokens" that is marked as a single token. | Harshit Dawar | May 27, 2025 |
| Printed | Page 52, 13th line from the bottom | 'This is an encoder that forcuses on code generation' | Haesun Park | Jun 21, 2025 |
|  | Page 54, right before the recap | At the end of page 53 you mention the Phi-3 (and Llama 2) tokenizer and then explain its characteristics, but never show the actual result of the tokenization (see the tokenization sketch after this table). | Ivan Castano | Sep 16, 2025 |
| Printed | Page 77, Figure 3-5 | In the upper-right and lower-right figures, token ID 50,000 should be 49,999. | Haesun Park | Jun 21, 2025 |
| Printed | Pages 90-91, Figures 3-16 and 3-17 | In the two figures, 'combining information' seems to refer to the linear layer and is included in the attention head. But in Figure 3-26, this linear layer is shown separately from the attention heads. This could be confusing to readers. | Haesun Park | Jun 21, 2025 |
|  | Page 91, Figure 3-17 | Per the description, "Figure 3-17 shows the intuition of how attention heads run in parallel with a preceding step of splitting information and a later step of combining the results of all the heads." | 凌星寒 | Jun 24, 2025 |
| Printed | Page 103, 3rd line from the bottom | 'that capture absolute and relative token position information' should be 'that capture relative token position information' | Haesun Park | Jun 21, 2025 |
| Printed | Page 112, 3rd line from the top | 'both representation and language models' should be 'both representation and generative language models' | Haesun Park | Jun 21, 2025 |
| Printed | Page 113, 1st paragraph | The text suggests evaluating generalization on the validation set when hyperparameter tuning is done using the training and test sets, but this is not standard practice. Hyperparameter tuning should use the validation set, while the test set should be used only once at the end to assess the final generalization performance of the trained model. | Haesun Park | Jun 21, 2025 |
| Printed | Page 125, Figure 4-15 | 'The cosine similarity is the angle between two vectors' should be 'The cosine similarity is the cosine of the angle between two vectors' (see the formula after this table). | Haesun Park | Jun 21, 2025 |
| Printed | Page 129, Figure 4-19 | 'a decoder-encoder architecture' should be 'an encoder-decoder architecture'. | Haesun Park | Jun 21, 2025 |
| Printed | Page 163, code snippet | The correct code is: | ERIC MEURVILLE | Jul 01, 2025 |
| Printed | Page 168, Figure 6-1 | In Figure 6-1, the description of the Llama 2 model reads 7B/13B70B. It should be 7B/13B/70B (missing / between 13B and 70B). | Theodoros Athanasiadis | Jun 19, 2025 |
| Printed | Page 179, bottom line | 'adding it to the `data` variable' should be 'adding it to the `text` variable' | Haesun Park | Jun 21, 2025 |
| Printed | Page 181, Figure 6-13 | The caption of Figure 6-13 is duplicated from that of Figure 6-11. | Haesun Park | Jun 21, 2025 |
| Printed | Page 191, 2nd line above the 'Output Verification' section | 'such a conservation' should be 'such a conversation' | Haesun Park | Jun 21, 2025 |
| Printed | Page 201, Figure 7-2 | The float 16-bit representation is incorrect (see the bit-layout sketch after this table). | ERIC MEURVILLE | Jul 01, 2025 |
| Printed | Page 204, 2nd line below Figure 7-5 | `system_prompt` is not included in the template, and there is no need to include the `<s>` token in the template because llama-cpp-python adds it automatically (see the template sketch after this table). | Haesun Park | Jun 21, 2025 |
| Printed | Page 207, top of the page | `llm_chain.invoke("a girl that lost her mother")` | Soner Balkir | Aug 24, 2025 |
| Printed | Page 210, Figure 7-10 | 'Conversation history' should be 'Current conversation'. | Haesun Park | Jun 21, 2025 |
| Printed | Page 213, top of the page | `llm_chain.predict({"input_prompt": "What is 3 + 3?"})` | Eric Meurville | Jul 10, 2025 |
| Printed | Page 216, top of the page | `# Generate a conversation and ask for the name` | Eric Meurville | Jul 10, 2025 |
| Printed | Page 298, 18th line from the bottom | 'similarity scores between 1 and 5' should be 'similarity scores between 0 and 5' | Haesun Park | Jun 21, 2025 |
| Printed | Page 299, 17th line from the top | 'during evaluation' should be 'during training' in the explanation of `per_device_train_batch_size`. | Haesun Park | Jun 21, 2025 |
| Printed | Page 327, 12th line from the bottom | `trainer.train()` is omitted. | Haesun Park | Jun 21, 2025 |
| Printed | Page 354, 3rd line from the bottom | 'Using a two-step process' should be 'Using a three-step process'. | Haesun Park | Jun 21, 2025 |
| Printed | Page 371, 3rd-to-last paragraph | In the 'Training Configuration' section of 'Instruction Tuning with QLoRA' in Chapter 12, it is stated regarding the `num_train_epochs` parameter: | Marcus Fraaß | May 16, 2025 |
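
For the page 33 erratum above: a minimal sketch of the installation step that appears to be missing before the `transformers` import. The exact package list is an assumption; the book's own setup instructions take precedence.

```python
# Assumed setup (not stated in the erratum itself): install the libraries the
# chapter's code relies on before running the import, e.g. from a shell:
#   pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Quick smoke test: load the tokenizer of the model used in the chapter.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tokenizer.__class__.__name__)
```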
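For the page 54 erratum above: a minimal sketch, using the Hugging Face `AutoTokenizer` API, of how the missing Phi-3 (Llama 2) tokenization result could be reproduced. The example sentence is an arbitrary placeholder, not the book's.

```python
from transformers import AutoTokenizer

# Load the Phi-3 tokenizer discussed on pages 53-54.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

text = "Hello world!"  # placeholder sentence, not the book's example
token_ids = tokenizer(text).input_ids

# Print each token ID next to the token text it decodes to.
for token_id in token_ids:
    print(token_id, repr(tokenizer.decode(token_id)))
```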
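For the page 125 erratum above, the standard definition the correction points to, for vectors A and B:

```latex
\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
             = \frac{\sum_{i=1}^{n} A_i B_i}
                    {\sqrt{\sum_{i=1}^{n} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}}
```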
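For the page 201 erratum above: FP16 (IEEE 754 half precision) uses 1 sign bit, 5 exponent bits, and 10 mantissa bits. A small sketch, assuming NumPy is available, for checking the bit pattern of any FP16 value against the figure:

```python
import numpy as np

value = np.float16(3.14)  # substitute any value shown in Figure 7-2

# Reinterpret the 16 bits of the FP16 value as an unsigned integer.
bits = np.array([value], dtype=np.float16).view(np.uint16)[0]
pattern = f"{bits:016b}"

# Split into sign (1 bit), exponent (5 bits), and mantissa (10 bits).
print(pattern[0], pattern[1:6], pattern[6:], sep=" | ")
```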
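For the page 204 erratum above: a sketch of what the corrected prompt template might look like under the erratum's two points (no `system_prompt` slot, and no manual `<s>` token since llama-cpp-python adds it). The template string follows the Phi-3 chat format but is an assumption to verify against the model card, not the book's code.

```python
from langchain.prompts import PromptTemplate

# Phi-3 chat-style template without a leading <s> token and without a
# system_prompt variable, per the erratum above (template text is an
# assumption, not taken from the book).
template = """<|user|>
{input_prompt}<|end|>
<|assistant|>"""

prompt = PromptTemplate(template=template, input_variables=["input_prompt"])
```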