Errata for Hands-On Large Language Models


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version | Location | Description | Submitted by | Date submitted
PDF Page 52
"Generating Your First Text" code section

While trying to run the example using the model microsoft/Phi-3-mini-4k-instruct, I encountered the following error during execution:

AttributeError: 'DynamicCache' object has no attribute 'get_max_length'. Did you mean: 'get_seq_length'?

I have already tried updating the transformers library to version 4.52.4, which should be compatible with the Phi-3 model. I also cleared the Hugging Face cache using huggingface-cli delete-cache and reinstalled all relevant packages. Despite these steps, the issue remains unresolved and the same error keeps appearing.

It seems like the issue is related to how the model handles past_key_values during text generation, particularly with DynamicCache.
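
A minimal workaround sketch (not an official fix), assuming the version pin from the book's companion notebooks still resolves the DynamicCache incompatibility; see also the next erratum:

# Downgrade to the transformers release the notebooks were written against
# !pip install -q transformers==4.41.2 accelerate

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

print(transformers.__version__)  # should report 4.41.2 after reinstalling

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",
    torch_dtype="auto",
)
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=50,
    do_sample=False,
)
print(generator("Write a one-sentence greeting.")[0]["generated_text"])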

Pablo Garrido   May 31, 2025 
PDF Page 33
before the code "from transformers import AutoModelForCausalLM, AutoTokenizer"

The code will fail if you skip the following installation at the beginning:

!pip install -q transformers==4.41.2 accelerate

The version of transformers installed by default is now 4.52.4, which causes the following error:

AttributeError: 'DynamicCache' object has no attribute 'get_max_length'


凌星寒  Jun 21, 2025 
Printed Page 41 and 44
Figures 2-4 and 2-5

Figures 2-4 and 2-5 show the same token ID for different tokens:

the -> 278
b -> 278

A token identifier must be unique, and the text is consistent with this idea; however, the diagrams contradict this property.
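
A quick way to check the actual IDs (a sketch, assuming the chapter's microsoft/Phi-3-mini-4k-instruct tokenizer; the tokenizer used to produce the figures may differ):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
ids = tokenizer("the b", add_special_tokens=False).input_ids
for token_id, token in zip(ids, tokenizer.convert_ids_to_tokens(ids)):
    # each distinct token string maps to exactly one ID in the vocabulary
    print(token_id, token)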

Pablo Francisco Pérez Hidalgo  Aug 11, 2025 
Printed Page 49
Last 2 Bullet Points

On page 49, the last two bullet points describing observations about the GPT-2 tokenizer are identical. Either the same point was reprinted twice by mistake, or it replaced another intended observation. Please check it out and fix it!

Harshit Dawar  May 27, 2025 
Printed Page 51
Last Paragraph for GPT-4 (Bullet Points to differentiate between GPT-4 & GPT-2 Tokenizer)

On page 51, the third bullet point of the observations about the GPT-4 tokenizer states that the word "tokens" is represented with a single token. However, in the GPT-4 tokenizer output shown just above the observations on the same page, it is "_tokens" that is marked as a single token.

Either of two cases is possible:
1. "_tokens" is marked as one token by mistake in the GPT-4 tokenizer output; in reality it should be two tokens, i.e., "_" and "tokens".
2. The observation is wrong: instead of saying "tokens" is represented with one token, it should say "_tokens" is represented with one token.

Please check this out and fix it! A quick way to verify is sketched below.
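
One way to check (a sketch, assuming the GPT-4 tokenizer in the book corresponds to tiktoken's cl100k_base encoding):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4
for text in ["tokens", " tokens"]:
    ids = enc.encode(text)
    print(repr(text), "->", ids, [enc.decode([i]) for i in ids])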

Harshit Dawar  May 27, 2025 
Printed Page 52
13th line from the bottom

'This is an encoder that focuses on code generation'
should be
'This is a decoder that focuses on code generation'

Haesun Park  Jun 21, 2025 
PDF Page 54
Right before the recap

At the end of page 53, the Phi-3 (and Llama 2) tokenizer is introduced and its characteristics are explained, but the actual tokenization result is never shown.

It can only be seen on page 55, in the recap table.

Even though it can be found there, this breaks the reader's flow and the structure of the book, since the tokenization result is shown for all the other tokenizers.

Ivan Castano  Sep 16, 2025 
Printed Page 77
In Figure 3-5

In the upper-right and lower-right figures, token ID 50,000 should be 49,999.

Haesun Park  Jun 21, 2025 
Printed Page 90, 91
In Figure 3-16, 3-17

In these two figures, 'combining information' seems to refer to the linear layer and is included in the attention head. But in Figure 3-26, this linear layer is shown separately from the attention heads. This could be confusing to readers.

Haesun Park  Jun 21, 2025 
PDF Page 91
Figure 3-17

Per description "Figure 3-17 shows the intuition of how attention heads run in parallel with a preceding step of splitting information and a later step of combining the results of all the heads."

However, Figure 3-17 only shows one attention head and doesn't have the step of "combining the results of all the heads".

凌星寒  Jun 24, 2025 
Printed Page 103
3rd line from the bottom

'that capture absolute and relative token position information' should be 'that capture relative token position information'

Haesun Park  Jun 21, 2025 
Printed Page 112
3rd line from the top

'both representation and language models' should be 'both representation and generative language models'

Haesun Park  Jun 21, 2025 
Printed Page 113
1st paragraph

Here, it suggests evaluating generalization on the validation set when hyperparameter tuning is done using the training and test sets, but this is not the standard practice. Hyperparameter tuning should use the validation set, while the test set should be used only once at the end to assess the final generalization performance of the trained model.

Haesun Park  Jun 21, 2025 
Printed Page 125
In Figure 4-15

'The cosine similarity is the angle between two vectors' should be 'The cosine similarity is the cosine of the angle between two vectors'
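
A quick numeric illustration of the corrected wording (a minimal sketch, not code from the book):

import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
angle_deg = np.degrees(np.arccos(cos_sim))
print(angle_deg, cos_sim)  # ~45.0 degrees and ~0.707: the similarity is cos(angle), not the angle itself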

Haesun Park  Jun 21, 2025 
Printed Page 129
In Figure 4-19

'a decoder-encoder architecture' should be 'an encoder-decoder architecture'.

Haesun Park  Jun 21, 2025 
Printed Page 163
Code snippet

The correct code is:

# Visualize topics and documents
fig = topic_model.visualize_document_datamap(
    titles,
    topics=list(range(20)),
    reduced_embeddings=reduced_embeddings,
    width=1200,
    # label_font_size, label_wrap_width, and use_medoids are not official
    # arguments of visualize_document_datamap in BERTopic; pass them through
    # the datamap_kwds dictionary instead:
    datamap_kwds={
        "label_font_size": 11,
        "label_wrap_width": 20,
        "use_medoids": True,
    },
)
plt.savefig("datamapplot.png", dpi=300)

ERIC MEURVILLE  Jul 01, 2025 
Printed Page 168
Figure 6-1

In Figure 6-1, the description of the Llama 2 model is 7B/13B70B. It should be 7B/13B/70B (the / between 13B and 70B is missing).

Theodoros Athanasiadis  Jun 19, 2025 
Printed Page 179
Bottom line

'adding it to the `data` variable' should be 'adding it to the `text` variable'

Haesun Park  Jun 21, 2025 
Printed Page 181
In Figure 6-13

The caption of Figure 6-13 is duplicated from that of Figure 6-11.

Haesun Park  Jun 21, 2025 
Printed Page 191
2nd line above the 'Output Verification' section.

'such a conservation' should be 'such a conversation'

Haesun Park  Jun 21, 2025 
Printed Page 201
figure 7-2

The float 16-bit representation is incorrect.

Float 16-bit (half precision) in IEEE 754 format:
- Bit 15: Sign
- Bits 14–10: Exponent (5 bits, not 8, biased by 15)
- Bits 9–0: Mantissa (10 bits, with an implicit leading 1 for normalized numbers)
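
A quick way to see the 1/5/10-bit split (a minimal sketch, not code from the book):

import struct

# Pack -1.5 as an IEEE 754 half-precision float and show its bit fields
bits = format(int.from_bytes(struct.pack(">e", -1.5), "big"), "016b")
sign, exponent, mantissa = bits[0], bits[1:6], bits[6:]
print(sign, exponent, mantissa)  # 1 01111 1000000000: 1 sign bit, 5 exponent bits, 10 mantissa bits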

ERIC MEURVILLE  Jul 01, 2025 
Printed Page 204
2nd line below Figure 7-5

The `system_prompt` is not included in the template. Also, there is no need to include the <s> token in the template, because llama-cpp-python adds it automatically. A possible revision is sketched below.
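
A sketch only, assuming the Phi-3 chat format and the `system_prompt`/`input_prompt` variable names used in the chapter:

# Prompt template including the system prompt, without a leading <s>
# (llama-cpp-python adds the BOS token itself)
template = """<|system|>
{system_prompt}<|end|>
<|user|>
{input_prompt}<|end|>
<|assistant|>"""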

Haesun Park  Jun 21, 2025 
Printed Page 207
Top of the page

llm_chain.invoke("a girl that lost her mother")

should be

llm_chain.invoke({"summary" : "A girl that lost her mother"})

Soner Balkir  Aug 24, 2025 
Printed Page 210
In Figure 7-10

'Conversation history' should be 'Current conversation'.

Haesun Park  Jun 21, 2025 
Printed Page 213
Top of the page

llm_chain.predict({"input_prompt":"What is 3 + 3?"})

should be replaced by

llm_chain.invoke({"input_prompt":"What is 3 + 3?"})

Eric Meurville  Jul 10, 2025 
Printed Page 216
Top of the page

# Generate a conversation and ask for the name
llm_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})
llm_chain.invoke({"input_prompt": "What is my name?"})

should return

{'input_prompt': 'What is my name?',
'chat_history': ' Maarten introduces himself and asks for the sum of 1 + 1, to which the AI responds that it equals 2. The AI provides a brief explanation about addition being a basic arithmetic operation resulting from combining numbers, in this case adding one unit to another to get a total of two units or items.',
'text': " Your name was mentioned as Maarten when you introduced yourself; therefore, based on the current conversation, your name is Maarten.\nHere's an explanation for 1 + 1 = 2: Addition is one of the four fundamental arithmetic operations and involves combining quantities. When we add 1 unit to another 1 unit, we are essentially counting up by one from the first number (which is 1), arriving at a total of two units or items."}

Eric Meurville  Jul 10, 2025 
Printed Page 298
18th line from the bottom

'similarity scores between 1 and 5' should be 'similarity scores between 0 and 5'

Haesun Park  Jun 21, 2025 
Printed Page 299
17th line from the top

'during evaluation' should be 'during training' in explanation of per_device_train_batch_size.

Haesun Park  Jun 21, 2025 
Printed Page 327
12th line from the bottom

`trainer.train()` is omitted.

Haesun Park  Jun 21, 2025 
Printed Page 354
3rd line from the bottom

'Using a two-step process' should be 'Using a three-step process'.

Haesun Park  Jun 21, 2025 
Printed Page 371
3rd last paragraph

In the 'Training Configuration' section of 'Instruction Tuning with QLoRA' in chapter 12, it is stated regarding the 'num_train_epochs' parameter:
'The total number of training rounds. Higher values tend to degrade performance so we generally like to keep this low.'

Don't higher values typically lead to better performance?

Marcus Fraaß  May 16, 2025