
Natural Language Processing with Transformers, Revised Edition

Errata for Natural Language Processing with Transformers, Revised Edition

ePub Page Notebook 02_classification.ipynb
Notebook 02_classification.ipynb

Notebook 02_classification.ipynb causes the following problem:

emotions_encoded.set_format("torch", columns=["input_ids", "attention_mask", "label"])

The exception report below says that PyTorch is not installed, which is not correct: (ValueError: PyTorch needs to be installed to be able to return PyTorch tensors).

Unexpected exception formatting exception. Falling back to standard exception

I found a workaround by running

emotions_encoded.set_format("torch") first

and then

emotions_encoded.set_format("torch", columns=["input_ids", "attention_mask", "label"])

However, I have a different problem that I don’t know how to solve:

After logging into Hugging Face Hub, as instructed in the same notebook (and the textbook), and creating a write access token, I executed the cell:

from transformers import Trainer, TrainingArguments

batch_size = 64

logging_steps = len(emotions_encoded["train"]) // batch_size

model_name = f"{model_ckpt}-finetuned-emotion"

training_args = TrainingArguments(output_dir=model_name,











When I run the next cell

from transformers import Trainer

trainer = Trainer(model=model, args=training_args,






I get the ‘Repository not found’ error:

`git clone` has been updated in upstream Git to have comparable

speeds to `git lfs clone`.

Cloning into '.'...

remote: Repository not found

fatal: repository 'huggingface/kokaljfilipovic/distilbert-base-uncased-finetuned-emotion/' not found

Error(s) during clone:

`git clone` failed: exit status 128

Silvija Kokalj-Filipovic  Jul 24, 2023 
Printed Page Various, see the detail section
Various, see the detail section

o p. 252: github-issues-transfomers.jsonl
o p. 271: “This is example is about {}”
o p. 259: […] there are sophisticated methods than can leverage […]
o p. 300: Unlike the code in the others in this book…
o p. 302: writing tool or for a building a game.
o p. 342: in Chapter 5 --> Chapter 6

Anonymous  Sep 17, 2023 
Printed Page Chap. 10, 323-336

The chapter appears to use the GPT-2 architecture, but it is unclear what the input and output of the model are while it is being trained from scratch. My understanding is that GPT-2 is trained to predict only the next token. Figure 10-2 suggests that something different happens here.

What exactly is the input-output behavior of the network during training? For each training example, does the model mask the last few tokens of the input and set the output to be identical to the input? If so, where is this masking defined?

Or is the scheme different? Please clarify.

Anonymous  Oct 19, 2023 
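
On the question above, a minimal sketch, assuming the standard causal language modeling setup used for GPT-2-style models rather than anything specific to the book's code. Under that assumption no input tokens are masked out: the targets are the same token sequence shifted by one position, and a causal attention mask stops each position from looking ahead. The token IDs and function names below are illustrative only.

```python
def next_token_pairs(token_ids):
    """Return (inputs, targets) for next-token prediction on one sequence."""
    inputs = token_ids[:-1]   # positions 0 .. T-2 are fed to the model
    targets = token_ids[1:]   # position t is trained to predict token t+1
    return inputs, targets


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


tokens = [11, 22, 33, 44, 55]  # toy token IDs, purely illustrative
inputs, targets = next_token_pairs(tokens)
mask = causal_mask(len(inputs))

for position, (seen, wanted) in enumerate(zip(inputs, targets)):
    print(f"position {position}: sees up to token {seen}, should predict {wanted}")
```

Every position contributes a prediction during training, so a single sequence yields many next-token examples at once.
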
Printed Page Pg 61
After 3rd paragraph

I'm not sure whether this is an erratum, but I have checked several sources and I believe the "self-attention" formula is slightly different.
I think W_ji should be W_ij.

Cayetano Romero  Jan 12, 2024 
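
On the index order raised above, a minimal sketch, assuming the convention that w[i][j] is the weight that position i places on position j, so that each output is x'_i = sum over j of w[i][j] * x[j] and each row of weights sums to 1. Whether the printed formula follows this convention or its transpose is for the authors to confirm. The toy vectors and function names are illustrative, and learned projections and scaling are left out.

```python
import math


def attention_weights(x):
    """Row-normalized dot-product weights: w[i][j] is how much i attends to j."""
    n = len(x)
    weights = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        top = max(scores)                      # subtract the max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def contextualize(x):
    """Compute x'_i = sum_j w[i][j] * x[j] for every position i."""
    w = attention_weights(x)
    dim = len(x[0])
    return [
        [sum(w[i][j] * x[j][d] for j in range(len(x))) for d in range(dim)]
        for i in range(len(x))
    ]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
w = attention_weights(x)
out = contextualize(x)
```
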
Printed Page Chapter 6
part of the code of this chapter

# hide
from transformers import pipeline, set_seed

It generates the warnings and error message below:

WARNING:tensorflow:From C:\Users\ziad.elmously.MLCORP\Anaconda3\envs\LargeLanguageModels\lib\site-packages\keras\src\ The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

Please note that I have already installed the required packages using the command:

!pip install -r requirements.txt

I am running the script in Jupyter Notebook.

Ziad Elmously  Jan 25, 2024 
Printed Page page 74
last chapter

Book: Natural Language Processing mit Transformatoren (German edition)

training_args = TrainingArguments(output_dir=model_name,

I have run `pip install transformers[torch]` and `pip install accelerate -U`, but I still get an error.
What should I do?
There are many errors in the notebooks for this book, which makes it very laborious to work with. J. van der List

Juergen van der List  Jan 27, 2024 
Printed Page 6, Transfer Learning in NLP
1st paragraph, third sentence

The sentence: Architecturally, this involves, splitting the model into of a body and a head, ...
Should be without "of": Architecturally, this involves, splitting the model into a body and a head, ...

Velimir Graorkoski  Jan 08, 2024 
Printed Page 23
2nd code cell

I am getting this error (which sounds a lot like one that was thought to have been fixed):

FileNotFoundError: Couldn't find file at <link to dropbox file>


emotions = load_dataset("dair-ai/emotion")

Eric Cooper  Aug 15, 2023 
Printed Page 61

TypeError Traceback (most recent call last)
<ipython-input-92-195b9a5c839d> in <cell line: 1>()
----> 1 print(tokenize(emotions['train'][:2]))

<ipython-input-89-f9c17701f610> in tokenize(batch)
1 def tokenize(batch):
----> 2 return tokenizer(batch(['text'],padding = True,truncation=True))

TypeError: 'dict' object is not callable

Juergen van der List  Dec 29, 2023 
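
The traceback above shows the batch being called like a function, batch(['text'], ...), with the tokenizer's keyword arguments inside that call. A minimal sketch of the difference, assuming the intent was to index the batch and pass the options to the tokenizer. The toy_tokenizer below is a hypothetical stand-in so the example runs without any model download.

```python
def toy_tokenizer(texts, padding=False, truncation=False):
    """Hypothetical stand-in for a real tokenizer: splits on whitespace.

    The truncation flag is accepted but ignored in this toy version.
    """
    rows = [text.split() for text in texts]
    if padding:
        width = max(len(row) for row in rows)
        rows = [row + ["[PAD]"] * (width - len(row)) for row in rows]
    return {"tokens": rows}


def tokenize(batch):
    # Index the batch with square brackets; the keyword arguments
    # belong to the tokenizer call, not to the batch lookup.
    return toy_tokenizer(batch["text"], padding=True, truncation=True)


batch = {"text": ["i feel fine", "so happy"]}
encoded = tokenize(batch)

try:
    batch(["text"])  # what the failing line effectively does
except TypeError as err:
    error_message = str(err)
    print(error_message)
```
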
Printed Page 68 (German edition)
last paragraph

When I run this Python block:
from umap import UMAP
from sklearn.preprocessing import MinMaxScaler

# Scale features to [0,1] range
X_scaled = MinMaxScaler().fit_transform(X_train)
# Initialize and fit UMAP
mapper = UMAP(n_components=2, metric="cosine").fit(X_scaled)
# Create a DataFrame of 2D embeddings
df_emb = pd.DataFrame(mapper.embedding_, columns=["X", "Y"])

I get the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17500\ in <module>
----> 1 from umap import UMAP
2 from sklearn.preprocessing import MinMaxScaler
4 # Scale features to [0,1] range
5 X_scaled = MinMaxScaler().fit_transform(X_train)

ImportError: cannot import name 'UMAP' from 'umap' (C:\Users\vdl\Anaconda3\lib\site-packages\umap\
df_emb["label"] = y_train

But I have umap installed. Running
!pip install umap
returned:
Requirement already satisfied: umap in c:\users\vdl\anaconda3\lib\site-packages (0.1.1)

What should I do? Best regards, JvdL

Juergen van der List  Jan 23, 2024 
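
A likely cause of the import error above: the UMAP class is provided by the umap-learn distribution, while the distribution named umap (0.1.1 in the report) is a different project that uses the same import name. The following is a minimal sketch for checking which of the two is installed, assuming a standard pip environment. The function names and the suggested commands are illustrative.

```python
from importlib import metadata


def installed_distributions():
    """Return the lowercased names of all installed distributions."""
    names = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
        except Exception:
            continue  # skip distributions with unreadable metadata
        if name:
            names.add(name.lower().replace("_", "-"))
    return names


def diagnose_umap(installed):
    """Suggest a next step based on which UMAP-related distributions exist."""
    if "umap" in installed:
        # The unrelated 'umap' distribution occupies the same import name.
        return "pip uninstall umap, then pip install umap-learn"
    if "umap-learn" in installed:
        return "umap-learn is installed; 'from umap import UMAP' should work"
    return "pip install umap-learn"


advice = diagnose_umap(installed_distributions())
print(advice)
```
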
Printed Page 69
1st paragraph, 3rd line

Hello! I suspect that 'hidden_dim]' should be 'embed_dim]', pls. verify (and thank you, great read so far, clearly explained).

Jo De Baer  Mar 06, 2024 
Printed Page 75 (German edition)
1st paragraph

trainer = Trainer(model=model, args=training_args,

This raised the following error:
CalledProcessError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/ in clone_from(self, repo_url, token)
--> 669 run_subprocess(
670 # 'git lfs clone' is deprecated (will display a warning in the terminal)

8 frames
CalledProcessError: Command '['git', 'lfs', 'clone', 'xxx', '.']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/ in clone_from(self, repo_url, token)
708 except subprocess.CalledProcessError as exc:
--> 709 raise EnvironmentError(exc.stderr)
711 def git_config_username_and_email(self, git_user: Optional[str] = None, git_email: Optional[str] = None):

OSError: WARNING: 'git lfs clone' is deprecated and will not be updated
with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '.'...
remote: Repository not found
fatal: repository 'xxx' not found
Error(s) during clone:
git clone failed: exit status 128

Gunold Brunbauer  Jul 07, 2023 
Printed Page 132
Body of function sequence_logprob

In the definition of the function, the sum of log probabilities seq_log_prob is obtained by computing the following:
torch.sum(log_probs[:, input_len:])

I think the correct way to obtain the sum of log probabilities is to compute:
torch.sum(log_probs[:, input_len-1:])

This is because we also want to take into account the log probability of the first token that was generated and that directly follows the input sequence (this log probability is located at index input_len -1 since input sequence tokens are located from index 0 to index input_len-1).

Clément Luneau  Jul 19, 2023 
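
The indexing argument above can be checked on toy numbers. This is a minimal sketch, assuming the per-token log probabilities are already shifted so that entry k scores token k+1 given tokens 0..k, which is what the report describes. The values and the helper name are illustrative, not the book's code. Note that a plain input_len - 1 slice would misbehave for input_len = 0, so the sketch clamps the start index.

```python
def generated_logprob(shifted_log_probs, input_len):
    """Sum the log probabilities of every token after the first input_len tokens.

    shifted_log_probs[k] is assumed to be log p(token[k+1] | token[0..k]).
    """
    start = max(input_len - 1, 0)  # clamp so input_len=0 does not slice from -1
    return sum(shifted_log_probs[start:])


tokens = ["A", "B", "C", "x", "y"]    # three prompt tokens, two generated tokens
input_len = 3
shifted = [-1.0, -2.0, -0.5, -0.25]   # one entry per token after the first

with_first_generated = generated_logprob(shifted, input_len)  # scores "x" and "y"
without_first_generated = sum(shifted[input_len:])            # scores only "y"
print(with_first_generated, without_first_generated)
```
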
Printed Page 161
2nd paragraph

The sentence: "Then, as usual, we set up a the TrainingArguments for training:"
The word "a" is superfluous and should be removed.

Velimir Graorkoski  Jan 15, 2024 
Printed Page 173, Extracting Answers from Text
2nd paragraph, second sentence under the section Extracting Answers from Text

The sentence: "For example, if a we have a question like ..."
Should be (without the first "a"): "For example, if we have a question like ..."

Velimir Graorkoski   Jan 15, 2024 
Printed Page 220 (German Edition)
6th paragraph

The command `!curl -X GET 'localhost:9200/?pretty'` does not work with the current version of Elasticsearch, but it works with version 7.9.3.

Gunold Brunbauer  Jul 31, 2023 
Printed Page 254
3rd paragraph

In 3rd paragraph,

"Next let's take a look at the top 10 most frequent~" should be "Next let's take a look at the top 8 most frequent~".


Haesun Park  Aug 29, 2022 
Printed Page 260
1st line

In header at first line,

"Implementing Naive Bayesline" should be "Implementing Naive Bayes" or "Implementing Naive Bayes baseline".


Haesun Park  Aug 29, 2022 
Printed Page 265, Working with No Labeled Data
Table 9-1

The title of the table contains the acronym MLNI instead of MNLI.

Velimir Graorkoski   Jan 20, 2024 
Printed Page 269
13th line and 15th line from the bottom

In 13th line and 15th line from the bottom,

"Best threshold (micro)" should be "Best threshold (macro)".


Haesun Park  Aug 29, 2022 
Printed Page 271, Working with No Labeled Data
2nd paragraph from the end, 2nd bullet point

The hypothesis mentioned in the point states: "This is example is about".
One "is" is sufficient.

Velimir Graorkoski   Jan 21, 2024 
Printed Page 273
code block in the middle

In recent versions of nlpaug, aug.augment(text) always returns a list, even if n=1.
So, `text_aug += [aug.augment(text)]` should be `text_aug += aug.augment(text)`

Haesun Park  Oct 14, 2022 
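
The effect of the two forms reported above can be seen with plain lists. This is a minimal sketch, assuming an augmenter that always returns a list as described. The toy_augment function is a hypothetical stand-in and does not call nlpaug.

```python
def toy_augment(text):
    """Hypothetical augmenter that always returns a list, as described above."""
    return [text.upper()]


texts = ["first issue", "second issue"]
nested, flat = [], []
for text in texts:
    nested += [toy_augment(text)]  # wraps the returned list: list of lists
    flat += toy_augment(text)      # extends with the returned items: flat list

print(nested)
print(flat)
```
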
Printed Page 274
4th line from the bottom

In 4th line from the bottom,

What does "around 5 point" mean? Five percentage points, or something else?
Please explain.


Haesun Park  Aug 29, 2022 
Printed Page 283
6th line from the bottom

In 6th line from the bottom,

"k/n elements to compare" should be "n/k elements to compare".


Haesun Park  Aug 30, 2022 
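
A small arithmetic check of why n/k is the natural quantity here. This is a minimal sketch, assuming n vectors split evenly into k clusters and a search that first compares the query against the k centroids and then against the members of one cluster. The cost model k + n/k is an assumption of this sketch, not a quotation from the book.

```python
import math


def comparisons(n, k):
    """k centroid comparisons plus n/k comparisons inside the chosen cluster."""
    return k + n / k


n = 2 ** 20                                # about one million vectors
candidates = [1, 32, 1024, 32768, n]       # a few cluster counts to compare
costs = {k: comparisons(n, k) for k in candidates}
best_k = min(costs, key=costs.get)

print(costs)
print(best_k, math.isqrt(n))
```

With an even split, each cluster holds n/k vectors, which is at least one whenever k is at most n; k/n would be a fraction below one.
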
Printed Page 291
5th, 6th line from the top

In 5th, 6th line from the top,

original_input_ids and masked_input_ids are not defined.
I think they are inputs["input_ids"][0] and outputs["input_ids"][0].


Haesun Park  Aug 29, 2022 
Printed Page 308
6th line from the bottom

In 6th line from the bottom, "This reduces the memory footprint of our dataset from 180 GB to 50 GB".
So, is 50 GB of memory needed when streaming the codeparrot dataset?


Haesun Park  Sep 08, 2022 
Printed Page 337
1st line from the top

In 1st line from the top,

I suggest changing "reduce pattern" to "all-reduce pattern" for a clearer explanation.


Haesun Park  Sep 08, 2022 
Printed Page 341
1st paragraph

In the 1st paragraph: "it didn't quite get it right in the second attempt".
But the second attempt is the only correct answer.
Please clarify the intended meaning.


Haesun Park  Sep 08, 2022 
Printed Page 360
Last code block

In the last code block,
the `if` and `else` blocks contain the same code.

Haesun Park  Sep 13, 2022 
Printed Page 361
1st paragraph

In 1st paragraph,

"For the first chapter, the model predict..."
should be
"For the first question, the model predict..."


Haesun Park  Sep 13, 2022