Errata

Natural Language Processing with Transformers, Revised Edition

Errata for Natural Language Processing with Transformers, Revised Edition


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
ePub Page Notebook 02_classification.ipynb
Notebook 02_classification.ipynb

Notebook 02_classification.ipynb causes the following problem:

emotions_encoded.set_format("torch", columns=["input_ids", "attention_mask", "label"])

The exception report below says that PyTorch is not installed, which is not the case (ValueError: PyTorch needs to be installed to be able to return PyTorch tensors).

Unexpected exception formatting exception. Falling back to standard exception

Traceback (most recent call last):
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_26043/2662095365.py", line 1, in <module>
emotions_encoded.set_format("torch",
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/dataset_dict.py", line 583, in set_format
writer_batch_size: Optional[int] = 1000,
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2515, in set_format
keep_in_memory=keep_in_memory,
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/formatting/__init__.py", line 128, in get_formatter
)
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_26043/304637329.py", line 1, in <module>
emotions_encoded.set_format("torch",
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/dataset_dict.py", line 583, in set_format
writer_batch_size: Optional[int] = 1000,
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2515, in set_format
keep_in_memory=keep_in_memory,
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/datasets/formatting/__init__.py", line 128, in get_formatter
)
ValueError: PyTorch needs to be installed to be able to return PyTorch tensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2105, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1396, in structured_traceback
return FormattedTB.structured_traceback(
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1287, in structured_traceback
return VerboseTB.structured_traceback(
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1140, in structured_traceback
formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1055, in format_exception_as_a_whole
frames.append(self.format_record(record))
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 955, in format_record
frame_info.lines, Colors, self.has_colors, lvals
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/IPython/core/ultratb.py", line 778, in lines
return self._sd.lines
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/core.py", line 734, in lines
pieces = self.included_pieces
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/core.py", line 681, in included_pieces
pos = scope_pieces.index(self.executing_piece)
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/stack_data/core.py", line 660, in executing_piece
return only(
File "/home/silvija/nlp_notebooks/virtualENV/lib/python3.8/site-packages/executing/executing.py", line 190, in only
raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

I found a workaround by first running

emotions_encoded.set_format("torch")

and then

emotions_encoded.set_format("torch", columns=["input_ids", "attention_mask", "label"])

However, I have a different problem that I don't know how to solve.

After logging in to the Hugging Face Hub as instructed in the same notebook (and the book), and creating a write-access token, I executed the cell:

from transformers import Trainer, TrainingArguments

batch_size = 64
logging_steps = len(emotions_encoded["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-emotion"
training_args = TrainingArguments(output_dir=model_name,
                                  num_train_epochs=2,
                                  learning_rate=2e-5,
                                  per_device_train_batch_size=batch_size,
                                  per_device_eval_batch_size=batch_size,
                                  weight_decay=0.01,
                                  evaluation_strategy="epoch",
                                  disable_tqdm=False,
                                  logging_steps=logging_steps,
                                  push_to_hub=True,
                                  log_level="error")

When I run the next cell

from transformers import Trainer

trainer = Trainer(model=model, args=training_args,
                  compute_metrics=compute_metrics,
                  train_dataset=emotions_encoded["train"],
                  eval_dataset=emotions_encoded["validation"],
                  tokenizer=tokenizer)
trainer.train();

I get the ‘Repository not found’ error:

`git clone` has been updated in upstream Git to have comparable
speeds to `git lfs clone`.
Cloning into '.'...
remote: Repository not found
fatal: repository 'huggingface/kokaljfilipovic/distilbert-base-uncased-finetuned-emotion/' not found
Error(s) during clone:
`git clone` failed: exit status 128

Silvija Kokalj-Filipovic  Jul 24, 2023 
Printed Page Various, see the detail section
Various, see the detail section

- p. 252: github-issues-transfomers.jsonl --> github-issues-transformers.jsonl
- p. 271: "This is example is about {}" --> "This example is about {}"
- p. 259: […] there are sophisticated methods than can leverage […] --> "that can leverage"
- p. 300: Unlike the code in the others in this book…
- p. 302: writing tool or for a building a game. --> "for building a game"
- p. 342: in Chapter 5 --> Chapter 6

Anonymous  Sep 17, 2023 
Printed Page Chap. 10, 323-336
all

It appears that the GPT-2 architecture is used. What remains a bit unclear is the input/output of the model being trained from scratch. My understanding is that GPT-2 is trained to predict (just) the next token. Here, it seems to be different, as suggested by Figure 10-2.

What exactly is the input-output behavior of the network during training? For each training example, does the model mask the last few tokens of the input and set the output to be identical to the input? If so, where is this masking defined?

Please clarify.

Is the scheme different?
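For reference, here is a minimal sketch of the standard causal language-modeling setup. This is an assumption about what the chapter relies on, not code from the book. The labels are the input token IDs themselves, and position t is trained to predict token t+1. Nothing is masked out of the input. Instead, a causal (lower-triangular) attention mask stops each position from attending to later tokens. In Hugging Face `transformers`, the `GPT2LMHeadModel` documentation states that labels are shifted inside the model, so `labels` can simply be a copy of `input_ids`. The helpers `shift_for_causal_lm` and `causal_mask` are hypothetical names that only illustrate the bookkeeping.

```python
from typing import List, Tuple

def shift_for_causal_lm(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Split one sequence into (inputs, targets) for next-token prediction."""
    inputs = token_ids[:-1]   # the model reads every token except the last
    targets = token_ids[1:]   # position t is trained to predict token t + 1
    return inputs, targets

def causal_mask(seq_len: int) -> List[List[int]]:
    """mask[i][j] == 1 if position i may attend to position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

tokens = [101, 7, 42, 9, 3]  # made-up token IDs
inputs, targets = shift_for_causal_lm(tokens)
for t, nxt in enumerate(targets):
    print(f"position {t}: sees {inputs[:t + 1]} -> predicts {nxt}")
for row in causal_mask(len(inputs)):
    print(row)
```

Every position produces a prediction in the same forward pass, so one sequence of length L yields L-1 next-token training targets at once.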

Anonymous  Oct 19, 2023 
Printed Page Pg 61
After 3rd paragraph

I'm not sure whether this is an erratum, but I have checked several sources and I believe the self-attention formula is slightly different:
W_ji should be W_ij.
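Whether the subscript order is an error depends on how the weight matrix is indexed. In the common convention, row i holds the softmax-normalized scores of query i against every key j, and the output for token i is the sum over j of w_ij times v_j. Writing w_ji is the same computation with the matrix transposed. The plain-Python sketch below uses that row convention. It is a simplification that assumes a single head and no learned projections, so queries, keys, and values are the input vectors themselves.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product self-attention with Q = K = V = xs."""
    n, d = len(xs), len(xs[0])
    # weights[i][j]: attention of token i (query) on token j (key); rows sum to 1.
    weights = [
        softmax([sum(a * b for a, b in zip(xs[i], xs[j])) / math.sqrt(d) for j in range(n)])
        for i in range(n)
    ]
    # Output for token i: sum over j of weights[i][j] * v_j (the w_ij ordering).
    outputs = [[sum(weights[i][j] * xs[j][c] for j in range(n)) for c in range(d)]
               for i in range(n)]
    return weights, outputs

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
weights, outputs = self_attention(xs)
for i, row in enumerate(weights):
    print(f"token {i}: weights {[round(w, 3) for w in row]}")
```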

Cayetano Romero  Jan 12, 2024 
Printed Page Chapter 6
part of the code of this chapter

# hide
from transformers import pipeline, set_seed

It generates the warnings and error message below:

WARNING:tensorflow:From C:\Users\ziad.elmously.MLCORP\Anaconda3\envs\LargeLanguageModels\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
File ~\Anaconda3\envs\LargeLanguageModels\lib\site-packages\transformers\file_utils.py:2704, in _LazyModule._get_module(self, module_name)
2703 try:
-> 2704 return importlib.import_module("." + module_name, self.__name__)
2705 except Exception as e:

File ~\Anaconda3\envs\LargeLanguageModels\lib\importlib\__init__.py:127, in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1030, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1007, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:986, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:680, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:850, in exec_module(self, module)

File <frozen importlib._bootstrap>:228, in _call_with_frames_removed(f, *args, **kwds)

File ~\Anaconda3\envs\LargeLanguageModels\lib\site-packages\transformers\pipelines\__init__.py:54
53 from .question_answering import QuestionAnsweringArgumentHandler, QuestionAnsweringPipeline
---> 54 from .table_question_answering import TableQuestionAnsweringArgumentHandler, TableQuestionAnsweringPipeline
55 from .text2text_generation import SummarizationPipeline, Text2TextGenerationPipeline, TranslationPipeline

File ~\Anaconda3\envs\LargeLanguageModels\lib\site-packages\transformers\pipelines\table_question_answering.py:24
22 import tensorflow as tf
---> 24 import tensorflow_probability as tfp
26 from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\__init__.py:20
17 # Contributors to the `python/` dir should not alter this file; instead update
18 # `python/__init__.py` as necessary.
---> 20 from tensorflow_probability import substrates
21 # from tensorflow_probability.google import staging # DisableOnExport
22 # from tensorflow_probability.google import tfp_google # DisableOnExport

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\substrates\__init__.py:17
15 """TensorFlow Probability alternative substrates."""
---> 17 from tensorflow_probability.python.internal import all_util
18 from tensorflow_probability.python.internal import lazy_loader # pylint: disable=g-direct-tensorflow-import

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\__init__.py:138
137 for pkg_name in _maybe_nonlazy_load:
--> 138 dir(globals()[pkg_name]) # Forces loading the package from its lazy loader.
141 all_util.remove_undocumented(__name__, _lazy_load + _maybe_nonlazy_load)

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\lazy_loader.py:57, in LazyLoader.__dir__(self)
56 def __dir__(self):
---> 57 module = self._load()
58 return dir(module)

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\lazy_loader.py:40, in LazyLoader._load(self)
39 # Import the target module and insert it into the parent's namespace
---> 40 module = importlib.import_module(self.__name__)
41 if self._parent_module_globals is not None:

File ~\Anaconda3\envs\LargeLanguageModels\lib\importlib\__init__.py:127, in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\experimental\__init__.py:31
30 from tensorflow_probability.python.experimental import auto_batching
---> 31 from tensorflow_probability.python.experimental import bijectors
32 from tensorflow_probability.python.experimental import distribute

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\experimental\bijectors\__init__.py:17
15 """TensorFlow Probability experimental bijectors package."""
---> 17 from tensorflow_probability.python.bijectors.ldj_ratio import forward_log_det_jacobian_ratio
18 from tensorflow_probability.python.bijectors.ldj_ratio import inverse_log_det_jacobian_ratio

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\bijectors\__init__.py:19
17 # pylint: disable=unused-import,wildcard-import,line-too-long,g-importing-member
---> 19 from tensorflow_probability.python.bijectors.absolute_value import AbsoluteValue
20 from tensorflow_probability.python.bijectors.ascending import Ascending

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\bijectors\absolute_value.py:19
17 import tensorflow.compat.v2 as tf
---> 19 from tensorflow_probability.python.bijectors import bijector
20 from tensorflow_probability.python.internal import assert_util

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\bijectors\bijector.py:26
25 from tensorflow_probability.python.internal import auto_composite_tensor
---> 26 from tensorflow_probability.python.internal import batch_shape_lib
27 from tensorflow_probability.python.internal import cache_util

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\batch_shape_lib.py:23
21 import tensorflow.compat.v2 as tf
---> 23 from tensorflow_probability.python.internal import prefer_static as ps
24 from tensorflow_probability.python.internal import tensor_util

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\prefer_static.py:26
25 from tensorflow_probability.python.internal import tensorshape_util
---> 26 from tensorflow_probability.python.internal.backend import numpy as nptf
28 from tensorflow.python.framework import ops # pylint: disable=g-direct-tensorflow-import

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\backend\numpy\__init__.py:18
17 from tensorflow_probability.python.internal.backend.numpy import __internal__
---> 18 from tensorflow_probability.python.internal.backend.numpy import bitwise
19 from tensorflow_probability.python.internal.backend.numpy import compat

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\backend\numpy\bitwise.py:19
17 import numpy as np
---> 19 from tensorflow_probability.python.internal.backend.numpy import _utils as utils
21 __all__ = [
22 'bitwise_xor',
23 'left_shift',
24 ]

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\backend\numpy\_utils.py:22
21 import numpy as np
---> 22 from tensorflow_probability.python.internal.backend.numpy import nest
24 try:

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow_probability\python\internal\backend\numpy\nest.py:30
29 import types
---> 30 import tree as dm_tree
32 # pylint: disable=g-import-not-at-top

ModuleNotFoundError: No module named 'tree'

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
Cell In[1], line 2
1 # hide
----> 2 from transformers import pipeline, set_seed

File <frozen importlib._bootstrap>:1055, in _handle_fromlist(module, fromlist, import_, recursive)

File ~\Anaconda3\envs\LargeLanguageModels\lib\site-packages\transformers\file_utils.py:2694, in _LazyModule.__getattr__(self, name)
2692 value = self._get_module(name)
2693 elif name in self._class_to_module.keys():
-> 2694 module = self._get_module(self._class_to_module[name])
2695 value = getattr(module, name)
2696 else:

File ~\Anaconda3\envs\LargeLanguageModels\lib\site-packages\transformers\file_utils.py:2706, in _LazyModule._get_module(self, module_name)
2704 return importlib.import_module("." + module_name, self.__name__)
2705 except Exception as e:
-> 2706 raise RuntimeError(
2707 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its traceback):\n{e}"
2708 ) from e

RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
No module named 'tree'

Please note that I have already installed the required packages using the command:

!pip install -r requirements.txt

I am running the script in Jupyter Notebook.
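The traceback ends at `import tree as dm_tree` inside TensorFlow Probability, which `transformers.pipelines` imports here because TensorFlow is present in the environment. The `tree` module is provided by the `dm-tree` distribution on PyPI. One plausible fix, not verified against this environment, is `pip install dm-tree`. Another is to remove TensorFlow from an environment that only needs PyTorch. The stdlib-only sketch below reports which of the modules named in the traceback can be located without importing them. The module-to-distribution mapping in `PROVIDERS` is an assumption for illustration.

```python
import importlib.util
from typing import Dict

# Import name -> PyPI distribution assumed to provide it (from the traceback above).
PROVIDERS: Dict[str, str] = {
    "tree": "dm-tree",
    "tensorflow_probability": "tensorflow-probability",
    "tensorflow": "tensorflow",
}

def check_modules(providers: Dict[str, str]) -> Dict[str, bool]:
    """Return {module: found?}; find_spec locates top-level modules without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in providers}

for name, found in check_modules(PROVIDERS).items():
    hint = "found" if found else f"missing (try: pip install {PROVIDERS[name]})"
    print(f"{name}: {hint}")
```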

Ziad Elmously  Jan 25, 2024 
Printed Page page 74
last chapter

Book: Natural Language Processing mit Transformatoren (German edition)

training_args = TrainingArguments(output_dir=model_name,
                                  num_train_epochs=2,
                                  learning_rate=2e-5,
                                  per_device_train_batch_size=batch_size,
                                  per_device_eval_batch_size=batch_size,
                                  weight_decay=0.01,
                                  evaluation_strategy="epoch",
                                  disable_tqdm=False,
                                  logging_steps=logging_steps,
                                  push_to_hub=True,
                                  log_level="error")

I have run `pip install transformers[torch]` and `pip install accelerate -U`, but I still get an error.
What should I do?
There are many errors in this book's notebooks, and working with it is very laborious. J. van der List

Juergen van der List  Jan 27, 2024 
Printed Page 6, Transfer Learning in NLP
1st paragraph, third sentence

The sentence: Architecturally, this involves, splitting the model into of a body and a head, ...
Should be without "of": Architecturally, this involves, splitting the model into a body and a head, ...

Velimir Graorkoski  Jan 08, 2024 
Printed Page 23
2nd code cell

I am getting this error (which sounds a lot like one that was thought to have been fixed):

FileNotFoundError: Couldn't find file at <link to dropbox file>

WORK-AROUND:

emotions = load_dataset("dair-ai/emotion")

Eric Cooper  Aug 15, 2023 
Printed Page 61
print(tokenize(emotions['train'][:2]))

TypeError Traceback (most recent call last)
<ipython-input-92-195b9a5c839d> in <cell line: 1>()
----> 1 print(tokenize(emotions['train'][:2]))

<ipython-input-89-f9c17701f610> in tokenize(batch)
1 def tokenize(batch):
----> 2 return tokenizer(batch(['text'],padding = True,truncation=True))

TypeError: 'dict' object is not callable
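The traceback points at the call `batch(['text'], padding = True, truncation=True)`. Slicing a dataset (`emotions['train'][:2]`) returns a dict that maps column names to lists, so the batch has to be indexed rather than called, as in `tokenizer(batch["text"], padding=True, truncation=True)`. The sketch below reproduces the difference with a plain dict and a hypothetical stand-in, `fake_tokenizer`, so it runs without `transformers` installed. The stand-in only splits on whitespace and is not the real tokenizer.

```python
from typing import Dict, List

def fake_tokenizer(texts: List[str], padding: bool = False,
                   truncation: bool = False) -> Dict[str, List[List[str]]]:
    """Hypothetical stand-in for a tokenizer: whitespace split; truncation is ignored."""
    tokens = [t.split() for t in texts]
    if padding:
        # Pad every sequence to the longest one with a placeholder token.
        longest = max(len(t) for t in tokens)
        tokens = [t + ["[PAD]"] * (longest - len(t)) for t in tokens]
    return {"tokens": tokens}

# A sliced dataset batch looks like this: column name -> list of values.
batch = {"text": ["i feel great", "i am sad today"], "label": [1, 0]}

def tokenize(batch):
    # Index the dict with ["text"], then pass the list of strings to the tokenizer.
    return fake_tokenizer(batch["text"], padding=True, truncation=True)

print(tokenize(batch))

try:
    batch(["text"], padding=True, truncation=True)  # the call from the report
except TypeError as err:
    print("TypeError:", err)
```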

Juergen van der List  Dec 29, 2023 
Printed Page 68 (German edition)
last paragraph

If I run this Python block:
from umap import UMAP
from sklearn.preprocessing import MinMaxScaler

# Scale features to [0,1] range
X_scaled = MinMaxScaler().fit_transform(X_train)
# Initialize and fit UMAP
mapper = UMAP(n_components=2, metric="cosine").fit(X_scaled)
# Create a DataFrame of 2D embeddings
df_emb = pd.DataFrame(mapper.embedding_, columns=["X", "Y"])
df_emb["label"] = y_train

I get the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17500\2438143818.py in <module>
----> 1 from umap import UMAP
2 from sklearn.preprocessing import MinMaxScaler
3
4 # Scale features to [0,1] range
5 X_scaled = MinMaxScaler().fit_transform(X_train)

ImportError: cannot import name 'UMAP' from 'umap' (C:\Users\vdl\Anaconda3\lib\site-packages\umap\__init__.py)

But I have umap installed with:
!pip install umap
and the output was:
Requirement already satisfied: umap in c:\users\vdl\anaconda3\lib\site-packages (0.1.1)

What should I do? Best regards, JvdL
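The `pip` output shows that the installed distribution is `umap` 0.1.1. That is an unrelated PyPI package that also provides a top-level `umap` module, but without a `UMAP` class. The UMAP dimensionality-reduction library is published as `umap-learn`, so the usual remedy is `pip uninstall umap` followed by `pip install umap-learn`. The stdlib-only sketch below tells the two apart by reading installed distribution metadata. The helper names are illustrative.

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def umap_diagnosis() -> str:
    learn = installed_version("umap-learn")
    other = installed_version("umap")
    if other:
        # The unrelated 'umap' distribution has a 'umap' module but no UMAP class.
        return f"unrelated 'umap' {other} installed: pip uninstall umap, then pip install umap-learn"
    if learn:
        return f"umap-learn {learn} installed: `from umap import UMAP` should work"
    return "UMAP not installed: pip install umap-learn"

print(umap_diagnosis())
```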

Juergen van der List  Jan 23, 2024 
Printed Page 69
1st paragraph, 3rd line

Hello! I suspect that 'hidden_dim]' should be 'embed_dim]'; please verify. (And thank you: a great read so far, clearly explained.)

Jo De Baer  Mar 06, 2024 
Printed Page 75 (German edition)
1st paragraph

trainer = Trainer(model=model, args=training_args,
                  compute_metrics=compute_metrics,
                  train_dataset=emotions_encoded["train"],
                  eval_dataset=emotions_encoded["validation"],
                  tokenizer=tokenizer)
trainer.train();

This raised the following error:
CalledProcessError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
668
--> 669 run_subprocess(
670 # 'git lfs clone' is deprecated (will display a warning in the terminal)

8 frames
CalledProcessError: Command '['git', 'lfs', 'clone', 'xxx', '.']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py in clone_from(self, repo_url, token)
707
708 except subprocess.CalledProcessError as exc:
--> 709 raise EnvironmentError(exc.stderr)
710
711 def git_config_username_and_email(self, git_user: Optional[str] = None, git_email: Optional[str] = None):

OSError: WARNING: 'git lfs clone' is deprecated and will not be updated
with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '.'...
remote: Repository not found
fatal: repository 'xxx' not found
Error(s) during clone:
git clone failed: exit status 128

Gunold Brunbauer  Jul 07, 2023 
Printed Page 132
Body of function sequence_logprob

In the definition of the function, the sum of log probabilities seq_log_prob is obtained by computing the following:
torch.sum(log_probs[:, input_len:])

I think the correct way to obtain the sum of log probabilities is to compute:
torch.sum(log_probs[:, input_len-1:])

This is because we also want to include the log probability of the first generated token, the one that directly follows the input sequence. This log probability is located at index input_len-1, since the input sequence tokens are located at indices 0 to input_len-1.
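The off-by-one can be checked with index bookkeeping alone. Assume `log_probs` is computed from `logits[:, :-1]` scored against `labels[:, 1:]`, the usual shift for causal language models. Then entry i of `log_probs` is the log probability of token i+1. The pure-Python sketch below lists which label tokens each slice actually sums over. The token IDs are made up and `covered_tokens` is a hypothetical helper that handles a single sequence.

```python
from typing import List

def covered_tokens(labels: List[int], start: int) -> List[int]:
    """Tokens whose log-probs are summed by log_probs[start:] (one sequence).

    Assumes log_probs[i] scores labels[i + 1], i.e. log_probs was built from
    logits[:-1] against labels[1:].
    """
    n_scored = len(labels) - 1  # log_probs has one entry fewer than labels
    return [labels[i + 1] for i in range(start, n_scored)]

prompt = [11, 12, 13]           # input_len = 3
generated = [21, 22, 23]
labels = prompt + generated
input_len = len(prompt)

print(covered_tokens(labels, input_len))      # [22, 23]: first generated token is missing
print(covered_tokens(labels, input_len - 1))  # [21, 22, 23]: every generated token
```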

Clément Luneau  Jul 19, 2023 
Printed Page 161
2nd paragraph

The sentence: "Then, as usual, we set up a the TrainingArguments for training:"
Word "a" is superfluous and should be removed.

Velimir Graorkoski  Jan 15, 2024 
Printed Page 173, Extracting Answers from Text
2nd paragraph, second sentence under the section Extracting Answers from Text

The sentence: "For example, if a we have a question like ..."
Should be (without the "a"): "For example, if we have a question like ..."

Velimir Graorkoski   Jan 15, 2024 
Printed Page 220 (German Edition)
6th paragraph

The code `!curl -X GET 'localhost:9200/?pretty'` does not work with the current version of Elasticsearch, but it works with version 7.9.3.

Gunold Brunbauer  Jul 31, 2023 
Printed Page 254
3rd paragraph

In the 3rd paragraph,

"Next let's take a look at the top 10 most frequent~" should be "Next let's take a look at the top 8 most frequent~".

Thanks

Haesun Park  Aug 29, 2022 
Printed Page 260
1st line

In the header on the first line,

"Implementing Naive Bayesline" should be "Implementing Naive Bayes" or "Implementing Naive Bayes baseline".

Thanks.

Haesun Park  Aug 29, 2022 
Printed Page 265, Working with No Labeled Data
Table 9-1

The table title contains the acronym MLNI instead of MNLI.

Velimir Graorkoski   Jan 20, 2024 
Printed Page 269
13th line and 15th line from the bottom

In the 13th and 15th lines from the bottom,

"Best threshold (micro)" should be "Best threshold (macro)".

Thanks.
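The distinction matters because the two averages can favor different thresholds. Micro-averaged F1 pools true positives, false positives, and false negatives across all labels. Macro-averaged F1 averages the per-label F1 scores. The stdlib sketch below runs a threshold sweep for multilabel predictions. The scores and labels are made up, and `f1_scores` and `best_threshold` are hypothetical helpers, not the book's code.

```python
from typing import Dict, List, Sequence

def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_scores(y_true: List[List[int]], y_prob: List[List[float]], thr: float) -> Dict[str, float]:
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # per label: tp, fp, fn
    for truth, probs in zip(y_true, y_prob):
        for k in range(n_labels):
            pred = probs[k] >= thr
            if pred and truth[k]:
                counts[k][0] += 1
            elif pred:
                counts[k][1] += 1
            elif truth[k]:
                counts[k][2] += 1
    # Micro: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    return {"micro": micro, "macro": macro}

def best_threshold(y_true, y_prob, thresholds: Sequence[float], average: str) -> float:
    return max(thresholds, key=lambda t: f1_scores(y_true, y_prob, t)[average])

y_true = [[1, 0], [1, 0], [0, 1]]
y_prob = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.4]]
grid = [0.35, 0.5, 0.85]
for avg in ("micro", "macro"):
    print(f"Best threshold ({avg}):", best_threshold(y_true, y_prob, grid, avg))
```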

Haesun Park  Aug 29, 2022 
Printed Page 271, Working with No Labeled Data
2nd paragraph from the end, 2nd bullet point

The hypothesis mentioned in the point states: "This is example is about".
One "is" is sufficient.

Velimir Graorkoski   Jan 21, 2024 
Printed Page 273
code block in the middle

In recent versions of nlpaug, aug.augment(text) always returns a list, even if n=1.
So `text_aug += [aug.augment(text)]` should be `text_aug += aug.augment(text)`.
Thanks
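The effect can be reproduced without nlpaug. If `augment` returns a list, wrapping the result in another list nests it. The sketch below uses a hypothetical stand-in, `fake_augment`, that only mimics the list return type described in the report. It also includes a small `as_list` helper, an illustrative assumption, that works whether the augmenter returns a string or a list.

```python
from typing import List, Union

def fake_augment(text: str) -> List[str]:
    """Hypothetical stand-in for aug.augment(text) in recent nlpaug: always a list."""
    return [text.replace("good", "great")]

def as_list(result: Union[str, List[str]]) -> List[str]:
    # Normalize a str (older behavior) or a list (newer behavior) to a list.
    return [result] if isinstance(result, str) else list(result)

text = "this is a good example"

nested: list = []
nested += [fake_augment(text)]          # book's line: a list inside the list
flat: List[str] = []
flat += fake_augment(text)              # suggested fix: extends with the strings
robust: List[str] = []
robust += as_list(fake_augment(text))   # version-independent variant

print(nested)  # [['this is a great example']]
print(flat)    # ['this is a great example']
```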

Haesun Park  Oct 14, 2022 
Printed Page 274
4th line from the bottom

In the 4th line from the bottom,

What does "around 5 point" mean? 5 percentage points, or something else?
Please explain.

Thanks.

Haesun Park  Aug 29, 2022 
Printed Page 283
6th line from the bottom

In the 6th line from the bottom,

"k/n elements to compare" should be "n/k elements to compare".

Thanks.
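Assume the passage describes partition-based (k-means) nearest-neighbor search, as used by FAISS. Then the correction follows from simple arithmetic. With n vectors split into k clusters of roughly n/k vectors each, a query is compared with the k centroids and then with the roughly n/k vectors in the closest cluster. That gives about k + n/k comparisons instead of n, and the total is smallest near k = sqrt(n). The sketch below also assumes balanced clusters and a single probed cluster.

```python
import math

def comparisons(n: int, k: int) -> float:
    """Approximate comparisons for one query: k centroids + n/k cluster members."""
    return k + n / k

n = 1_000_000
for k in (100, 1_000, 10_000):
    print(f"k={k:>6}: ~{comparisons(n, k):,.0f} comparisons (vs {n:,} brute force)")

best_k = min(range(1, 10_001), key=lambda k: comparisons(n, k))
print("best k:", best_k, "sqrt(n):", math.isqrt(n))
```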

Haesun Park  Aug 30, 2022 
Printed Page 291
5th, 6th line from the top

In the 5th and 6th lines from the top,

original_input_ids and masked_input_ids are not defined.
I think they are inputs["input_ids"][0] and outputs["input_ids"][0].

Thanks.

Haesun Park  Aug 29, 2022 
Printed Page 308
6th line from the bottom

In the 6th line from the bottom: "This reduces the memory footprint of our dataset from 180 GB to 50 GB".
Does this mean that 50 GB of memory is needed when streaming the codeparrot dataset?

Thanks.

Haesun Park  Sep 08, 2022 
Printed Page 337
1st line from the top

In the 1st line from the top,

I suggest changing "reduce pattern" to "all-reduce pattern" for clarity.

Thanks
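The suggestion rests on the following distinction. In a reduce, the combined result (for example, summed gradients) ends up on a single process. In an all-reduce, every process receives it, which is what data-parallel training needs so that all model replicas apply the same update. Below is a toy single-process simulation in plain Python. Real frameworks use collective-communication backends such as NCCL, and this sketch only shows the semantics.

```python
from typing import List

Grad = List[float]

def reduce_to_root(worker_grads: List[Grad], root: int = 0) -> List[Grad]:
    """Reduce: only the root worker ends up holding the elementwise sum."""
    total = [sum(vals) for vals in zip(*worker_grads)]
    return [total if rank == root else grads for rank, grads in enumerate(worker_grads)]

def all_reduce_mean(worker_grads: List[Grad]) -> List[Grad]:
    """All-reduce (mean): every worker ends up holding the same averaged gradient."""
    n = len(worker_grads)
    mean = [sum(vals) / n for vals in zip(*worker_grads)]
    return [list(mean) for _ in worker_grads]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one gradient vector per worker
print(reduce_to_root(grads))   # only rank 0 holds [9.0, 12.0]
print(all_reduce_mean(grads))  # every rank holds [3.0, 4.0]
```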

Haesun Park  Sep 08, 2022 
Printed Page 341
1st paragraph

In the 1st paragraph: "it didn't quite get it right in the second attempt".
But the second attempt is the only correct answer.
Could you explain what is meant?

Thanks.

Haesun Park  Sep 08, 2022 
Printed Page 360
Last code block

In the last code block,
the `if` and `else` blocks contain the same code.
Thanks.

Haesun Park  Sep 13, 2022 
Printed Page 361
1st paragraph

In the 1st paragraph,

"For the first chapter, the model predict..."
should be
"For the first question, the model predict..."

Thanks

Haesun Park  Sep 13, 2022