Errata

Feature Engineering for Machine Learning



The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake · Minor technical mistake · Language or formatting error · Typo · Question · Note · Update

Version Location Description Submitted by Date submitted
ePub Page 1
Figure 2-17

Just wanted to share with you something that I think is a mistake in Figure 2-17.

As is said in the paragraph following the figure, this figure plots information in data space. So, if the letter x refers to variables (which, judging from the formula placed in the figure, it does), the axes of this figure should not be labeled with x but with the names of the observations (as done in Figure 2-2); it is the dots that should be labeled X1, X2, ... Xm.

Ramiro Heraclio  Apr 15, 2018 
Printed Page 3
3rd paragraph

Under the Features section, the authors state, “A feature is a numeric representation of raw data.” Nevertheless, features can also be categorical.
Should it instead read “A feature is a representation of raw data”?

Manoj Jayabalan  Mar 11, 2019 
PDF Page 3
Last Paragraph

If this was meant as an overview from 30,000 feet, it just about scrapes the surface.

It misses the fact that the activity has a business purpose: there is a problem to solve and a target against which to show effectiveness, either against some existing technique or approach, or against what the AI/ML solution is supposed to find out. It is fun to run the numbers, but the point is to bring more efficient or better insight into a problem from the data.

Geoffrey Leigh  Jan 09, 2020 
PDF Page 61
3rd paragraph

In Chapter 4 an unusual variant of tf*idf is described - with tf being raw word counts. This is not a default in most definitions; adjusting tf by document length is much more common. I'm not aware of implementations where the presented tf*idf variant is a default; in scikit-learn, NLTK, gensim, ElasticSearch or "Recommended tf–idf weighting schemes" in Wikipedia tf is normalized by the document length by default.

This is not a problem on its own; the problem is that while analysis in Chapter 4 is valid for the presented tf*idf variant, it can't be generalized for more commonly used tf*idf variants. This makes statements about tf*idf misleading, e.g. "Tf-Idf = Column Scaling" is not true in most cases. More importantly, the analysis of tf*idf effect ("Deep Dive: What Is Happening?", p72-75) doesn't hold if one of the "default" tf*idf variants is used, as they're not just a column scaling.

On p65 there is a note: "Note that doing tf-idf then l2 normalization is the same as doing l2 normalization alone." While technically correct, it is then illustrated with a code sample in which the result of text.TfidfTransformer(norm=None).transform is column-wise L2-normalized. This is somewhat misleading, because the default text.TfidfTransformer(norm='l2') performs a very different kind of normalization: rows are normalized, not columns. From the description it sounds like changing the default value of the `norm` argument is just a way to get both normalized and unnormalized tf*idf results without calling TfidfTransformer twice, but that's not the case: the default text.TfidfTransformer() performs a completely different computation, which changes all the following analysis, since the scaling is not column-wise.

I think Chapter 4 should have used IDF scaling as an example, not TF*IDF, or make it clear that analysis doesn't hold for default/common TF*IDF implementations people will be using in practice.
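The difference the submitter describes can be checked directly. A minimal sketch, assuming scikit-learn is available (the two toy documents are illustrative, not from the book):

```python
# With norm='l2' (the default), TfidfTransformer normalizes each document
# ROW to unit length; with norm=None no normalization happens, so the
# result is only a column-wise idf scaling of the raw counts.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["the cat sat", "the dog sat on the mat"]  # illustrative documents
counts = CountVectorizer().fit_transform(docs)

default_tfidf = TfidfTransformer(norm='l2').fit_transform(counts).toarray()
raw_tfidf = TfidfTransformer(norm=None).fit_transform(counts).toarray()

# Row norms of the default result are exactly 1 (row-wise normalization)...
print(np.linalg.norm(default_tfidf, axis=1))  # -> [1. 1.]
# ...while the norm=None result is not row-normalized at all.
print(np.linalg.norm(raw_tfidf, axis=1))
```

This makes the submitter's point concrete: the two settings are not the same computation up to a missing normalization step; they normalize along different axes.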

Mikhail Korobov  Aug 28, 2018 
Printed Page 104
Equation 6-7, near the bottom

Objective function for principal components, matrix-vector formulation, should be

w^T X^T X w

(where "^T" means "transpose") instead of

w^T w

The same goes for Equation 6-8 on the next page.

Anonymous  Dec 11, 2020 
Printed Page 104
Equation 6-7

The objective function for principal components, matrix-vector formulation, should be maximizing z'z over w instead of w'w (where ' indicates a transpose, and z = Xw). The same goes for Equation 6-8 on the next page (105).

You cannot maximize w'w over w, since w is constrained to be a unit vector.

Though it is most probably a typo, it changes the entire mathematical meaning of the objective function.
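The submitter's argument can be verified numerically. A minimal sketch with NumPy (the data matrix X is arbitrary illustrative data, not from the book):

```python
# For a unit vector w, w.T @ w is always 1, so it cannot be the objective;
# the quantity that varies with w, and is maximized by the first principal
# direction, is w.T @ X.T @ X @ w.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)          # center the data

def objective(w):
    w = w / np.linalg.norm(w)   # constrain w to the unit sphere
    return w @ X.T @ X @ w

# The top eigenvector of X.T @ X attains the maximum of the objective.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
w_best = eigvecs[:, -1]                      # eigenvector of largest eigenvalue

w_random = rng.normal(size=3)
print(objective(w_best) >= objective(w_random))   # -> True
print(np.isclose(objective(w_best), eigvals[-1])) # -> True
```

This is just the Rayleigh-quotient characterization of the leading eigenvector, which is what Equation 6-7 is presumably meant to express.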

Anonymous  Feb 14, 2021 
PDF Page 136
last paragraph

Figure 8-3 illustrates examples ......
The center image contains vertical stripes; therefore, the horizontal gradient is zero.

To be changed:
therefore, the vertical gradient is zero.

* horizontal stripes >>> horizontal gradient is zero
* vertical stripes >>> vertical gradient is zero
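The correction is easy to confirm with np.gradient. A minimal sketch (the striped array is an illustrative stand-in for the figure's image):

```python
# An image with vertical stripes varies only along the horizontal axis,
# so its VERTICAL gradient is zero, matching the submitter's correction.
import numpy as np

vertical_stripes = np.tile([0, 1, 0, 1], (4, 1)).astype(float)  # columns alternate
dy, dx = np.gradient(vertical_stripes)  # dy: along rows (vertical), dx: along columns (horizontal)

print(np.all(dy == 0))   # -> True  (vertical gradient is zero)
print(np.any(dx != 0))   # -> True  (horizontal gradient is not)
```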

Woohyun Kim  Aug 29, 2018 
PDF Page 163
1st row, 2nd row

The code does not work: feature_array as defined on page 162 requires 3 arguments, but only 2 are provided here.

Anonymous  Mar 06, 2021