Errata for Natural Language Annotation for Machine Learning

This list contains errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version | Location | Description | Submitted by | Date submitted | Date corrected
Printed, PDF, ePub, Mobi, Other Digital Version
Page 132
second paragraph

current: "As for the Fleiss's Kappa score...well, that's definitely one that would have to be revisited. But why, and what should be changed to get better agreement? Looking at the chart
again, we can see that there's a lot of variation in all of the columns?while Annotators
A, B, and D all have nearly the same number of positive reviews, those are the only values
that come anywhere close to grouping. Annotator E is so far off from everyone else that
if this were a real task, we?d be wondering if she got hold of a completely different set of
guidelines!"


corrected: "As for the Fleiss's Kappa score... well, that's definitely one that would havee to be revisited. Looking at the chart again, we can see that there's a lot of variation in all of the columns--in face, none of the reviews seem to have any real sense of agreement. There are a lot of factors that can influence a crowdsourced project, such as not being able to train annotators, not being able to ensure that the annotators meet certain guidelines (such as native language), or just sheer online mischief; see Chapter 12 for discussion of some of the most common platforms and pitfalls. Of course, our example was made up, but agreement scores like these definitely mean that you need to review your annotation guidelines, and probably your dataset as well."

Amber Stubbs  Mar 01, 2013  Jul 12, 2013
Printed, PDF, ePub, Mobi, , Other Digital Version
Page 130
paragraphs 1, 2, 3, and 4; equation 2

paragraph 1:

current: "...which represents each annotator's agreement with other annotators, compared to all possible agreement values. As before, a is the number of annotations per annotator, k is the number of categories, c is the current category, and i is the current annotator."


corrected: "... which represents the annotator's agreement per review compared to all possible agreement values. As before, a is the number of annotations per review, k is the number of categories, c is the current category, and i is the current review."


--------------------------------------------

paragraph 2:

current: " ...moderating the output by the number of total annotations by each annotator. So for Annotator A, we would calculate this:"

corrected: "...moderating the output by the number of total annotations for each review. So for Review 1, we would calculate this:"

-------------------------------------

first line of equation after paragraph 2:

current: "P(Annotator 1)=..."

corrected: "P(Review 1) = ..."

-------------------------------------------

paragraph 3:

current: "Annotator row"

corrected: "Review row"

---------------------------------------

paragraph 4:

Change Pi to have P with a subscript i;

last sentence: change "number of annotators" to "number of reviews"
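
For reference, the corrected definitions correspond to the standard Fleiss's per-item agreement; writing n(i,c) for the count of annotators who assigned category c to review i (notation ours, not the book's), the value for each review is:

P(Review i) = (n(i,1)^2 + n(i,2)^2 + ... + n(i,k)^2 - a) / (a * (a - 1))

where a = 250 is the number of annotations per review and k = 3 is the number of categories.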

Amber Stubbs  Mar 01, 2013  Jul 12, 2013
Printed, PDF, ePub, Mobi, Other Digital Version
Page 129
paragraph 1, 2, and 3

paragraph 1:

current: "In this table, the categories are across the top, the annotators are down the side, and the content of each cell represents how many times that annotator assigned that tag to a
document."

corrected: "In this table, the categories are across the top, the movie review documents are down the side, and the content of each cell represents how many times that an annotator assigned each category to each review."

--------------------------------------

paragraph 2, last sentence:

current: "In the following equation, A is the number of annotators, a is the number of
annotations per annotator, k is the number of categories, and i represents the current
annotator:"

corrected: "In the following equation, A is the number of reviews, a is the number of annotations per review, k is the number of categories, and i represents the current table cell."

---------------------------------------------------

paragraph 3, first and second sentences:

current:"...will equal the sum of the values in its column divided by the number of annotators times the number of annotations each annotator created. The second part..."

corrected: "... will equal the sum of the values in its column divided by the number of reviews (5) times the number of annotations per review (250). The second version..."

Amber Stubbs  Mar 01, 2013  Jul 12, 2013
Printed, PDF, ePub, Mobi
Page 129-130
All tables for Fleiss's kappa

All instances of the table should replace the leftmost column's "Annotator A, B, C, D, E" with "Review 1, 2, 3, 4, 5"

Amber Stubbs  Mar 01, 2013  Jul 12, 2013
Printed, PDF, ePub, Mobi, Other Digital Version
Page 128
last paragraph

The description of the use of Fleiss's kappa was unclear, and the problem is incorrectly described. The paragraph should read:

The table used to represent annotator values for Fleiss's Kappa, rather than having one axis per annotator, has one axis for the possible values an annotator could assign, and the other axis for each of the items being annotated. The contents of the cells show how many annotators assigned each category to each item. Note that Fleiss's Kappa does not assume that all items are annotated by the *same* annotators, but it does assume that all items are annotated the same number of times. Just so we can look at some bigger numbers, let's assume we redid our movie review annotation task as a crowdsourcing project (see Chapter 12). Instead of having 250 movie reviews annotated by 2 people, let's say that we had 5 movie reviews annotated as positive, neutral, or negative by 250 people each. These annotations would be represented like this:
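
For readers who want to check the numbers themselves, here is a minimal sketch of the calculation this table feeds into, assuming the setup described above (5 reviews in the rows, 3 categories in the columns, 250 annotations per review). The counts are hypothetical placeholders, not the book's data:

def fleiss_kappa(counts):
    # counts[i][c] = number of annotators who assigned category c to item i
    n_items = len(counts)
    a = sum(counts[0])  # annotations per item, assumed equal for every item
    # observed agreement for each item, then averaged
    p_items = [(sum(n * n for n in row) - a) / (a * (a - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # chance agreement from the overall category proportions
    n_cats = len(counts[0])
    p_cats = [sum(row[c] for row in counts) / (n_items * a) for c in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)

# hypothetical counts: [positive, neutral, negative] for each of the 5 reviews
reviews = [[120, 70, 60], [90, 80, 80], [100, 90, 60], [110, 60, 80], [30, 100, 120]]
print(round(fleiss_kappa(reviews), 3))

With counts as scattered as these, the result comes out close to zero, which is the kind of low agreement the corrected page 132 paragraph discusses.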

Amber Stubbs  Mar 01, 2013  Jul 12, 2013
Printed
Page 62
Third paragraph

bigger sliding windows, we an define... -> bigger sliding windows, we can define...

(a missed "c")

Michele Filannino  Feb 25, 2013  Jul 12, 2013
Printed, PDF
Page 30
Table before Figure 1-14

"Predicted Labeling" and "Gold Labeling" should be put together and centered. As they are now, each word seem to be referred to one column/row in particular. This lead to a bit of initial confusion.

Note from the Author or Editor:
yes, each of these labels should span the rows/columns that they are labeling

Michele Filannino  Feb 25, 2013  Jul 12, 2013
Printed
Page 29
Figure 1-12

In the printed, black and white version, the colors of "training" and "dev-test" are very similar.

I suggest using geometric patterns.

Note from the Author or Editor:
True; the PDF version has the image in color, where the differences are much clearer. Thank you for pointing this out.

Michele Filannino  Feb 25, 2013  Jul 12, 2013
Printed, PDF
Page 133, 134
equations at bottom of 133 and top of 134, first paragraph of 134

(I'm sure this error exists in all formats, but I can't verify page numbers on all of them. This error starts in Chapter 6: Calculating k in other contexts.)

Two numbers got transposed in the equations for a(untagged), which proliferated through the rest of the calculations. The fixed numbers are below, with corrections surrounded by _s:

a(untagged) = _.296_, b(untagged) = .148 --> untagged = _.044_
pr(e) = .057+.279+_.044_ = _.38_
k = (.556 - _.38_)/(1-_.38_)
= _.176/.62_
= _.284_

The next paragraph should also say ".284" instead of ".312"
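
For reference, the corrected value for untagged is just the product of the two proportions, i.e., the chance that both annotators pick "untagged":

a(untagged) * b(untagged) = .296 * .148 ≈ .044

and the remaining corrections follow from plugging pr(a) = .556 and pr(e) = .38 into k = (pr(a) - pr(e)) / (1 - pr(e)).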

Amber Stubbs  Jan 31, 2013  Feb 22, 2013
Printed, PDF
Page 131
top of page, 4th equation; first paragraph of next section

(I'm sure this error exists in all formats, but I can't verify page numbers on all of them. This error starts in Chapter 6: Fleiss's Kappa and proliferates into Chapter 6: Interpreting Kappa Coefficients.)

The last line of the Fleiss Kappa equation should be = .010, not .004

At the beginning of the next section, "Interpreting Kappa Coefficients", the .004 should also be replaced with .010. However, this doesn't change the discussion of the numbers, as the agreement is still quite low.

Amber Stubbs  Jan 31, 2013  Feb 22, 2013
Printed, PDF
Page 128,131,132
first paragraph, first and second pulled-out equations

(I'm sure this error exists in all formats, but I can't verify page numbers on all of them. This error starts in Chapter 6: Cohen's Kappa and proliferates into Chapter 6: Interpreting Kappa Coefficients.)

The calculation for how many times A and B used the "positive" label is incorrect. It should be .34% instead of .425. This error propagates down the page. This corrected text follows (mistakes are surrounded by _s):

"...A used the label ?positive? 85 times (54 + 28 + 3), or _.34_ percent of the time. Annotator B also used the ?positive? label 85 times (54 + 31), which is also _.34_. Multiplied together, _.34_ * _.34_ = _.116_, so A and B have a _.116_ chance of both randomly choosing ?positive? as a label."

(the next paragraph is fine)

"Adding those three chance agreement scores together gives us
Pr(e) = _.116_ + .077 + .146 = _.339_

Putting Pr(a) and Pr(e) into the equation gives us:
κ = (.576 - _.339_) / (1 - _.339_) = _.237/.661_ = _.359_"

The incorrect number is repeated on pages 131 and 132, where each mention (3 in total) of .29 in the text should be replaced with .359. As this is still a relatively low agreement score, the discussion does not change.
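
For reference, the .34 comes from 85 of each annotator's 250 annotations being "positive", i.e., 85 / 250 = .34, and .34 * .34 ≈ .116; the final value then follows from κ = (Pr(a) - Pr(e)) / (1 - Pr(e)) = (.576 - .339) / .661 ≈ .359.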

Amber Stubbs  Jan 31, 2013  Feb 22, 2013
Printed
Page 30
Figure 1-14

Figure 1-14, 2nd row (Recall) should read:
R = tp/(tp + fn)

O'Reilly Media  Oct 16, 2012  Oct 16, 2012