Errata

Errata for Advanced Analytics with Spark

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
PDF	Page ch 9 overall	The Google Finance API is no longer available, so none of the examples can be tested because the data cannot be downloaded.	Anonymous	Jan 16, 2018
PDF	Page 8 Last paragraph, second last line	"preserve the book as an useful resource" should instead be "preserve the book as a useful resource"	Anonymous	May 12, 2017
Printed	Page 17 last paragraph	Typo: "linakge" instead of "linkage". val rawblocks = sc.textFile("linakge") should be val rawblocks = sc.textFile("linkage")	Anonymous	Feb 20, 2019
PDF	Page 29 second code bundle	parsed. groupBy("is_match"). count(). orderBy($"count".desc) show() should be parsed. groupBy("is_match"). count(). orderBy($"count".desc). show(); the '.' before show() is omitted.	Anonymous	Oct 09, 2017
PDF	Page 31 2nd last paragraph	"option of treating the any DataFrame that we create" the word "the" should be removed.	Anonymous	May 15, 2017
PDF	Page 34 2nd paragraph	in "that would be valid inside of a WHERE clase in Spark SQL" "clase" should be "clause"	Anonymous	May 15, 2017
PDF	Page 35 first tip	"isn't comprised of" should instead be "doesn't comprise"	Anonymous	May 13, 2017
PDF	Page 82 right above the code sample	"Here, MulticlassMetrics is perfectly usage with a DataFrame containing predictions." needs to be fixed ("usage" is presumably meant to be "usable").	Anonymous	May 16, 2017
PDF	Page 86 first equation	One of the p variables is missing a subscript i	Anonymous	May 17, 2017
PDF	Page 137 Overall CH 7	I read both the 1st and 2nd editions of this book. In the 2nd edition, the sample data changed (from the 2014 version to the 2016 version), but some results in the text still come from the older version. For example, on page 144, "there are more than 13,000 different major topics in our data set"; 13,000 is the older result from the 1st edition. Of course, 14548 is also more than 13,000, but there are more such mistakes. 3rd paragraph of page 148: "which only has 13,000 vertices in the graph" -> "which only has 14,500 vertices in the graph". Last paragraph of page 153: 13034 and 12065 -> 14548 and 13721. I may not have found all such mistakes.	Anonymous	Oct 08, 2017
PDF	Page 143 3rd code piece	def majorTopics(record: String)={...} is called as majorTopics(elem), but elem is not of type String, so I think elem.toString() or rawXml should be passed here.	Anonymous	Oct 10, 2017
PDF	Page 151 1st paragraph, 2nd line	contains only 4 vertices -> contains only 5 vertices	Anonymous	Oct 11, 2017
PDF	Page 151 code pieces in this page	val topicComponentDF = topicGraph.vertices.innerJoin( connectedComponentGraph.vertices) { (topicId, name, componentId) => (name, componentId.toLong) }.toDF("topic", "cid") This code does not work properly: in the resulting DataFrame, the cid values are located in the topic column.	Anonymous	Oct 11, 2017
PDF	Page 151 2nd paragraph	"Let’s take a look at the topic names for the largest connected component that wasn’t a part of the giant component:" But the example shows the third largest connected component, not the second largest.	Anonymous	Oct 11, 2017
PDF	Page 151 code pieces in this page	The code val topicComponentDF = topicGraph.vertices.innerJoin( connectedComponentGraph.vertices) { (topicId, name, componentId) => (name, componentId.toLong) }.toDF("topic", "cid") generates a DataFrame with the schema topic: long, cid: struct (_1: string, _2: long). The following query must therefore be changed to topicComponentDF.where("cid._2 = -2062883918534425492").show(false), which gives the result: +--------------------+-----------------------------------------------+ \|topic \|cid \| +--------------------+-----------------------------------------------+ \|-1870678893086276394\|[Serotyping,-2062883918534425492] \| \|-1233269114313988317\|[Campylobacter coli,-2062883918534425492] \| \|-2062883918534425492\|[Campylobacter jejuni,-2062883918534425492] \| \|4763791955467795057 \|[Campylobacter Infections,-2062883918534425492]\| +--------------------+-----------------------------------------------+	Anonymous	Nov 05, 2017
PDF	Page 158 last subgraph (to the next page)	"The mean degree for the original graph was about 43, and the mean degree for the filtered graph has fallen a bit, to about 28. More interesting, however, is the precipitous drop in the size of the largest degree vertex, which has fallen from 3,753 in the original graph to 1,603 in the filtered graph. If we look at the association between concept and degree in the filtered graph, we see this:" These numbers are from the 1st edition's example. For the 2nd edition's example they should be: 43 -> 31, 28 -> 20, 3753 -> 2596, 1603 -> 863.	Anonymous	Oct 11, 2017
PDF	Page 170 bottom code block & link in paragraph above	link to taxi trips data set should be https://storage.googleapis.com/aas-data-sets/trip_data_1.csv.zip	Marie Beaugureau	Oct 16, 2017
PDF	Page 197 3rd paragraph, last line	"a estimate" should instead be "an estimate"	Anonymous	May 19, 2017
PDF	Page 200 middle	"We can represent out dates as LocalDate objects" "out" should be "our"	Anonymous	May 21, 2017