Learning Spark

Errata for Learning Spark



The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.




Version | Location | Description | Submitted By | Date Submitted | Date Corrected
PDF
Page vii
2nd paragraph

Duplicated wording. READ: "You’ll learn how to learn how to download..." SHOULD READ: "You’ll learn how to download..."

Note from the Author or Editor:
Fixed in fe6dc3e1dd493a83464e115a4309ab806cf240cb

Ricardo Almeida  Oct 08, 2014 
PDF
Page 9, 10
P9 - Downloading Spark: P1; P10 - 1st paragraph after the notes

Page 9 has the following text: This will download a compressed tar file, or “tarball,” called spark-1.1.0-bin-hadoop1.tgz. On page 10, a different tarball is referenced:

cd ~
tar -xf spark-1.1.0-bin-hadoop2.tgz
cd spark-1.1.0-bin-hadoop2

Kevin D'Elia  Oct 20, 2014 
PDF
Page 20
Example 2-13. maven build example

Is there any reason why Akka repo is needed to build the mini project? It seems like all dependencies of spark-core_2.10:1.1.0 are already available in the maven central.

Note from the Author or Editor:
I have removed the Akka repo from our mini example.

Uladzimir Makaranka  Sep 21, 2014 
PDF
Page 20
Example 2-13

<artifactId>learning-spark-mini-example/artifactId> is missing closing <

Note from the Author or Editor:
Fixed in b99b12fcd3022c298d30f3fcd2b1d88fd7eab57c

Kevin D'Elia  Oct 19, 2014 
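For reference, a sketch of how the corrected line should read once the missing "<" is restored (the surrounding pom.xml elements are omitted here and would match Example 2-13 in the book):

```xml
<!-- Corrected: the book printed the closing tag as "/artifactId>" -->
<artifactId>learning-spark-mini-example</artifactId>
```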
PDF
Page 21
Example 2-15

Maven's command-line executable is called 'mvn'. Please replace "maven clean && maven compile && maven package" with "mvn clean && mvn compile && mvn package". Also, the Maven build script (Example 2-13) doesn't compile the Scala code (i.e. c.o.l.mini.scala), so please replace

$SPARK_HOME/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  ./target/learning-spark-mini-example-0.0.1.jar ./README.md ./wordcounts

with

$SPARK_HOME/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.java.WordCount \
  ./target/learning-spark-mini-example-0.0.1.jar ./README.md ./wordcounts

Note from the Author or Editor:
Fixed

Uladzimir Makaranka  Sep 21, 2014 
PDF
Page 33
Figure 3-3

READ: RDD2.subtract(RDD2) {panda,tea} SHOULD READ: RDD1.subtract(RDD2) {panda, tea}

Note from the Author or Editor:
I've fixed this is the latest build for author provided images, but if O'Reilly has already started remaking the images you may need to redo the Figure 3-3 bottom right as the submitter has suggested.

Tatsuo Kawasaki  Aug 18, 2014 
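As a sanity check on the corrected figure, here is a minimal plain-Python sketch of subtract's semantics (no Spark required). The erratum only quotes the result {panda, tea}, so the input contents below are assumptions chosen to match it:

```python
# RDD.subtract keeps the elements of the first collection that do not
# appear anywhere in the second; these inputs are assumed for illustration.
rdd1 = ["coffee", "coffee", "panda", "monkey", "tea"]
rdd2 = ["coffee", "monkey", "kitty"]

def subtract(a, b):
    exclude = set(b)
    return [x for x in a if x not in exclude]

print(subtract(rdd1, rdd2))  # ['panda', 'tea']
```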
PDF
Page 37
Figure 3-2. Map and filter on an RDD

FilteredRDD {1,4,9,16} should be FilteredRDD {2,3,4}

Note from the Author or Editor:
Thanks for pointing this out, I've gone ahead and fixed this and it should be in our next build.

Tang Yong  Aug 18, 2014 
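The corrected figure can be checked with a short plain-Python sketch (no Spark required). The erratum gives the mapped set {1,4,9,16} and the corrected filtered set {2,3,4}; the input {1,2,3,4} and the predicate x != 1 are assumptions consistent with those values:

```python
nums = [1, 2, 3, 4]                      # inputRDD (assumed)
mapped = [x * x for x in nums]           # map(x => x * x) -> MappedRDD
filtered = [x for x in nums if x != 1]   # filter(x => x != 1) -> FilteredRDD

print(mapped)    # [1, 4, 9, 16]
print(filtered)  # [2, 3, 4]
```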
PDF
Page 37
Example 3-24. Scala squaring the values in an RDD

println(result.collect()) should be result.collect().foreach{x=>println(x)}

Note from the Author or Editor:
Fixed in the latest build.

Tang Yong  Aug 18, 2014 
ePub
Page 58
Example 4-12

Example 4-12 (Python) is not equivalent to the others: the sum of numbers must be divided by the count to yield the average. Having the Python example implement the same behavior as the Scala and Java examples will aid the reader. My version of the example is:

nums = sc.parallelize([(1, 2), (1, 4), (3, 6), (4, 6), (4, 8), (4, 13)])
sumCount = nums.combineByKey((lambda x: (x, 1)),
                             (lambda x, y: (x[0] + y, x[1] + 1)),
                             (lambda x, y: (x[0] + y[0], x[1] + y[1])))
print sumCount.map(lambda (k, v): (k, v[0] / float(v[1]))).collect()

Note from the Author or Editor:
No action; already fixed in the book.

Andres Moreno  Dec 02, 2014 
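The logic of the corrected example can be verified without a Spark installation. This plain-Python sketch mirrors what combineByKey computes on the erratum's data, with comments marking which of the three callbacks each step corresponds to:

```python
# Per-key average via (sum, count) pairs, mirroring the combineByKey fix.
data = [(1, 2), (1, 4), (3, 6), (4, 6), (4, 8), (4, 13)]

sum_count = {}
for k, v in data:
    if k not in sum_count:
        sum_count[k] = (v, 1)              # createCombiner: (x, 1)
    else:
        s, c = sum_count[k]
        sum_count[k] = (s + v, c + 1)      # mergeValue: (x[0] + y, x[1] + 1)
# (mergeCombiners would add partial (sum, count) pairs across partitions.)

averages = {k: s / float(c) for k, (s, c) in sum_count.items()}
print(averages)  # {1: 3.0, 3: 6.0, 4: 9.0}
```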
PDF
Page 66
JSON

It is mentioned that liftweb-json is used for JSON-parsing, however Play JSON is used for parsing and then liftweb-json for JSON output. This is a bit confusing.

Note from the Author or Editor:
I've fixed this in the latest push.

Anonymous  Aug 05, 2014 
PDF
Page 70

feildnames

Note from the Author or Editor:
Fixed in the latest build (typo)

Anonymous  Aug 17, 2014 
PDF
Page 70
first paragraph

"In Python if an value isn’t present None is used and if the value is present the regular value" should be "In Python if a value isn’t present None is used and if the value is present the regular value"

Note from the Author or Editor:
Fixed in Atlas

Mark Needham  Nov 30, 2014 
PDF
Page 85
Example 5-13/5-14

Minor issue; there should be an import java.io.StringReader statement in your CSV loading examples in Scala (and presumably Java)

Note from the Author or Editor:
Fixed; see commit a9f9f34a3b8513885325f47c1101e657cb5faa89

Timothy Elser  Oct 07, 2014 
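For context on why the import matters: the examples parse each CSV record from an in-memory string, which in Scala/Java requires java.io.StringReader. The Python analogue below uses io.StringIO for the same purpose; the sample record is made up for illustration:

```python
import csv
import io

# Wrap an in-memory string so csv.reader can consume it, much as the
# Scala example wraps a string in a java.io.StringReader.
record = "holden,panda"
reader = csv.reader(io.StringIO(record))
print(next(reader))  # ['holden', 'panda']
```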
ePub
Page 112
3rd

Text reads: “Spark has many levels of persistence to chose from based on what our goals are. ” should read: “Spark has many levels of persistence to choose from based on what our goals are. ”

Note from the Author or Editor:
Fixed in the latest version in Atlas

Bruce Sanderson  Nov 15, 2014 
ePub
Page 126
1st paragraph

The text "...to how we used fold and map compute the entire RDD average” should read: “ ...to how we used fold and map to compute the entire RDD average”

Note from the Author or Editor:
Fixed in Atlas

Bruce Sanderson  Nov 18, 2014