July 2018
Intermediate to advanced
334 pages
8h 20m
English
We will now run count on our new purgedRDD:
scala> purgedRDD.count
res12: Long = 684
In the preceding code, we invoked the count method on purgedRDD, and Spark returned 684. In other words, 16 rows contained ? characters and were dropped. Many datasets like this one need a preprocessing step or two. We can now proceed with the next steps of the analysis, reasonably confident that Spark will not report an error later on, especially when we build a new two-column DataFrame containing a consolidated feature vector.
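This excerpt does not show how purgedRDD was built. As a minimal sketch, assume it was produced by dropping every line that contains a ? character. The same predicate can then be tried on a plain Scala collection without a Spark cluster. On an actual RDD you would pass it to filter and then call count, both of which are standard RDD methods. The object name PurgeSketch, its helper methods and the sample rows are hypothetical and serve only as illustration.

```scala
// A minimal sketch, assuming purgedRDD came from dropping every line that
// contains a '?' character. Plain Scala collections stand in for an RDD;
// PurgeSketch, purge, removedCount and the sample rows are hypothetical.
object PurgeSketch {
  // Keep only lines without the '?' placeholder. This is the same predicate
  // you could pass to RDD.filter.
  def purge(lines: Seq[String]): Seq[String] =
    lines.filter(line => !line.contains("?"))

  // Number of lines removed by the purge step (the "16 rows" in the text).
  def removedCount(lines: Seq[String]): Int =
    lines.size - purge(lines).size

  def main(args: Array[String]): Unit = {
    // Hypothetical CSV-style rows. The first one plays the role of a header.
    val rawLines = Seq(
      "col1,col2,col3",
      "1,2,3",
      "4,?,6",
      "7,8,9",
      "?,1,2"
    )
    val purged = purge(rawLines)
    // Prints: raw: 5, purged: 3, removed: 2
    println(s"raw: ${rawLines.size}, purged: ${purged.size}, removed: ${removedCount(rawLines)}")
  }
}
```

The header line contains no ?, so it survives this filter. A question-mark filter alone therefore leaves the header in place, and the header has to be dealt with separately.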
In the next section, we are going to get rid of the header.