O'Reilly logo

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6. Manipulating Your RDD

The last few chapters have been the necessary groundwork to get Spark working. Now that you know how to load and save data in different ways, it's time for the big payoff, that is, manipulating data. The API you'll use to manipulate your RDD is similar among languages but not identical. Unlike the previous chapters, each language is covered in its own section here; should you wish, you could only read the one pertaining to the language you are interested in using.

Manipulating your RDD in Scala and Java

RDD is the primary low-level abstraction in Spark. As we discussed in the last chapter, the main programming APIs will be Datasets/DataFrames. However, underneath it all, the data will be represented as RDDs. So, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required