Processing data with Apache Spark
In this section, we will implement the examples from Chapter 3, Processing – MapReduce and Beyond, using the Scala API. We will consider both the batch and real-time processing scenarios. We will show you how Spark Streaming can be used to compute statistics on the live Twitter stream.
Building and running the examples
Scala source code for the examples can be found at https://github.com/learninghadoop2/book-examples/tree/master/ch5. We will be using sbt to build, manage, and execute code.
The build.sbt file controls the codebase metadata and software dependencies; these include the version of the Scala interpreter that Spark links to, a link to the Akka package repository used to resolve implicit dependencies, as ...
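To make the role of build.sbt more concrete, the following is a minimal sketch of what such a file might look like. It is not the file from the book's repository: the project name, the version numbers, and the exact list of dependencies are assumptions chosen for illustration, and the resolver URL is the one historically used for Akka releases. Adjust all of them to match the Spark distribution you are building against.

```scala
// build.sbt: illustrative sketch only; every name and version below is an assumption

// Codebase metadata (hypothetical values)
name := "spark-examples"

version := "0.1.0"

// Must match the Scala version that your Spark build links to (assumed here)
scalaVersion := "2.10.4"

// Spark modules for batch processing, streaming, and the Twitter receiver.
// The %% operator appends the Scala binary version to the artifact name.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.spark" %% "spark-streaming" % "1.0.0",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
)

// Extra repository used to resolve Akka artifacts that Spark depends on
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

With a file along these lines in the project root, running sbt compile or sbt package from the same directory would resolve the dependencies and build the examples. Keeping the three Spark modules on the same version avoids binary incompatibilities between them.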