Chapter 3. Writing Spark applications

This chapter covers

  • Generating a new Spark project in Eclipse
  • Loading a sample dataset from the GitHub archive
  • Writing an application that analyzes GitHub logs
  • Working with DataFrames in Spark
  • Submitting your application to be executed

In this chapter, you’ll learn to write Spark applications. Most Spark programmers use an integrated development environment (IDE), such as IntelliJ IDEA or Eclipse. Resources that describe how to use IntelliJ IDEA with Spark are readily available online, whereas Eclipse resources are still hard to come by. That is why this chapter shows you how to use Eclipse for writing Spark programs. Nevertheless, if you choose to stick to IntelliJ IDEA, you’ll still be able ...
