Chapter 3. Writing Spark applications
This chapter covers
- Generating a new Spark project in Eclipse
- Loading a sample dataset from the GitHub archive
- Writing an application that analyzes GitHub logs
- Working with DataFrames in Spark
- Submitting your application to be executed
In this chapter, you’ll learn to write Spark applications. Most Spark programmers use an integrated development environment (IDE), such as IntelliJ IDEA or Eclipse. Resources that describe how to use IntelliJ IDEA with Spark are readily available online, whereas Eclipse resources are still hard to come by. That is why, in this chapter, you’ll learn how to use Eclipse for writing Spark programs. Nevertheless, if you choose to stick with IntelliJ IDEA, you’ll still be able ...