Chapter 2. Spark fundamentals

This chapter covers

  • Exploring the spark-in-action VM
  • Managing multiple Spark versions
  • Getting to know Spark’s command-line interface (spark-shell)
  • Playing with simple examples in spark-shell
  • Exploring RDD actions, transformations, and double functions

It’s finally time to get down to business. In this chapter, you’ll start using the VM we prepared for you and write your first Spark programs. All you need is a laptop or a desktop machine with a usable internet connection and the prerequisites described in chapter 1.

To avoid overwhelming you this early in the book with various options for running Spark, for now you’ll be using the so-called Spark standalone local cluster. Standalone means Spark is using ...
