Chapter 1. Introduction to Apache Spark

This chapter covers

What Spark brings to the table
Spark components
Spark program flow
Spark ecosystem
Downloading and starting the spark-in-action virtual machine

Apache Spark is usually defined as a fast, general-purpose, distributed computing platform. Yes, it sounds a bit like marketing speak at first glance, but we could hardly come up with a more appropriate label to put on the Spark box.

Apache Spark really did bring a revolution to the big data space. Spark makes efficient use of memory and can execute equivalent jobs 10 to 100 times faster than Hadoop’s MapReduce. On top of that, Spark’s creators managed to abstract away the fact that you’re dealing with a cluster of machines, and instead ...

Spark in Action by Petar Zecevic, Marko Bonaci
