© Zubair Nabi 2016

Zubair Nabi, Pro Spark Streaming, 10.1007/978-1-4842-1479-4_2

2. Introduction to Spark

Zubair Nabi

(1) Lahore, Pakistan

There are two major products that came out of Berkeley: LSD and UNIX. We don’t believe this to be a coincidence.

—Jeremy S. Anderson

Like LSD and Unix, Spark was originally conceived in 2009 at Berkeley, in the same Algorithms, Machines, and People (AMP) Lab that gave the world RAID, Mesos, RISC, and several Hadoop enhancements. It was initially pitched to the academic community as a distributed framework built from the ground up atop the Mesos cross-platform scheduler (then called Nexus). Spark can be thought of as an in-memory variant of Hadoop, with the following key differences:

  • Directed acyclic graph
