Spark for Data Science by Bikramaditya Singhal, Srinivas Duvvuri

Chapter 2. The Spark Programming Model

Large-scale data processing on thousands of nodes with built-in fault tolerance has become widespread thanks to the availability of open source frameworks, with Hadoop being a popular choice. These frameworks have been quite successful at specific tasks, such as Extract, Transform, and Load (ETL) and storage applications that deal with web-scale data. However, developers were left juggling a myriad of tools alongside the well-established Hadoop ecosystem. There was a need for a single, general-purpose development platform that caters to batch, streaming, interactive, and iterative requirements. This was the motivation behind Spark.

The previous chapter outlined the big data analytics challenges ...