O'Reilly logo

Spark for Data Science by Bikramaditya Singhal, Srinivas Duvvuri

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

The Spark stack

Spark is a general-purpose cluster computing system that empowers other higher-level components to leverage its core engine. It is interoperable with Apache Hadoop, in the sense that it can read and write data from/to HDFS and can also integrate with other storage systems that are supported by the Hadoop API.

While it allows building other higher-level applications on top of it, it already has a few components built on top that are tightly integrated with its core engine to take advantage of the future enhancements at the core. These applications come bundled with Spark to cover the broader sets of requirements in the industry. Most of the real-world applications need to be integrated across projects to solve specific business problems ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required