The Spark stack

Spark is a general-purpose cluster computing system whose core engine serves as the foundation for higher-level components. It interoperates with Apache Hadoop: it can read data from and write data to HDFS, and it can integrate with other storage systems supported by the Hadoop API.

While Spark allows other higher-level applications to be built on top of it, it already ships with a few components that are tightly integrated with its core engine, so they benefit from future enhancements to the core. These components come bundled with Spark to cover a broad set of industry requirements. Most real-world applications need to be integrated across projects to solve specific business problems ...
