Python Data Science Essentials - Third Edition
by Alberto Boschetti, Luca Massaron, Pietro Marinelli, Matteo Malosetti
Introducing Apache Spark
Apache Spark is an evolution of Hadoop and has become very popular in the last few years. In contrast to Hadoop, with its Java-centric, batch-oriented design, Spark makes iterative algorithms fast and easy to write and run. Furthermore, it offers a rich suite of APIs for multiple programming languages and natively supports many different types of data processing (machine learning, streaming, graph analysis, SQL, and so on).
Apache Spark is a cluster-computing framework designed for fast, general-purpose processing of big data. Part of its speed comes from the fact that, after every job, the data is kept in memory rather than written to the filesystem (unless you ask for that), as would have happened with ...