Introduction to Spark Distributed Processing
By the end of this chapter, you will be able to:
- Write Python programs that execute parallel operations inside a Spark cluster
- Create and transform resilient distributed datasets
- Write standalone Python programs to interact with Spark
- Build DataFrames and perform SQL queries
In this chapter, you will interact with Spark using Python.
Apache Spark is a cluster computing framework that provides a collection of APIs for performing general-purpose computation across clustered systems.
We can illustrate how Spark can be used in the real world with the example of a content provider that delivers movies, documentaries, and TV shows across ...