This chapter covers the following recipes:
Recipe 4-1. Create an RDD
Recipe 4-2. Convert temperature data
Recipe 4-3. Perform basic data manipulation
Recipe 4-4. Run set operations
Recipe 4-5. Calculate summary statistics
Recipe 4-6. Start PySpark shell on Standalone cluster manager ...
4. Spark Architecture and the Resilient Distributed Dataset
Raju Kumar Mishra
Bangalore, Karnataka, India
You learned Python in the preceding chapter. Now it is time to learn PySpark and use the power of a distributed system to solve big data problems. Typically, we distribute large amounts of data across a cluster and then process that distributed data in parallel.
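The idea of splitting a large dataset into pieces and processing each piece independently can be sketched without a cluster. The following is a toy, single-machine analogy that uses only the Python standard library. It is not the PySpark API. The helper names `partition`, `distributed_map`, and `celsius_to_fahrenheit` are hypothetical, worker threads stand in for cluster executors, and the temperature readings are made-up sample values. Because of CPython's global interpreter lock, the threads illustrate the structure of partitioned processing rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions contiguous chunks of near-equal size.

    Hypothetical helper that loosely mirrors how a distributed dataset
    is divided into partitions; it is not a Spark function.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def distributed_map(
    data: List[T], func: Callable[[T], R], num_partitions: int = 4
) -> List[R]:
    """Apply func to every element, one partition per worker thread.

    Threads are a stand-in for cluster executors (an assumption made
    for illustration only). Results are gathered in the original order.
    """
    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Each worker processes one whole partition independently.
        partial_results = pool.map(lambda chunk: [func(x) for x in chunk], chunks)
        # Concatenate the per-partition results, preserving input order.
        return [y for part in partial_results for y in part]


def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


if __name__ == "__main__":
    readings_c = [0.0, 12.5, 25.0, 37.0, -40.0, 100.0]  # hypothetical readings
    print(distributed_map(readings_c, celsius_to_fahrenheit, num_partitions=3))
```

In PySpark, Spark itself handles these steps: a dataset is split into partitions across the cluster, and transformations such as `map` run on each partition, so user code never manages threads or chunk boundaries directly.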