Book description
- Understand the advanced features of PySpark2 and SparkSQL
- Optimize your code
- Program SparkSQL with Python
- Use Spark Streaming and Spark MLlib with Python
- Perform graph analysis with GraphFrames
Table of contents
- Cover
- Front Matter
- 1. The Era of Big Data, Hadoop, and Other Big Data Processing Frameworks
- 2. Installation
- 3. Introduction to Python and NumPy
- 4. Spark Architecture and the Resilient Distributed Dataset
- 5. The Power of Pairs: Paired RDDs
- 6. I/O in PySpark
- 7. Optimizing PySpark and PySpark Streaming
- 8. PySparkSQL
- 9. PySpark MLlib and Linear Regression
- Back Matter
Product information
- Title: PySpark Recipes: A Problem-Solution Approach with PySpark2
- Author(s):
- Release date: December 2017
- Publisher(s): Apress
- ISBN: 9781484231418