Book description
- Understand the advanced features of PySpark2 and SparkSQL
- Optimize your code
- Program SparkSQL with Python
- Use Spark Streaming and Spark MLlib with Python
- Perform graph analysis with GraphFrames
Table of contents
- Cover
- Front Matter
- 1. The Era of Big Data, Hadoop, and Other Big Data Processing Frameworks
- 2. Installation
- 3. Introduction to Python and NumPy
- 4. Spark Architecture and the Resilient Distributed Dataset
- 5. The Power of Pairs: Paired RDDs
- 6. I/O in PySpark
- 7. Optimizing PySpark and PySpark Streaming
- 8. PySparkSQL
- 9. PySpark MLlib and Linear Regression
- Back Matter
Product information
- Title: PySpark Recipes: A Problem-Solution Approach with PySpark2
- Release date: December 2017
- Publisher(s): Apress
- ISBN: 9781484231418