Book description
- Understand the advanced features of PySpark2 and SparkSQL
- Optimize your code
- Program SparkSQL with Python
- Use Spark Streaming and Spark MLlib with Python
- Perform graph analysis with GraphFrames
Table of contents
- Cover
- Front Matter
- 1. The Era of Big Data, Hadoop, and Other Big Data Processing Frameworks
- 2. Installation
- 3. Introduction to Python and NumPy
- 4. Spark Architecture and the Resilient Distributed Dataset
- 5. The Power of Pairs: Paired RDDs
- 6. I/O in PySpark
- 7. Optimizing PySpark and PySpark Streaming
- 8. PySparkSQL
- 9. PySpark MLlib and Linear Regression
- Back Matter
Product information
- Title: PySpark Recipes: A Problem-Solution Approach with PySpark2
- Author(s):
- Release date: December 2017
- Publisher(s): Apress
- ISBN: 9781484231418