Overview
"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications.
What this book will help me do
- Master the Spark 2.0 architecture and its Python integration with PySpark.
- Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis.
- Develop scalable machine learning models using PySpark's ML and MLlib libraries.
- Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models.
- Gain expertise in deploying PySpark applications locally and in the cloud for production-ready solutions.
Author(s)
Authors Drabas and Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python.
Who is it for?
This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.