Book description
Spark for Data Professionals introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, from basic programming to SparkSQL, SparkR, Spark Streaming, messaging, NoSQL, and Hadoop integration. Each chapter presents practical exercises for deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, Spark for Data Professionals explains crucial concepts step by step, assuming no extensive background as an open source developer. It provides a complete foundation for quickly progressing to more advanced data science and machine learning topics. This guide will help you:
- Understand Spark basics that will make you a better programmer and cluster “citizen”
- Master Spark programming techniques that maximize your productivity
- Choose the right approach for each problem
- Make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing
- Leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data
Table of contents
- Cover Page
- Title Page
- Copyright Page
- Contents at a Glance
- Table of Contents
- About This E-Book
- Preface
- Introduction
- I: Spark Foundations
- 1 Introducing Big Data, Hadoop, and Spark
- 2 Deploying Spark
- 3 Understanding the Spark Cluster Architecture
- 4 Learning Spark Programming Basics
- II: Beyond the Basics
- 5 Advanced Programming Using the Spark Core API
- 6 SQL and NoSQL Programming with Spark
- 7 Stream Processing and Messaging Using Spark
- 8 Introduction to Data Science and Machine Learning Using Spark
- Index
- Code Snippets
Product information
- Title: Data Analytics with Spark Using Python, First edition
- Author(s): Jeffrey Aven
- Release date: June 2018
- Publisher(s): Addison-Wesley Professional
- ISBN: 9780134844855