on-demand course

Apache Spark Streaming with Python and PySpark

with James Lee

September 2018

Beginner to intermediate

3h 24m

English

Packt Publishing

Course outline

The Course Overview
1m 49s
How to Take this Course and How to Get Support
45s
Introduction to Streaming
7m 29s
PySpark Setup Tutorial
13m 59s
Example Twitter Application
20m 23s
What are Discretized Streams?
2m 23s
How to Create Discretized Streams
6m 11s
Transformations on DStreams
7m 58s
Transformation Operation
7m 29s
Window Operations
1m 41s
Window
4m 22s
countByWindow
3m 40s
reduceByKeyAndWindow
4m 52s
countByValueAndWindow
4m 0s
Output Operations on DStreams
3m 33s
foreachRDD
5m 59s
SQL Operations
5m 42s
Reviewing the Basics
5m 34s
Join Operations
5m 31s
Stateful Transformations
4m 44s
Checkpointing
5m 46s
Accumulators
3m 27s
Fault Tolerance
11m 48s
Performance Tuning
8m 39s
PySpark Streaming with Apache Kafka
11m 22s
PySpark Streaming with Amazon Kinesis
13m 13s
Introduction to Structured Streaming
4m 41s
Operations on Streaming DataFrames and Datasets
9m 5s
Window Operations
8m 48s
Handling Late Data and Watermarking
6m 27s
Final Video
2m 42s

Overview

In this 3h 24m course, you will learn how to use Apache Spark Streaming with Python to process and analyze real-time big data streams. The course covers creating, optimizing, and deploying streaming data pipelines with PySpark.

What you will be able to do after this course

Master developing Spark Streaming applications using PySpark.
Gain expertise in processing live data from sources like Twitter.
Learn techniques to optimize Spark jobs for better performance.
Understand Spark SQL and its applications in structured data processing.
Integrate Spark Streaming with tools like Apache Kafka.

Course Instructor(s)

James Lee is a seasoned developer and data engineer with years of experience working with big data solutions. Having taught thousands of students, James specializes in breaking down complex concepts into easily digestible lessons, ensuring learners gain practical skills they can apply right away. His teaching combines a deep understanding of technology with a passion for empowering students to succeed.

Who is it for?

This course is ideal for Python developers aiming to expand into streaming data processing, big data professionals looking to add Spark to their toolkit, and managers or engineers focused on enhancing their team's data capabilities. Learners should have some experience with Python and basic data concepts to benefit from this course.

Publisher Resources

ISBN: 9781789808223

Chapter 1: Getting Started with Apache Spark Streaming

Chapter 2: PySpark Basics

Chapter 3: Advanced Spark Concepts

Chapter 4: PySpark Streaming at Scale

Chapter 5: Structured Streaming

Chapter 6: Course Conclusion

You might also like

Apache Spark with Python - Big Data with PySpark and Spark

Spark Programming in Python for Beginners with Apache Spark 3

Fundamentals of Apache Flink

Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library
