Overview
In this 3-hour course, you’ll explore big data processing with Apache Spark, focusing on real-time data stream consumption and machine learning extensions. Learn to use Spark’s APIs for data processing, work with Spark Streaming, and integrate with AWS to build efficient big data workflows.
What I will be able to do after this course
- Write Python programs that interact with Spark for data processing.
- Implement real-time data stream consumption using Apache Spark Streaming.
- Recognize and apply common operations in Spark to process data streams.
- Integrate Spark Streaming with AWS for stream consumption.
- Create a collaborative filtering model using Python and the MovieLens dataset.
- Apply processed data streams to Spark’s machine learning APIs.
Course Instructor(s)
John Bura, an experienced game developer and educator, has been programming since 1997 and producing games for various platforms. He has contributed to over 40 commercial games and teaches game development and programming through Mammoth Interactive.
Who is it for?
This course is for software engineers, architects, and IT professionals interested in distributed systems and big data analytics. Some prior experience with Python is recommended, but no prior knowledge of Spark is necessary.