Big Data Processing with Apache Spark
on-demand course

with Vivek Kale, John Bura
January 2019
Beginner to intermediate
3h 30m
English
Packt Publishing
Closed Captioning available in English

Overview

In this 3.5-hour course, you’ll explore big data processing with Apache Spark, focusing on real-time data stream consumption and Spark’s machine learning extensions. You’ll learn to use Spark’s APIs for data processing, work with Spark Streaming, and integrate Spark with AWS to build big data workflows.

What you will be able to do after this course

  • Write Python programs that interact with Spark for data processing.
  • Implement real-time data stream consumption using Apache Spark Streaming.
  • Recognize and apply common operations in Spark to process data streams.
  • Integrate Spark streaming with AWS for stream consumption.
  • Create a collaborative filtering model using Python and the MovieLens dataset.
  • Apply processed data streams to Spark’s machine learning APIs.

Course Instructor(s)

John Bura, an experienced game developer and educator, has been programming since 1997 and producing games for various platforms. He has contributed to over 40 commercial games and teaches game development and programming through Mammoth Interactive.

Who is it for?

This course is for software engineers, architects, and IT professionals interested in distributed systems and big data analytics. Some prior experience with Python is recommended, but no prior knowledge of Spark is necessary.

Publisher Resources

ISBN: 9781789953688