on-demand course

Apache Spark with Scala - Learn Spark from a Big Data Guru

with James Lee

April 2018

Beginner to intermediate

3h 16m

English

Packt Publishing

Closed Captioning available in English

Includes

Badge

Course outline

Course Overview (4m 13s)
Introduction to Spark (2m 27s)
Install Java and Git (4m 21s)
Set up Spark project with IntelliJ IDEA (7m 2s)
Run our first Apache Spark job (2m 57s)
Trouble Shooting: Run our first Apache Spark job (48s)
RDD Basics in Apache Spark (2m 45s)
Create RDDs (2m 33s)
Map and Filter Transformation in Apache Spark (8m 44s)
Solution to Airports by Latitude Problem (1m 34s)
FlatMap Transformation in Apache Spark (4m 53s)
Set Operation in Apache Spark (8m 1s)
Solution for the Same Hosts Problem (1m 37s)
Actions in Apache Spark (8m 7s)
Solution to Sum of Numbers Problem (1m 47s)
Important Aspects about RDD (1m 37s)
Summary of RDD Operations in Apache Spark (2m 26s)
Caching and Persistence in Apache Spark (5m 15s)
Spark Architecture (3m 1s)
Spark Components (5m 26s)
Introduction to Pair RDD in Spark (1m 38s)
Create Pair RDDs in Spark (3m 45s)
Filter and MapValue Transformations on Pair RDD (4m 57s)
Reduce By Key Aggregation in Apache Spark (5m 19s)
Sample solution for the Average House problem (3m 20s)
GroupBy Key Transformation in Spark (4m 50s)
SortBy Key Transformation in Spark (2m 38s)
Sample Solution for the Sorted Word Count Problem (2m 9s)
Data Partitioning in Apache Spark (4m 18s)
Join Operations in Spark (5m 2s)
Accumulators (3m 50s)
Solution to StackOverflow Survey Follow-up Problem (1m 0s)
Broadcast Variables (6m 44s)
Introduction to Apache Spark SQL (3m 54s)
Spark SQL in Action (13m 28s)
Spark SQL practice: House Price Problem (1m 44s)
Spark SQL Joins (6m 33s)
Strongly Typed Dataset (7m 4s)
Use Dataset or RDD (3m 2s)
Dataset and RDD Conversion (2m 32s)
Performance Tuning of Spark SQL (2m 50s)
Introduction to Running Spark in a Cluster (4m 14s)
Package Spark Application and Use spark-submit (8m 14s)
Run Spark Application on Amazon EMR (Elastic MapReduce) cluster (13m 37s)

Overview

In this 3-hour course, you'll learn the fundamentals of Apache Spark with Scala through hands-on big data examples, and gain practical skills in building and optimizing Spark applications.

What I will be able to do after this course

Understand Apache Spark architecture and its processing model.
Learn to develop applications using Spark RDDs and Spark SQL.
Master techniques to optimize Spark jobs through caching and partitioning.
Develop and scale Apache Spark applications on a Hadoop YARN cluster.
Analyze datasets using Spark SQL, DataFrames, and Datasets efficiently.

Course Instructor(s)

James Lee is an experienced big data engineer and educator with expertise in Apache Spark and Scala. With years of professional experience and a knack for clear and concise teaching, he breaks down complex topics into manageable lessons.

Who is it for?

This course is designed for software developers and data scientists who have basic programming experience and want to deepen their understanding of big data processing with Apache Spark. It is also suited to those seeking career growth in big data development and engineering.

Publisher Resources

ISBN: 9781789134537

Chapter 1: Get Started with Apache Spark

Chapter 2: RDD

Chapter 3: Spark Architecture and Components

Chapter 4: Pair RDD in Apache Spark

Chapter 5: Advanced Spark Topic

Chapter 6: Apache Spark SQL

Chapter 7: Running Spark in a Cluster