Apache Spark with Scala: Get to grips with the fundamentals of Apache Spark for real-time Big Data processing. Understand the fundamentals of Scala and the Apache Spark ecosystem, and learn to handle large streams of data with Spark Streaming.

Video Description

Learn Apache Spark and Scala through 12+ hands-on examples of analyzing big data

About This Video

  • Apache Spark gives developers a powerful foundation for building cutting-edge data applications. It is also one of the most disruptive technologies the big data world has seen in the last decade.
  • Spark provides in-memory cluster computing, which greatly speeds up iterative algorithms and interactive data mining tasks because data can be kept in memory between steps instead of being reread from disk. Apache Spark is widely regarded as the next-generation processing engine for big data.
  • Many companies are adopting Apache Spark to extract meaning from massive data sets, and today you have access to that same big data technology right on your desktop. Apache Spark is becoming an essential tool for big data engineers and data scientists.
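The benefit of in-memory computing shows up when an algorithm makes many passes over the same data. The sketch below uses plain Scala collections (so it runs without a Spark installation) to fit a one-parameter model y ≈ w·x by gradient descent: the dataset is materialized in memory once and every iteration rescans it. In a Spark job, the analogous step would be calling `cache()` on the RDD before the loop, so each pass reads from memory instead of recomputing the RDD from its source. The data, learning rate and iteration count are illustrative assumptions, not values from the course.

```scala
object IterativeSketch {
  // Hypothetical training data, held in memory once: y = 2x exactly.
  val points: Vector[(Double, Double)] = Vector.tabulate(10)(i => (i.toDouble, 2.0 * i))

  // Gradient descent on mean squared error for y ≈ w * x.
  // Every iteration makes a full pass over the same in-memory data,
  // which is the access pattern that caching speeds up in Spark.
  def fit(data: Vector[(Double, Double)], iterations: Int, learningRate: Double): Double = {
    var w = 0.0
    for (_ <- 1 to iterations) {
      // d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
      val gradient = data.map { case (x, y) => 2.0 * (w * x - y) * x }.sum / data.size
      w -= learningRate * gradient
    }
    w
  }

  def main(args: Array[String]): Unit =
    println(fit(points, iterations = 100, learningRate = 0.01))
}
```

With 100 iterations the estimate converges very close to the true slope of 2.0; the more iterations an algorithm needs, the more it gains from not rereading its input on every pass.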

In Detail

This course covers the fundamentals of Apache Spark with Scala and teaches you what you need to know to develop Spark applications in Scala. By the end of this course, you will have in-depth knowledge of Apache Spark, along with general big data analysis and manipulation skills that will help your company adopt Apache Spark for building big data processing pipelines and data analytics applications.

This course covers 10+ hands-on big data examples, and you will learn how to frame data analysis problems as Spark problems. Together we will work through examples such as aggregating NASA Apache web logs from different sources; exploring price trends in California real estate data; writing Spark applications to find the median salary of developers in different countries from the Stack Overflow survey data; developing a system to analyze how maker spaces are distributed across different regions of the United Kingdom; and much, much more.

This course is taught in Scala. Scala, which combines functional and object-oriented programming, is growing in popularity and is one of the most widely used languages in industry for writing Spark programs. Let's learn how to write Spark programs in Scala to model big data problems today!
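Combining web logs from different sources is a typical example of framing an analysis as set operations. The sketch below, again using plain Scala collections so it runs without Spark, takes two small hypothetical log samples (standing in for logs from two different months), extracts the host field, and computes the hosts seen in both sources and the distinct hosts across both. On RDDs the corresponding methods are `intersection`, and `union` followed by `distinct` (RDD `union` keeps duplicates). The tab-separated layout with the host in the first column is an assumption for illustration, not a description of the NASA dataset used in the course.

```scala
object LogHostsSketch {
  // Hypothetical log lines: host, timestamp and request separated by tabs.
  val julyLogs: Seq[String] = Seq(
    "host-a.example\t804571201\tGET /index.html",
    "host-b.example\t804571202\tGET /images/logo.gif",
    "host-a.example\t804571203\tGET /about.html"
  )
  val augustLogs: Seq[String] = Seq(
    "host-b.example\t807249601\tGET /index.html",
    "host-c.example\t807249602\tGET /index.html"
  )

  // Extract the host (first tab-separated field) from each log line.
  def hosts(lines: Seq[String]): Seq[String] = lines.map(_.split("\t")(0))

  // Hosts present in both sources (RDD analogue: intersection).
  def sameHosts(a: Seq[String], b: Seq[String]): Set[String] =
    hosts(a).toSet.intersect(hosts(b).toSet)

  // Distinct hosts across both sources (RDD analogue: union, then distinct).
  def allHosts(a: Seq[String], b: Seq[String]): Set[String] =
    (hosts(a) ++ hosts(b)).toSet

  def main(args: Array[String]): Unit = {
    println(sameHosts(julyLogs, augustLogs))
    println(allHosts(julyLogs, augustLogs))
  }
}
```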

Table of Contents

  1. Chapter 1 : Get Started with Apache Spark
    1. Course Overview 00:04:13
    2. Introduction to Spark 00:02:27
    3. Install Java and Git 00:04:21
    4. Set up Spark project with IntelliJ IDEA 00:07:02
    5. Run our first Apache Spark job 00:02:57
    6. Troubleshooting: Run our first Apache Spark job 00:00:48
  2. Chapter 2 : RDD
    1. RDD Basics in Apache Spark 00:02:45
    2. Create RDDs 00:02:33
    3. Map and Filter Transformation in Apache Spark 00:08:44
    4. Solution to Airports by Latitude Problem 00:01:35
    5. FlatMap Transformation in Apache Spark 00:04:53
    6. Set Operations in Apache Spark 00:08:01
    7. Solution for the Same Hosts Problem 00:01:37
    8. Actions in Apache Spark 00:08:07
    9. Solution to Sum of Numbers Problem 00:01:47
    10. Important Aspects about RDD 00:01:37
    11. Summary of RDD Operations in Apache Spark 00:02:26
    12. Caching and Persistence in Apache Spark 00:05:15
  3. Chapter 3 : Spark Architecture and Components
    1. Spark Architecture 00:03:01
    2. Spark Components 00:05:26
  4. Chapter 4 : Pair RDD in Apache Spark
    1. Introduction to Pair RDD in Spark 00:01:38
    2. Create Pair RDDs in Spark 00:03:45
    3. Filter and MapValue Transformations on Pair RDD 00:04:57
    4. Reduce By Key Aggregation in Apache Spark 00:05:19
    5. Sample solution for the Average House problem 00:03:20
    6. GroupBy Key Transformation in Spark 00:04:50
    7. SortBy Key Transformation in Spark 00:02:38
    8. Sample Solution for the Sorted Word Count Problem 00:02:09
    9. Data Partitioning in Apache Spark 00:04:18
    10. Join Operations in Spark 00:05:02
  5. Chapter 5 : Advanced Spark Topics
    1. Accumulators 00:03:50
    2. Solution to StackOverflow Survey Follow-up Problem 00:01:00
    3. Broadcast Variables 00:06:44
  6. Chapter 6 : Apache Spark SQL
    1. Introduction to Apache Spark SQL 00:03:54
    2. Spark SQL in Action 00:13:28
    3. Spark SQL practice: House Price Problem 00:01:44
    4. Spark SQL Joins 00:06:33
    5. Strongly Typed Dataset 00:07:04
    6. Use Dataset or RDD 00:03:02
    7. Dataset and RDD Conversion 00:02:33
    8. Performance Tuning of Spark SQL 00:02:50
  7. Chapter 7 : Running Spark in a Cluster
    1. Introduction to Running Spark in a Cluster 00:04:15
    2. Package Spark Application and Use spark-submit 00:08:14
    3. Run Spark Application on Amazon EMR (Elastic MapReduce) cluster 00:13:37
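Several operations from the chapters above (flatMap in Chapter 2, reduce-by-key and sort-by-key in Chapter 4) come together in the classic sorted word count. The sketch below mirrors that pipeline with plain Scala collections so it runs without Spark: `flatMap` splits lines into words, `groupMapReduce` plays the role of `map(word => (word, 1)).reduceByKey(_ + _)`, and a final sort by descending count stands in for `sortBy`. The sample text and the alphabetical tie-breaking rule are assumptions for illustration; `groupMapReduce` requires Scala 2.13 or later.

```scala
object WordCountSketch {
  // Count words and return (word, count) pairs sorted by descending count,
  // breaking ties alphabetically.
  def sortedWordCount(lines: Seq[String]): Seq[(String, Int)] = {
    // Split every line into lowercase words (flatMap step), dropping empty tokens.
    val words = lines.flatMap(_.toLowerCase.split("\\s+")).filter(_.nonEmpty)
    // Same result as reduceByKey(_ + _) on (word, 1) pairs.
    val counts: Map[String, Int] = words.groupMapReduce(identity)(_ => 1)(_ + _)
    counts.toSeq.sortBy { case (word, count) => (-count, word) }
  }

  def main(args: Array[String]): Unit =
    sortedWordCount(Seq("spark makes big data simple", "big data with spark and scala"))
      .foreach { case (word, count) => println(s"$word: $count") }
}
```

For the two sample lines, "big", "data" and "spark" each appear twice and come first, followed by the single-occurrence words in alphabetical order.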