Apache Spark with Java - Learn Spark from a Big Data Guru

Video description

Learn to analyze large data sets with Apache Spark through 10+ hands-on examples. Take your big data skills to the next level.

About This Video

  • You will gain in-depth knowledge of Spark, along with general big data analysis and data manipulation skills.
  • You'll be able to develop Spark applications that analyze gigabytes of data, both on your laptop and in the cloud using Amazon's Elastic MapReduce service.

In Detail

This course covers the fundamentals of Apache Spark with Java and teaches you what you need to know to develop Spark applications in Java. By the end of the course, you will have gained in-depth knowledge of Apache Spark, along with general big data analysis and manipulation skills. With these skills, you'll be able to help your company adopt Apache Spark for building big data processing pipelines and data analytics applications.

The course includes 10+ hands-on big data examples and shows you how to frame data analysis problems as Spark problems. Together we will work through examples such as aggregating NASA Apache web logs from different sources, exploring price trends in California real estate data, finding the median salary of developers in different countries from the Stack Overflow survey data, analyzing how maker spaces are distributed across different regions of the United Kingdom, and much more.

Table of contents

  1. Chapter 1 : Get Started with Apache Spark
    1. Course Overview 00:04:08
    2. Introduction to Spark 00:02:21
    3. Install Java and Git 00:04:12
    4. Set up Spark project with IntelliJ IDEA 00:07:23
    5. Set up Spark project with Eclipse 00:02:04
    6. Run our first Spark job 00:02:44
  2. Chapter 2 : RDD
    1. RDD Basics 00:02:40
    2. Create RDDs 00:02:26
    3. Map and Filter Transformation 00:08:46
    4. Solution to Airports by Latitude Problem 00:01:31
    5. FlatMap Transformation 00:06:27
    6. Set Operation 00:07:37
    7. Actions 00:08:06
    8. Solution to Sum of Numbers Problem 00:01:44
    9. Important Aspects about RDD 00:01:37
    10. Summary of RDD Operations 00:02:24
    11. Caching and Persistence 00:05:09
  3. Chapter 3 : Spark Architecture and Components
    1. Spark Architecture 00:02:56
    2. Spark Components 00:05:21
  4. Chapter 4 : Pair RDD
    1. Introduction to Pair RDD 00:01:33
    2. Create Pair RDDs 00:03:54
    3. Filter and MapValue Transformations on Pair RDD 00:04:53
    4. Reduce By Key Aggregation 00:05:15
    5. Sample solution for the Average House problem 00:03:16
    6. Group by Key Transformation 00:04:43
    7. Sort by Key Transformation 00:02:49
    8. Sample Solution for the Sorted Word Count Problem 00:02:01
    9. Data Partitioning 00:04:13
    10. Join Operations 00:04:56
  5. Chapter 5 : Advanced Spark Topic
    1. Accumulators 00:05:31
    2. Solution to StackOverflow Survey Follow-up Problem 00:01:21
    3. Broadcast Variables 00:06:48
  6. Chapter 6 : Spark SQL
    1. Introduction to Spark SQL 00:03:49
    2. Spark SQL in Action 00:14:43
    3. Spark SQL practice: House Price Problem 00:01:53
    4. Spark SQL Joins 00:06:21
    5. Strongly Typed Dataset 00:08:32
    6. Use Dataset or RDD 00:02:58
    7. Dataset and RDD Conversion 00:02:58
    8. Performance Tuning of Spark SQL 00:02:44
  7. Chapter 7 : Running Spark in a Cluster
    1. Introduction to Running Spark in a Cluster 00:04:09
    2. Package Spark Application and Use spark-submit 00:08:08
    3. Run Spark Application on Amazon EMR (Elastic MapReduce) cluster 00:13:32

Product information

  • Title: Apache Spark with Java - Learn Spark from a Big Data Guru
  • Author(s): Tao W., James Lee
  • Release date: April 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788994330