Taming Big Data with Apache Spark and Python - Hands On!

Video Description

More than 15 hands-on examples to help you analyze large data sets with Apache Spark

About This Video

  • Understand how Spark can be distributed across computing clusters

  • Develop and run Spark jobs efficiently using Python

  • A hands-on tutorial with over 15 real-world examples teaching you Big Data processing with Spark

In Detail

Apache Spark has become one of the leading technologies in the Big Data domain, rising from an emerging project to an established standard in just a few years. Spark lets you extract actionable insights from large amounts of data quickly, and its streaming component can process data in near real time.

This course is a hands-on companion for learning Apache Spark. You'll start by setting up Spark on a single system or on a cluster. From there, you'll move from analyzing large data sets with Spark RDDs to developing and running efficient Spark jobs in Python. The course includes over 15 interactive examples drawn from real-world problems. They will help you understand the Spark ecosystem and build production-grade, real-time Spark projects.
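Chapter 2 builds a word count with flatMap(), then refines it with regular expressions and sorts the results. The course's own script is not reproduced here. The sketch below shows, in plain Python with no Spark required, what a flatMap → map → reduceByKey pipeline computes. The helper names flat_map, reduce_by_key, and normalize_words are hypothetical stand-ins, and the input lines are made up. The commented PySpark version assumes an existing SparkContext named sc and a hypothetical file "book.txt".

```python
import re
from collections import defaultdict
from functools import reduce


def flat_map(func, records):
    """Plain-Python stand-in for RDD.flatMap: apply func to every record and flatten the results."""
    return [item for record in records for item in func(record)]


def reduce_by_key(func, pairs):
    """Plain-Python stand-in for RDD.reduceByKey: combine all values that share a key using func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def normalize_words(line):
    # Lowercase, then split on runs of non-word characters so "Big" and "big," count as one word.
    return [word for word in re.split(r"\W+", line.lower()) if word]


# Hypothetical input lines standing in for a text file.
lines = ["Spark makes big data simple.", "Big data, big insights!"]

# Roughly equivalent PySpark (assumes a SparkContext named sc; the path is hypothetical):
#   counts = (sc.textFile("book.txt")
#               .flatMap(normalize_words)
#               .map(lambda w: (w, 1))
#               .reduceByKey(lambda a, b: a + b))
words = flat_map(normalize_words, lines)
counts = reduce_by_key(lambda a, b: a + b, [(word, 1) for word in words])

# Sort by descending count, breaking ties alphabetically.
for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, count)
```

In Spark itself, reduceByKey combines values within each partition before shuffling them across the cluster. That is why the function passed to it should be associative and commutative, as addition is here.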

Table of Contents

    1. Chapter 1: Getting Started with Spark
      1. Introduction 00:02:16
      2. How to Use This Course 00:01:41
      3. Getting Set Up – Installing Python, a JDK, Spark, and its Dependencies 00:14:53
      4. Installing the MovieLens Movie Rating Dataset 00:03:35
      5. Run Your First Spark Program – Ratings Histogram Example 00:04:53
    2. Chapter 2: Spark Basics and Simple Examples
      1. Introduction to Spark 00:10:12
      2. The Resilient Distributed Dataset (RDD) 00:12:17
      3. Ratings Histogram Walkthrough 00:13:34
      4. Key/Value RDDs and the Average Friends by Age Example 00:16:13
      5. Running the Average Friends by Age Example 00:05:39
      6. Filtering RDDs and the Minimum Temperature by Location Example 00:08:10
      7. Running the Minimum Temperature Example and Modifying It for Maximums 00:05:09
      8. Running the Maximum Temperature by Location Example 00:03:22
      9. Counting Word Occurrences Using flatMap() 00:07:28
      10. Improving the Word Count Script with Regular Expressions 00:04:45
      11. Sorting the Word Count Results 00:07:45
      12. Find the Total Amount Spent by Customer 00:04:01
      13. Check Your Results and Sort Them by Total Amount Spent 00:05:08
      14. Check Your Sorted Implementation and Results Against Mine 00:03:19
    3. Chapter 3: Advanced Examples of Spark Programs
      1. Find the Most Popular Movie 00:05:53
      2. Use Broadcast Variables to Display Movie Names Instead of ID Numbers 00:08:24
      3. Find the Most Popular Superhero in a Social Graph 00:04:29
      4. Run the Script – Discover Who the Most Popular Superhero is! 00:06:00
      5. Superhero Degrees of Separation – Introducing Breadth-First Search 00:07:54
      6. Superhero Degrees of Separation – Accumulators and Implementing BFS in Spark 00:06:45
      7. Superhero Degrees of Separation – Review the Code and Run it 00:09:14
      8. Item-Based Collaborative Filtering in Spark, cache(), and persist() 00:10:13
      9. Running the Similar Movies Script Using Spark's Cluster Manager 00:10:55
      10. Improve the Quality of Similar Movies 00:02:58
    4. Chapter 4: Running Spark on a Cluster
      1. Introducing Elastic MapReduce 00:05:08
      2. Setting Up Your AWS / Elastic MapReduce Account and PuTTY 00:09:56
      3. Partitioning 00:04:22
      4. Create Similar Movies from One Million Ratings – Part 1 00:05:12
      5. Create Similar Movies from One Million Ratings – Part 2 00:11:28
      6. Create Similar Movies from One Million Ratings – Part 3 00:03:29
      7. Troubleshooting Spark on a Cluster 00:03:43
      8. More Troubleshooting and Managing Dependencies 00:05:48
    5. Chapter 5: SparkSQL, DataFrames, and DataSets
      1. Introducing SparkSQL 00:06:08
      2. Executing SQL Commands and SQL-Style Functions on a DataFrame 00:08:17
      3. Using DataFrames Instead of RDDs 00:05:53
    6. Chapter 6: Other Spark Technologies and Libraries
      1. Introducing MLLib 00:08:10
      2. Using MLLib to Produce Movie Recommendations 00:02:57
      3. Analyzing the ALS Recommendations Results 00:04:53
      4. Using DataFrames with MLLib 00:07:32
      5. Spark Streaming and GraphX 00:07:36
    7. Chapter 7: You Made It! Where to Go from Here
      1. Learning More about Spark and Data Science 00:04:09
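Chapter 2's "Key/Value RDDs and the Average Friends by Age Example" lesson computes a per-key average. The course's exact code is not shown here. As a rough sketch of the common pattern, the plain-Python function below mirrors a mapValues → reduceByKey → mapValues pipeline without needing Spark. The function name average_by_key and the sample (age, number_of_friends) records are hypothetical. The commented PySpark version assumes an existing RDD of such tuples named rdd.

```python
# Hypothetical (age, number_of_friends) records; not the course's dataset.
records = [(33, 385), (33, 2), (26, 221), (55, 221), (26, 465)]


def average_by_key(pairs):
    """Average the values for each key by carrying (sum, count) pairs through a reduce."""
    # mapValues step: turn each value into a (sum, count) pair.
    with_counts = [(key, (value, 1)) for key, value in pairs]
    # reduceByKey step: add sums and counts for pairs that share a key.
    totals = {}
    for key, (s, c) in with_counts:
        prev_sum, prev_count = totals.get(key, (0, 0))
        totals[key] = (prev_sum + s, prev_count + c)
    # Final mapValues step: divide each key's sum by its count.
    return {key: s / c for key, (s, c) in totals.items()}


# Roughly equivalent PySpark (assumes an RDD of (age, friends) tuples named rdd):
#   averages = (rdd.mapValues(lambda x: (x, 1))
#                  .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
#                  .mapValues(lambda t: t[0] / t[1]))
averages = average_by_key(records)

for age in sorted(averages):
    print(age, averages[age])
```

The reduce step carries (sum, count) pairs rather than averaging directly. An average of partial averages is wrong when the groups have different sizes, whereas adding sums and counts stays associative. That property lets Spark combine partial results from different partitions in any order.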

Product Information

  • Title: Taming Big Data with Apache Spark and Python - Hands On!
  • Author(s): Frank Kane
  • Release date: September 2016
  • Publisher(s): Packt Publishing
  • ISBN: 9781787129931