Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark

Video description

Complete course on Sqoop, Flume, and Hive: Great for CCA175 and Hortonworks Spark Certification preparation

About This Video

  • Learn Sqoop, Flume, and Hive to prepare for the CCA175 and Hortonworks Spark certifications
  • Understand the Hadoop Distributed File System (HDFS), along with exploring Hadoop commands to work effectively with HDFS

In Detail

In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Next, you'll be introduced to Sqoop Import, which will help you gain insights into the lifecycle of the Sqoop command and how to use the import command to migrate data from MySQL to HDFS, and from MySQL to Hive.

You will also get up to speed with Sqoop Export for migrating data and with Apache Flume for ingesting data. As you progress, you will delve into Apache Hive, covering external and managed tables and working with different file formats, including Parquet and Avro. In the concluding sections, you will focus on Spark DataFrames and Spark SQL.

By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark.

Table of contents

  1. Chapter 1 : Hadoop Introduction
    1. HDFS and Hadoop Commands 00:09:16
  2. Chapter 2 : Sqoop Import
    1. Sqoop Introduction 00:08:07
    2. Managing Target Directories 00:02:38
    3. Working with Different File Formats 00:04:26
    4. Working with Different Compressions 00:06:17
    5. Conditional Imports 00:04:26
    6. Split-by and Boundary Queries 00:08:27
    7. Field Delimiters 00:03:18
    8. Incremental Appends 00:03:13
    9. Sqoop Hive Import 00:03:32
    10. Sqoop List Tables/Database 00:02:45
    11. Sqoop Import Practice1 00:04:57
    12. Sqoop Import Practice2 00:04:17
    13. Sqoop Import Practice3 00:03:32
  3. Chapter 3 : Sqoop Export
    1. Export from HDFS to MySQL 00:03:39
    2. Export from Hive to MySQL 00:02:30
  4. Chapter 4 : Apache Flume
    1. Flume Introduction & Architecture 00:02:32
    2. Exec Source and Logger Sink 00:03:42
    3. Moving data from Twitter to HDFS 00:09:25
    4. Moving data from NetCat to HDFS 00:04:40
    5. Flume Interceptors 00:01:56
    6. Flume Interceptor Example 00:04:54
    7. Flume Multi-Agent Flow 00:06:49
    8. Flume Consolidation 00:06:11
  5. Chapter 5 : Apache Hive
    1. Hive Introduction 00:03:42
    2. Hive Database 00:03:05
    3. Hive Managed Tables 00:06:24
    4. Hive External Tables 00:02:27
    5. Hive Inserts 00:05:31
    6. Hive Analytics 00:04:22
    7. Working with Parquet 00:03:25
    8. Compressing Parquet 00:04:27
    9. Working with Fixed File Format 00:03:04
    10. Alter Command 00:06:13
    11. Hive String Functions 00:06:22
    12. Hive Date Functions 00:05:41
    13. Hive Partitioning 00:07:26
    14. Hive Bucketing 00:03:46
  6. Chapter 6 : Spark Introduction
    1. Spark Introduction 00:03:46
    2. Resilient Distributed Datasets 00:02:53
    3. Cluster Overview 00:06:52
    4. Directed Acyclic Graph (DAG) & Stages 00:10:07
  7. Chapter 7 : Spark Transformations & Actions
    1. Map/FlatMap Transformation 00:04:29
    2. Filter/Intersection 00:04:01
    3. Union/Distinct Transformation 00:02:23
    4. GroupByKey / Group people based on Birthday months 00:05:54
    5. ReduceByKey / Total Number of students in each Subject 00:06:44
    6. SortByKey / Sort students based on their rollno 00:06:03
    7. MapPartitions / MapPartitionsWithIndex 00:06:20
    8. Change number of Partitions 00:03:34
    9. Join / Join email address based on customer name 00:03:06
    10. Spark Actions 00:06:06
  8. Chapter 8 : Spark RDD Practice
    1. Scala Tuples 00:03:05
    2. Extract Error Logs from log files 00:10:23
    3. Frequency of word in Text File 00:08:35
    4. Population of each City 00:03:53
    5. Orders placed by Customers 00:09:21
    6. Movie Average Rating greater than 3 00:07:04
  9. Chapter 9 : Spark DataFrames & Spark SQL
    1. DataFrame Intro 00:02:17
    2. DataFrame from JSON Files 00:04:46
    3. DataFrame from Parquet Files 00:01:41
    4. DataFrame from CSV Files 00:08:05
    5. DataFrame from Avro/XML Files 00:04:54
    6. Working with Different Compressions 00:06:34
    7. DataFrame API Part1 00:04:51
    8. DataFrame API Part2 00:06:24
    9. Spark SQL 00:01:33
    10. Working with Hive Tables in Spark 00:01:29

Product information

  • Title: Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark
  • Author(s): Navdeep Kaur
  • Release date: July 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781839212734