Video description
In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Next, you’ll be introduced to Sqoop Import, which will help you gain insights into the lifecycle of the Sqoop command and how to use the import command to migrate data from MySQL to HDFS, and from MySQL to Hive.
In addition, you will get up to speed with Sqoop Export for migrating data effectively, along with using Apache Flume to ingest data. As you progress, you will delve into Apache Hive, external and managed tables, and working with different file formats, including Parquet and Avro. Toward the concluding section, you will focus on Spark DataFrames and Spark SQL.
By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark.
What You Will Learn
- Explore the Hadoop Distributed File System (HDFS) and commands
- Get to grips with the lifecycle of the Sqoop command
- Use the Sqoop Import command to migrate data from MySQL to HDFS and Hive
- Understand split-by and boundary queries
- Use the incremental mode to migrate data from MySQL to HDFS
- Employ Sqoop Export to migrate data from HDFS to MySQL
- Discover Spark DataFrames and gain insights into working with different file formats and compression
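Several of the bullets above center on Sqoop's import options (target directories, split-by, delimiters, incremental mode). As a rough sketch, not taken from the course itself, those options correspond to a command along the following lines, assembled here in Python for readability; the host, database, table, and column names are hypothetical placeholders:

```python
# Sketch of a typical Sqoop import invocation built as an argument list.
# All connection details and identifiers below are hypothetical examples.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/retail_db",  # hypothetical MySQL source
    "--username", "sqoop_user",
    "--table", "orders",
    "--target-dir", "/user/hadoop/orders",         # HDFS target directory
    "--split-by", "order_id",                      # column used to split work among mappers
    "--fields-terminated-by", "|",                 # custom field delimiter
    "--incremental", "append",                     # import only rows newer than --last-value
    "--check-column", "order_id",
    "--last-value", "0",
    "--as-parquetfile",                            # write Parquet instead of text files
]

# Joining the list gives the command you would run on a Hadoop edge node.
print(" ".join(sqoop_import))
```

The same list could be handed to `subprocess.run` on a machine where Sqoop is installed; printing it here simply shows how the course topics map onto concrete flags.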
Audience
This course is for anyone who wants to learn Sqoop and Flume, or for those looking to achieve CCA or HDP certification.
About The Author
Navdeep Kaur - Technical Trainer
Navdeep Kaur is a big data professional with 11 years of industry experience across different technologies and domains. She has a keen interest in providing training in new technologies. She holds the CCA175 Hadoop and Spark Developer certification and the AWS Solutions Architect certification. She loves guiding people and helping them achieve new goals.
Table of contents
- Chapter 1 : Hadoop Introduction
- Chapter 2 : Sqoop Import
- Sqoop Introduction
- Managing Target Directories
- Working with Different File Formats
- Working with Different Compressions
- Conditional Imports
- Split-by and Boundary Queries
- Field Delimiters
- Incremental Appends
- Sqoop Hive Import
- Sqoop List Tables/Database
- Sqoop Import Practice1
- Sqoop Import Practice2
- Sqoop Import Practice3
- Chapter 3 : Sqoop Export
- Chapter 4 : Apache Flume
- Chapter 5 : Apache Hive
- Chapter 6 : Spark Introduction
- Chapter 7 : Spark Transformations and Actions
- Map/FlatMap Transformation
- Filter/Intersection
- Union/Distinct Transformation
- GroupByKey / Group People Based on Birthday Months
- ReduceByKey / Total Number of Students in Each Subject
- SortByKey / Sort Students Based on Their Roll Number
- MapPartition / MapPartitionWithIndex
- Change number of Partitions
- Join / Join email address based on customer name
- Spark Actions
- Chapter 8 : Spark RDD Practice
- Chapter 9 : Spark DataFrames and Spark SQL
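The Chapter 7 entries above pair each Spark transformation with a small exercise, such as grouping people by birthday month (groupByKey) and counting students per subject (reduceByKey). As a minimal sketch of what those pair operations do, here is the same logic in plain Python; this is illustrative only, not the course's Spark code, and the sample data is made up:

```python
from collections import defaultdict

# Hypothetical (key, value) pairs mirroring the Chapter 7 exercises.
people = [("May", "Asha"), ("June", "Bilal"), ("May", "Chen")]
enrollments = [("Math", 1), ("Physics", 1), ("Math", 1)]

def group_by_key(pairs):
    """groupByKey semantics: collect all values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_by_key(pairs, fn):
    """reduceByKey semantics: fold the values for each key with fn."""
    reduced = {}
    for key, value in pairs:
        reduced[key] = fn(reduced[key], value) if key in reduced else value
    return reduced

print(group_by_key(people))
# {'May': ['Asha', 'Chen'], 'June': ['Bilal']}
print(reduce_by_key(enrollments, lambda a, b: a + b))
# {'Math': 2, 'Physics': 1}
```

In Spark the same operations run distributed across partitions, which is why the course also covers MapPartition and changing the number of partitions.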
Product information
- Title: Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark
- Author(s): Navdeep Kaur
- Release date: July 2019
- Publisher(s): Packt Publishing
- ISBN: 9781839212734