Hands-On Big Data Processing with Hadoop 3

Video Description

Perform real-time data analytics, stream processing, and batch processing on your application using Hadoop

About This Video

  • Get a clear understanding of Hadoop's storage paradigm.
  • Understand how to process structured, unstructured, and semi-structured data.
  • Learn to move data from sources such as RDBMSs, web log servers, syslog servers, and social media.

In Detail

Hadoop is one of the best open-source software frameworks for distributed computing, and learning it gives you a means to advance your career and skills. You will start out by learning the basics of Hadoop, including its file system, HDFS; its cluster resource manager, YARN; and its many libraries and programming tools. This course will get you started with the major Hadoop components that the industry demands. You will see how structured, unstructured, and semi-structured data can be processed with Hadoop.

This course focuses on the problems faced in Big Data and the solutions offered by the respective Hadoop components. You will learn to use components and tools such as MapReduce to process raw data, and see how tools such as Hive and Pig aid in this process. You will then move on to data analysis techniques with Hadoop using tools such as Hive, and learn to apply them in a real-world Big Data application. This course will teach you to perform real-time data analytics, stream processing, and batch processing on your application. Finally, it will teach you how to extend your analytics solutions to the cloud.
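
As a rough illustration of the MapReduce model mentioned above, the sketch below counts words in plain Python. It does not use Hadoop and is not taken from the course materials. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only to mirror the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one line of raw input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, which a MapReduce framework
    # does between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: collapse the list of counts for one word into a total.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce steps would run in parallel on separate nodes over HDFS input splits; here everything runs in one process so the data flow is easy to follow.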

The code for this course is available on GitHub: https://github.com/PacktPublishing/Hands-on-Big-Data-Processing-with-Hadoop-3

Downloading the example code for this course: You can download the example code files for all Packt video courses you have purchased from your account at http://www.PacktPub.com. If you purchased this course elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Chapter 1 : What Is Hadoop?
    1. The Course Overview 00:03:44
    2. Introduction to Hadoop 00:06:16
    3. Introduction to Hadoop Distributed File System 00:03:27
    4. HDFS Architecture and Features 00:11:50
    5. Replication and Rack Awareness 00:03:31
    6. Anatomy of a File Read/Write on HDFS 00:07:42
  2. Chapter 2 : Making Hadoop Efficient – YARN Architecture
    1. The Rise of Resource Manager 00:13:45
    2. YARN Architecture 00:04:51
    3. How YARN Has Effectively Increased the Potential of Hadoop 00:04:55
    4. Classic versus YARN 00:02:52
    5. YARN Daemons 00:02:42
    6. Containers 00:03:35
    7. Speculative Execution 00:02:38
    8. HDFS Federation 00:02:46
    9. Authentication and High Availability 00:03:27
    10. Understanding the Major Changes in Different Versions of Hadoop – 1.X, 2.X, and 3.X 00:06:03
  3. Chapter 3 : Analyze Data with MapReduce Basics
    1. What Is MapReduce? 00:07:00
    2. MapReduce Framework, Architecture, and Use Cases 00:07:11
    3. Input Splits 00:06:41
    4. Getting Hands-on Practical with Hadoop 00:17:36
  4. Chapter 4 : Analyzing Structured Data with Hadoop
    1. Why We Need to Analyze Data with Hive? 00:04:08
    2. What Is Hive? 00:03:04
    3. Hive Architecture 00:05:50
    4. Warehouse Directory and Metastore 00:03:48
    5. Hive Query Language 00:13:44
    6. Managed and External Tables 00:11:44
  5. Chapter 5 : Efficient Data Transfer with Sqoop
    1. How Are We Going to Learn? 00:03:53
    2. Importing Data from RDBMS to HDFS 00:08:01
    3. Exporting Data from HDFS to RDBMS 00:05:15
  6. Chapter 6 : Managing Data Collection and Transfer with Flume
    1. What Is Flume? 00:03:13
    2. Flume Architecture 00:04:38
    3. Preparing Flume for Fetching the Data 00:04:29
    4. Fetch the Data from Twitter in HDFS 00:07:09
  7. Chapter 7 : Perform Data Execution with Pig
    1. Pig Background 00:04:46
    2. Pig Architecture 00:03:15
    3. Pig Latin Basics 00:06:05
    4. Pig Execution Model 00:03:39
    5. Pig Processing – Loading and Transforming Data 00:11:53
    6. Pig Built-in Functions 00:13:35
    7. Filtering, Grouping, and Sorting Data 00:15:59
    8. Relational Join Operators 00:16:14