Overview
In this 14-hour course, you will explore the essentials of Hadoop and its ecosystem, mastering the design and management of distributed systems that handle big data. Through real-world examples, you will learn to harness tools such as Spark, Flume, and Kafka to meet contemporary data challenges.
What I will be able to do after this course
- Understand and implement the core principles of Hadoop, including HDFS and YARN.
- Process big data with scripting tools such as Pig and with Spark's programming libraries (see the sketch after this list).
- Utilize databases like HBase and MongoDB for storing and retrieving non-relational data.
- Manage and analyze streaming data with technologies such as Kafka and Spark Streaming.
- Gain hands-on experience with advanced Hadoop cluster management and ecosystem components.
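To give a flavor of the hands-on work, here is a minimal sketch of the kind of Spark code the course covers: a word count over a file in HDFS using PySpark. This is an illustrative example, not course material; the application name and the HDFS path are placeholders.

```python
from pyspark.sql import SparkSession

# Entry point for a Spark application (app name is a placeholder).
spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

# Read lines from HDFS; the path below is hypothetical.
lines = spark.sparkContext.textFile("hdfs:///user/demo/input.txt")

# Classic word count: split lines into words, pair each word with 1,
# then sum the counts per word across the cluster.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Print a small sample of the results on the driver.
for word, count in counts.take(10):
    print(word, count)

spark.stop()
```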
Course Instructor(s)
Frank Kane is an experienced instructor with a background in data science and distributed computing, having worked extensively in the tech industry. His teaching style combines his practical experience with clear explanations, making complex technical topics accessible. Learners benefit from his expertise and friendly guidance throughout the course.
Who is it for?
This course caters to technology enthusiasts, from software developers to project managers, who want to deepen their knowledge of big data solutions. It is ideal for learners with basic Python or Scala proficiency and familiarity with the Linux command line, and it will help participants strengthen their system design capabilities and data handling skills, especially for analyzing large datasets.