Video description
Hadoop is today's most pervasive Big Data technology for distributing the processing of massive data sets across clusters of commodity computers. With Amazon's Elastic MapReduce service (EMR), you can rent capacity through Amazon Web Services (AWS) to store and analyze data at minimal cost on top of a real Hadoop cluster.
This course shows you how to use an EMR Hadoop cluster through a real-life example in which you analyze movie ratings data using Hive, Pig, and Oozie. It focuses on practical tips for using an EMR cluster efficiently, integrating the cluster with Amazon's S3 service, and determining the right money-saving size for a cluster. You'll learn how to interact with your cluster through the Hue web interface, from a terminal prompt, and through EMR steps that can execute your scripts automatically.
- Gain experience with three high-value skill sets: Hadoop, AWS, and EMR
- Save time and money by learning about the undocumented "gotchas" of AWS and EMR
- See how the experts provision EMR clusters and connect to them via SSH and web UIs
- Learn to import data into a cluster and to access external data stored on Amazon's S3
- Explore three different ways to query data using Hive and Pig
- Discover the Tez engine and see how it accelerates Hive and Pig queries
- Learn how to schedule workflows using Oozie
Table of contents
Introduction
- Welcome To The Course 00:02:37
- About The Author 00:01:49
Introducing Elastic MapReduce
- Overview Of EMR 00:04:29
- Tez VS MapReduce 00:02:18
- Launching An EMR Cluster 00:07:47
- Connecting To Your Cluster 00:04:27
Using Hadoop On EMR
- Create A Tunnel For Web UIs 00:04:35
- Use Hue To Interact With EMR 00:05:33
- Analyze Movie Ratings With Hive On EMR 00:09:30
- Analyze Movie Ratings With Pig On EMR 00:08:00
Managing Your EMR Cluster
- Using Oozie To Schedule Workflows 00:04:21
- Monitoring Your EMR Cluster 00:04:29
Conclusion
- Wrap Up And Thank You 00:02:48
Product information
- Title: Analyzing Big Data with Hadoop, AWS, and EMR
- Author(s):
- Release date: March 2017
- Publisher(s): Infinite Skills
- ISBN: 9781491985137