Using Spark in the Hadoop Ecosystem

by Rich Morrow

Released June 2016

Publisher(s): Infinite Skills

ISBN: 9781771375658

Video description

You're new to Big Data, you've heard about Apache Spark and Apache Hadoop, and you want to play. Big Data coach Rich Morrow gets you into the game via sixteen sprints (sixteen hands-on labs) across the Spark-Hadoop ball field. First, you'll create playing areas using Amazon Web Services EMR and the Cloudera Quickstart VM. Then you'll install Hadoop, run basic HDFS commands, learn MapReduce, use Flume and Sqoop, run Spark, and then run Spark again.

You'll play with Spark SQL, learn common MLlib usage, do analysis with Hive, ETL with Pig, and then jog through Hadoop/Cloud use cases, HBase basics, and enterprise integration. When practice is over, you'll know Spark, its associated modules, the Hadoop ecosystem, and when, where, how, and why each technology is used. Working files are included, allowing you to follow along with the author throughout the lessons. Play on.

Understand Apache Spark and why it's Big Data's fastest growing open source project
Learn what Apache Hadoop is and how it's used in the world of Big Data
Master the basics of Hadoop: HDFS, YARN, and MapReduce
Master the basics of Spark: Spark SQL, MLlib, Spark Streaming, GraphX, and more
Discover Sqoop, Flume, Hive, Pig, HBase, and Oozie, key components of the Hadoop ecosystem
Gain direct experience with Spark and Hadoop with sixteen hands-on labs

Rich Morrow is a 20+ year veteran of IT and an expert in big data and cloud technologies. He's used Hadoop and AWS for over 6 years in his consulting practice, quicloud.com, and has taught Cloudera (Hadoop) and AWS for Global Knowledge (where he also serves as Course Director for Cloud and Big Data) for over 4 years. Rich retains all certifications for both AWS and Cloudera, and is a prolific writer and speaker on Cloud, Big Data, DevOps/Agile, Mobile, and IoT topics, including the O'Reilly titles Hands-on with Amazon Redshift, Learning Apache Hadoop and Cloud Computing With AWS.

Table of contents

Introduction
1. Course Introduction 00:04:21
2. About The Author 00:04:14
3. What Is Big Data 00:11:07
4. Historical Approaches 00:07:04
5. Modern-Day Approach 00:12:42
6. What Is Hadoop 00:11:05
7. Hadoop Core Vs Ecosystem 00:05:03
8. Hadoopable Problems 00:06:37
Hadoop Basics
1. HDFS And Yarn 00:08:14
2. Hive And Pig Interface Introduction 00:05:59
3. Introduction To Spark 00:04:37
4. Hadoop In The Cloud (Amazon Web Services Intro) 00:08:49
5. Installing Hadoop Into EMR Part - 1 00:15:31
6. Installing Hadoop Into EMR Part - 2 00:15:34
7. Installing Cloudera Quickstart VM 00:11:01
8. Web GUIs 00:11:06
Hadoop Distributed Filesystem (HDFS)
1. HDFS Architecture 00:10:05
2. HDFS File Write Walkthrough 00:17:57
3. Secondary Name Node 00:06:38
4. Basic HDFS Commands 00:09:23
5. Using HDFS Commands Part - 1 00:07:34
6. Using HDFS Commands Part - 2 00:09:27
7. HA And Federation Basics 00:12:48
8. HDFS Access Controls (Or Lack Thereof) 00:09:34
Yarn
1. Yarn Purpose 00:06:16
2. Yarn Architecture 00:07:25
3. Yarn With Spark 00:06:44
MapReduce
1. MapReduce Explained 00:11:52
2. MapReduce Architecture 00:07:36
3. MapReduce Code Walkthrough 00:11:59
4. MapReduce Details Walkthrough 00:04:45
5. Running MapReduce Job 00:08:59
HDFS Data Import And Export
1. Import/Export Options 00:11:12
2. Flume Introduction 00:10:53
3. Using Flume 00:13:43
4. Sqoop Introduction 00:09:25
5. Using Sqoop 00:17:01
6. HDFS Interaction Tools 00:06:01
7. Oozie Introduction 00:10:17
Spark Basics
1. Spark Value Propositions 00:08:30
2. Spark Run Modes (Yarn, Standalone, Mesos) 00:07:33
3. RDDs And Dataframes 00:17:24
4. Hands On Spark Part - 1 00:08:12
5. Hands On Spark Part - 2 00:10:38
6. Running Spark Part - 1 00:09:58
7. Running Spark Part - 2 00:13:55
8. Optimizing And Debugging Spark 00:18:17
9. Spark Libraries Overview 00:09:05
Spark Built-In Libraries
1. Spark SQL 00:09:01
2. Spark SQL Usage 00:12:02
3. MLlib Basics 00:15:30
4. Common MLlib Usage Part - 1 00:15:02
5. Common MLlib Usage Part - 2 00:08:23
6. Spark Streaming 00:12:43
7. GraphX 00:09:58
Hive And Pig
1. Hive Vs Pig 00:09:53
2. Hive Basics 00:11:53
3. Analysis With Hive 00:10:54
4. Pig Basics 00:14:38
5. ETL And Analytics With Pig 00:20:16
Hadoop In The Cloud
1. Hadoop/Cloud Use Cases 00:05:16
2. Elastic MapReduce (EMR) 00:12:47
Ecosystem
1. HBase Basics 00:11:16
2. Enterprise Integration 00:10:39
Wrap Up
1. Wrap Up 00:03:41
