Chapter 7. Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop

In this chapter, we will cover the following topics:

Getting started with Apache Pig
Joining two datasets using Pig
Accessing Hive table data in Pig using HCatalog
Getting started with Apache HBase
Random data access using Java client APIs
Running MapReduce jobs on HBase
Using Hive to insert data into HBase tables
Getting started with Apache Mahout
Running K-means with Mahout
Importing data to HDFS from a relational database using Apache Sqoop
Exporting data from HDFS to a relational database using Apache Sqoop

Introduction

The Hadoop ecosystem consists of a family of projects that are either built on top of Hadoop or work very closely with it. These projects have given rise to an ecosystem that focuses ...

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne