Chapter 7. Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop

In this chapter, we will cover the following topics:

  • Getting started with Apache Pig
  • Joining two datasets using Pig
  • Accessing Hive table data in Pig using HCatalog
  • Getting started with Apache HBase
  • Random data access using Java client APIs
  • Running MapReduce jobs on HBase
  • Using Hive to insert data into HBase tables
  • Getting started with Apache Mahout
  • Running K-means with Mahout
  • Importing data to HDFS from a relational database using Apache Sqoop
  • Exporting data from HDFS to a relational database using Apache Sqoop

Introduction

The Hadoop ecosystem comprises a family of projects that are either built on top of Hadoop or work very closely with it. These projects have given rise to an ecosystem that focuses ...
