Chapter 7. Data Ingestion and Workflow

In this chapter, we will cover the following topics:

Hive server modes and setup
Using MySQL for Hive metastore
Operating Hive with ZooKeeper
Loading data into Hive
Partitioning and Bucketing in Hive
Hive metastore database
Designing Hive with credential store
Configuring Flume
Configure Oozie and workflows

Introduction

Let us first understand what Apache Hive is. Apache Hive is a data warehousing infrastructure built on top of Hadoop that lets users query data with SQL. Hive was designed to help existing SQL users transition quickly to Hadoop when working with structured data, without having to worry about the complexities of the Hadoop framework.

In this chapter, we will configure the various methods of data ingestion. Most ...

Hadoop 2.x Administration Cookbook by Gurmukh Singh
