Hadoop 2.x Administration Cookbook by Gurmukh Singh

Chapter 7. Data Ingestion and Workflow

In this chapter, we will cover the following topics:

  • Hive server modes and setup
  • Using MySQL for Hive metastore
  • Operating Hive with ZooKeeper
  • Loading data into Hive
  • Partitioning and Bucketing in Hive
  • Hive metastore database
  • Designing Hive with credential store
  • Configuring Flume
  • Configuring Oozie and workflows

Introduction

Let us begin with what Apache Hive is. Apache Hive is a data warehousing infrastructure built on top of Hadoop that lets you query data using a SQL-like language, HiveQL. Hive was designed to help existing SQL users move quickly to Hadoop when working with structured data, without having to deal with the complexities of the Hadoop framework.

In this chapter, we will configure the various methods of data ingestion. Most ...
