Programming Hive

by Edward Capriolo, Dean Wampler, Jason Rutherglen
September 2012
Content level: Intermediate to advanced
350 pages
9h 46m
English
O'Reilly Media, Inc.

Chapter 1. Introduction

From the early days of the Internet’s mainstream breakout, the major search engines and ecommerce companies wrestled with ever-growing quantities of data. More recently, social networking sites experienced the same problem. Today, many organizations realize that the data they gather is a valuable resource for understanding their customers, the performance of their business in the marketplace, and the effectiveness of their infrastructure.

The Hadoop ecosystem emerged as a cost-effective way of working with such large data sets. It imposes a particular programming model, called MapReduce, for breaking up computation tasks into units that can be distributed around a cluster of commodity, server-class hardware, thereby providing cost-effective, horizontal scalability. Underneath this computation model is a distributed file system called the Hadoop Distributed Filesystem (HDFS). The filesystem is “pluggable,” however, and there are now several commercial and open source alternatives.
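To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made purely for illustration. On a real cluster, the framework would run many map and reduce tasks in parallel on different machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence within a single process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input: each string plays the role of one input record.
counts = word_count(["hive runs on hadoop", "hadoop stores data in hdfs"])
print(counts["hadoop"])
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on the values for its own key. That independence is what allows the units of work to be distributed around a cluster.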

However, a challenge remains: how do you move an existing data infrastructure to Hadoop, when that infrastructure is based on traditional relational databases and the Structured Query Language (SQL)? What about the large base of SQL users, from expert database designers and administrators to casual users who use SQL to extract information from their data warehouses?

This is where Hive comes in. Hive provides an SQL dialect, called Hive Query Language (abbreviated HiveQL or just ...

Publisher Resources

ISBN: 9781449326944