
Adding Structure with Hive

WHAT YOU WILL LEARN IN THIS CHAPTER:

  • Learning How Hive Provides Value in a Hadoop Environment
  • Comparing Hive to a Relational Database
  • Working with Data in Hive
  • Understanding Advanced Options in Hive

This chapter discusses how you can use Hive with Hadoop to get more value out of your big data initiatives. Hive is a component of all major Hadoop distributions, and it is used extensively to provide SQL-like functionality on a Hadoop installation. For example, Hive is often used to enable common data warehouse scenarios on top of data stored in Hadoop, such as retrieving a summary of sales by store and by department. Using MapReduce directly to prepare and produce these results would take many lines of Java code. By using Hive, you can write a familiar SQL query to get the same results:

SELECT Store, Department, SUM(SalesAmount)
FROM StoreSales
GROUP BY Store, Department
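To make the contrast concrete, here is a minimal sketch in Python rather than Java. It computes the same summary twice: once with hand-written map and reduce steps, and once with the query above. Everything in it is an assumption made for illustration: the sample rows are hypothetical, the function names are invented, and an in-memory SQLite database stands in for Hive so that the query can run without a Hadoop cluster.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (Store, Department, SalesAmount).
ROWS = [
    ("Store1", "Grocery", 100.0),
    ("Store1", "Grocery", 50.0),
    ("Store1", "Toys", 25.0),
    ("Store2", "Grocery", 75.0),
]


def map_phase(rows):
    """Emit one ((store, department), amount) pair per input row."""
    for store, department, amount in rows:
        yield (store, department), amount


def reduce_phase(pairs):
    """Group the pairs by key and sum the amounts for each key."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


def sales_by_sql(rows):
    """Run the chapter's query against SQLite (a stand-in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StoreSales "
        "(Store TEXT, Department TEXT, SalesAmount REAL)"
    )
    conn.executemany("INSERT INTO StoreSales VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT Store, Department, SUM(SalesAmount) "
        "FROM StoreSales "
        "GROUP BY Store, Department"
    )
    result = {(store, dept): total for store, dept, total in cursor}
    conn.close()
    return result


manual_totals = reduce_phase(map_phase(ROWS))
sql_totals = sales_by_sql(ROWS)
```

Both approaches produce the same totals per store and department. The manual version has to spell out the grouping key and the summation itself, while the query states only the desired result.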

If you are familiar with SQL Server or other relational databases, portions of Hive will seem very familiar. Other aspects of Hive, however, may feel very different or restrictive compared to a relational database. It's important to remember that Hive attempts to bridge some of the gap between the Hadoop Distributed File System (HDFS) data store and the relational world, while providing some of the benefits of both technologies. By keeping that perspective, you'll find ...
