Expanding Your Capability with HBase and HCatalog

WHAT YOU WILL LEARN IN THIS CHAPTER:

  • Knowing When to Use HBase
  • Creating HBase Tables
  • Loading Data into an HBase Table
  • Performing a Fast Lookup with HBase
  • Defining Data Structures in HCatalog
  • Creating Indexes and Partitions on HCatalog Tables
  • Integrating HCatalog with Pig and Hive

This chapter looks at two tools that you can use to create structure on top of big data stored in the Hadoop Distributed File System (HDFS): HBase and HCatalog. HBase organizes the data as key/value tuples and stores them in a column-oriented structure, grouped into column families. This design gives fast lookups by row key and keeps reads and writes consistent at the row level. It sustains very high update rates while making the updated data available almost immediately. For example, you might use HBase to record streaming data from sensors and analyze it for near real-time predictive analytics.
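To make the key/value model described above more concrete, here is a minimal in-memory sketch in Python. It is not the HBase API and does not talk to a cluster. The class `ToyTable`, the `reading` column family, and the `sensorNN#NNNN` row keys are all hypothetical names chosen for illustration. The sketch only shows the general idea: each cell is addressed by a row key plus a `family:qualifier` column, cells keep timestamped versions, and sorted row keys make prefix range scans cheap.

```python
from collections import defaultdict
import time


class ToyTable:
    """Toy in-memory model of a key/value table with column families.

    Illustration only: real HBase persists data in HDFS and distributes
    rows across region servers; none of that is modeled here.
    """

    def __init__(self, families):
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp=None):
        """Store a new version of a cell; older versions are kept."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = timestamp if timestamp is not None else time.time_ns()
        cells = self.rows[row_key][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it does not exist."""
        cells = self.rows.get(row_key, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start, stop):
        """Yield (row_key, newest cell values) for start <= row_key < stop."""
        for key in sorted(self.rows):
            if start <= key < stop:
                latest = {col: cells[0][1] for col, cells in self.rows[key].items()}
                yield key, latest


# Hypothetical sensor readings; the row key is "<sensor id>#<sequence number>".
table = ToyTable(["reading"])
table.put("sensor42#0001", "reading:temp", "20.5", timestamp=1)
table.put("sensor42#0001", "reading:temp", "21.0", timestamp=2)  # newer version
table.put("sensor42#0002", "reading:temp", "21.4", timestamp=3)
table.put("sensor07#0001", "reading:temp", "18.9", timestamp=4)

# Point lookup by row key and column returns the newest version.
print(table.get("sensor42#0001", "reading:temp"))

# Range scan over every row whose key starts with "sensor42#".
for key, cells in table.scan("sensor42#", "sensor42$"):
    print(key, cells)
```

The row key design carries most of the weight in this sketch: because all readings for one sensor share a key prefix, a single range scan retrieves them together, which is the kind of access pattern the sensor example in the text relies on.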

The other tool, HCatalog, provides a relational table abstraction layer over HDFS. This abstraction layer lets query tools such as Pig and Hive treat the data as familiar relational tables. It also makes it easier to exchange data between HDFS storage and the client tools used to present the data for analysis, through familiar data exchange application programming interfaces (APIs) such as Java Database Connectivity (JDBC) and Open Database Connectivity ...
