
Expanding Your Capability with HBase and HCatalog

WHAT YOU WILL LEARN IN THIS CHAPTER:

Knowing When to Use HBase
Creating HBase Tables
Loading Data into an HBase Table
Performing a Fast Lookup with HBase
Defining Data Structures in HCatalog
Creating Indexes and Partitions on HCatalog Tables
Integrating HCatalog with Pig and Hive

This chapter looks at two tools that you can use to impose structure on big data stored in the Hadoop Distributed File System (HDFS): HBase and HCatalog. HBase organizes the data as key/value tuples and stores the values in a column-oriented structure. This design provides fast lookups and consistent updates: HBase supports very high update rates while providing almost instant access to the updated data. For example, you might use HBase to record and analyze streaming data from sensors to support near real-time predictive analytics.
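To make the key/value idea more concrete, the following is a minimal in-memory sketch of how an HBase-style table can be pictured: each value is addressed by a row key, a column name of the form family:qualifier, and a timestamp, and a read returns the newest version. This is a toy model only. It is not the HBase API, and the class name ToyTable, the row key sensor-17, and the column reading:temp are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for an HBase-style table (illustration only, not the HBase API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store one version of a cell under the given row key and column."""
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row_key][column]
        versions.append((timestamp, value))
        # Keep the newest version first so reads see the latest update.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value for a cell, or None if the cell is absent."""
        row = self.rows.get(row_key)
        if row is None or column not in row:
            return None
        return row[column][0][1]


# Hypothetical sensor data: two readings for the same cell.
table = ToyTable(["reading"])
table.put("sensor-17", "reading:temp", 21.5, timestamp=1)
table.put("sensor-17", "reading:temp", 22.1, timestamp=2)

latest = table.get("sensor-17", "reading:temp")
print(latest)  # 22.1
```

The lookup is a direct access by row key and column rather than a scan over all records, which is the property that makes this kind of layout attractive for fast reads of frequently updated data.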

The other tool, HCatalog, provides a relational table abstraction layer over HDFS. This abstraction layer lets query tools such as Pig and Hive treat the data as familiar relational tables. It also simplifies the exchange of data between HDFS storage and the client tools used to present the data for analysis, through familiar data exchange application programming interfaces (APIs) such as Java Database Connectivity (JDBC) and Open Database Connectivity ...


Microsoft Big Data Solutions by Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell
