HCatalog

HCatalog provides the table and storage management layer for Hadoop and brings the various tools in the Hadoop ecosystem together. Through the HCatalog interface, tools such as Hive, Pig, and MapReduce can read and write data on Hadoop, and all of them can use the shared schema and datatypes that HCatalog provides. Because they share one mechanism for reading and writing, the output of one tool is easy to consume in another.

So where does HCatalog fit into a discussion of Datasets? So far, we have seen HDFS folder-based Datasets, in which a success flag tells us that the data is available. With HCatalog-based Datasets, we can trigger Oozie jobs at the time when data in a given Hive partition becomes available for consumption. ...
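To make the idea of addressing a Hive partition concrete, here is a minimal Python sketch that assembles a partition URI. It assumes the `hcat://server:port/database/table/key=value;key=value` layout that Oozie documents for HCatalog datasets. The helper name `hcat_partition_uri`, the metastore host and port, and the database, table, and partition names are all hypothetical, chosen only for illustration.

```python
def hcat_partition_uri(server, port, database, table, partition):
    """Build an hcat:// URI naming one partition of a Hive table.

    Assumed layout: hcat://server:port/database/table/key=value;key=value
    All argument values used below are hypothetical examples.
    """
    # Partition key=value pairs are joined with ';' in the URI path.
    spec = ";".join(f"{key}={value}" for key, value in partition.items())
    return f"hcat://{server}:{port}/{database}/{table}/{spec}"


uri = hcat_partition_uri(
    "metastore.example.com",  # hypothetical metastore host
    9083,                     # hypothetical metastore port
    "sales_db",               # hypothetical database
    "orders",                 # hypothetical table
    {"dt": "2015-01-01", "region": "US"},
)
print(uri)
```

A URI of this shape would play the role that an HDFS folder path plays in a folder-based Dataset: it names the unit of data whose availability the job waits on.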
