HDInsight Essentials - Second Edition by Rajesh Nadipalli

Tools for transforming data in the Data Lake

In this section, we will review the tools that enable the transformation of data in the Data Lake. We will cover HCatalog, Hive, and Pig in detail, as these are the most popular methods of transforming data in the Data Lake. Next, we will look at how Azure PowerShell makes it easy to assemble these scripts into a single procedure.

HCatalog

Apache HCatalog manages metadata that describes the structure of files stored in Hadoop. In Chapter 5, Ingest and Organize Data Lake, we registered stage tables with HCatalog; in this chapter, we will use that metadata to transform the data.
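To make the idea concrete, here is a toy sketch, not HCatalog's API, of what a metastore records: a table name mapped to its columns, storage location, and field delimiter. A transformation script can then interpret raw delimited lines by column name instead of hard-coding positions. The `ToyCatalog` and `TableDef` classes, the `stage_orders` table, its columns, and the `/data/stage/orders` path are all hypothetical. In practice, Hive and Pig read this metadata for you (for example, through Pig's `HCatLoader`).

```python
from dataclasses import dataclass


@dataclass
class TableDef:
    """Hypothetical stand-in for the metadata a metastore keeps per table."""
    name: str
    columns: list[str]
    location: str          # directory that holds the table's raw files
    delimiter: str = "\t"  # field separator used in those files


class ToyCatalog:
    """In-memory illustration of a metastore; NOT the HCatalog API."""

    def __init__(self) -> None:
        self._tables: dict[str, TableDef] = {}

    def register(self, table: TableDef) -> None:
        # Record (or overwrite) the structure of a table.
        self._tables[table.name] = table

    def describe(self, name: str) -> TableDef:
        # Look up a table's structure by name.
        return self._tables[name]


def read_rows(catalog: ToyCatalog, table_name: str,
              raw_lines: list[str]) -> list[dict[str, str]]:
    """Apply the registered schema to raw delimited lines (schema-on-read)."""
    table = catalog.describe(table_name)
    rows = []
    for line in raw_lines:
        values = line.rstrip("\n").split(table.delimiter)
        rows.append(dict(zip(table.columns, values)))
    return rows


# Usage: register a hypothetical stage table, then read lines by column name.
catalog = ToyCatalog()
catalog.register(TableDef(
    name="stage_orders",
    columns=["order_id", "customer_id", "amount"],
    location="/data/stage/orders",
    delimiter=",",
))
rows = read_rows(catalog, "stage_orders", ["1001,C7,19.99", "1002,C9,5.00"])
print(rows[0]["amount"])  # 19.99
```

Because the schema lives in the catalog rather than in each script, changing a column list or delimiter in one place is enough for every consumer of the table to pick it up.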

Persisting HCatalog metastore in a SQL database

With Azure HDInsight, the metastore can be hosted in an embedded mode in Apache Derby, which comes