O'Reilly logo

HDInsight Essentials - Second Edition by Rajesh Nadipalli

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6. Transform Data in the Data Lake

In the previous chapter, we ingested the source data into the Data Lake. To make sense of the vast amount of raw data, a transformation procedure is required to convert it into information that can further be used by decision makers. In this chapter, we will discuss how to transform data.

The topics covered in this chapter are as follows:

  • Transformation overview
  • Tools for transforming data in a Data Lake, such as HCatalog, Hive, Pig, and MapReduce
  • Transformation of the airline on-time performance (OTP) raw data into an aggregate
  • Review results of transformation

Transformation overview

Once you get data into the cluster, the next step in a typical project is to get data ready for future consumption. This typically ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required