Transformation for the OTP project

Let's look at a practical example of a transformation using our Airline Ontime Performance (OTP) project. The task is to get from the source stage table that we created in Chapter 5, Ingest and Organize Data Lake (shown on the left-hand side), to summary data aggregated by airline carrier, year, and month (shown on the right-hand side).

Figure: Transformation for the OTP project

To achieve the preceding transformation, we need to perform the following key steps:

  1. Clean the header line in each file that has the field names.
  2. Update the flight month from the current "MM" to "YYYYMM" format.
  3. Create an intermediate table with ...
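The first two steps, together with the carrier/year/month aggregation described earlier, can be sketched in plain Python. This is only an illustration under stated assumptions, not the chapter's own implementation: the column names (Year, Month, Carrier, Flights), the sample rows, and the helper functions clean_rows, to_yyyymm, and summarize are all hypothetical, and the truncated third step is deliberately left out.

```python
import csv
import io
from collections import Counter

# Hypothetical header and sample rows; the real OTP stage files have
# different (and many more) columns.
HEADER = ["Year", "Month", "Carrier", "Flights"]

SAMPLE = """Year,Month,Carrier,Flights
2013,1,AA,1
2013,1,AA,1
2013,2,DL,1
Year,Month,Carrier,Flights
2013,12,AA,1
"""


def clean_rows(lines):
    """Step 1: yield data rows, skipping every row that repeats the header."""
    for row in csv.reader(lines):
        if row and row != HEADER:
            yield row


def to_yyyymm(year, month):
    """Step 2: combine year and month into a zero-padded YYYYMM string."""
    return f"{int(year):04d}{int(month):02d}"


def summarize(lines):
    """Sum the Flights column per (carrier, YYYYMM) pair."""
    counts = Counter()
    for year, month, carrier, flights in clean_rows(lines):
        counts[(carrier, to_yyyymm(year, month))] += int(flights)
    return dict(counts)


# The second header line in SAMPLE simulates two concatenated source files.
summary = summarize(io.StringIO(SAMPLE))
for (carrier, yyyymm), total in sorted(summary.items()):
    print(carrier, yyyymm, total)
```

Keying the summary on the combined YYYYMM value keeps the year and month together, so the result sorts chronologically within each carrier.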
