Transformation for the OTP project

Let's look at a practical transformation example from our Airline Ontime Performance (OTP) project. The task is to transform the source stage table that we created in Chapter 5, Ingest and Organize Data Lake (shown on the left-hand side), into summary data aggregated by airline carrier, year, and month (shown on the right-hand side).

Figure: Transformation for the OTP project

To achieve the preceding transformation, we need to perform the following key steps:

  1. Remove the header line that contains the field names from each file.
  2. Update the flight month from the current "MM" to "YYYYMM" format.
  3. Create an intermediate table with ...
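
The first two steps, together with the carrier and month aggregation that is the goal of the transformation, can be sketched in a few lines of Python. This is a minimal illustration and not the book's implementation: the column order (year, month, carrier), the header value `Year`, the function names, and the choice of flight count as the aggregate are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def clean_rows(lines, header_first_field="Year"):
    """Yield (year, yyyymm, carrier) tuples from raw CSV lines.

    Assumed layout (hypothetical): column 0 = year, 1 = month, 2 = carrier.
    """
    for row in csv.reader(lines):
        # Step 1: skip blank lines and the header line with the field names.
        if not row or row[0] == header_first_field:
            continue
        year, month, carrier = row[0], row[1], row[2]
        # Step 2: rewrite the month as YYYYMM, for example 2013 and 1 -> 201301.
        yield year, f"{int(year):04d}{int(month):02d}", carrier


def flights_per_carrier_month(rows):
    """Count flights per (carrier, yyyymm); the count is an assumed metric."""
    return Counter((carrier, yyyymm) for _, yyyymm, carrier in rows)


# Small made-up sample standing in for one source file.
sample = io.StringIO("Year,Month,Carrier\n2013,1,AA\n2013,1,AA\n2013,2,DL\n")
summary = flights_per_carrier_month(clean_rows(sample))
```

Because the year is embedded in the `YYYYMM` value, grouping by carrier and that single field is equivalent to grouping by carrier, year, and month.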
