Transformation for the OTP project
Let's take a look at a practical example of a transformation using our Airline Ontime Performance (OTP) project. Suppose our transformation task is to go from the source stage table that we created in Chapter 5, Ingest and Organize Data Lake (shown on the left-hand side), to summary data aggregated by airline carrier, year, and month (shown on the right-hand side).
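The target summary can be sketched in plain Python to show what "aggregated by carrier, year, and month" means. This is a minimal illustration, not the HDInsight implementation used in the project. The row layout `(carrier, year, month, arrival_delay)` and the two metrics (flight count and average arrival delay) are assumptions chosen for the example.

```python
from collections import defaultdict


def summarize(rows):
    """Group rows by (carrier, year, month); return flight count and average delay.

    Each row is a hypothetical tuple (carrier, year, month, arrival_delay_minutes).
    The chosen metrics are illustrative assumptions, not the book's definition.
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [flight_count, delay_sum]
    for carrier, year, month, delay in rows:
        bucket = totals[(carrier, year, month)]
        bucket[0] += 1
        bucket[1] += delay
    # Sort keys so the summary comes out ordered by carrier, then year, then month.
    return {
        key: {"flights": count, "avg_delay": delay_sum / count}
        for key, (count, delay_sum) in sorted(totals.items())
    }


if __name__ == "__main__":
    sample = [
        ("AA", 2013, 1, 10.0),
        ("AA", 2013, 1, -4.0),
        ("DL", 2013, 1, 0.0),
    ]
    for key, stats in summarize(sample).items():
        print(key, stats)
```

On a cluster the same grouping would normally be expressed as a `GROUP BY` over the stage table; the Python version only makes the shape of the output concrete.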
To achieve the preceding transformation, we need to perform the following key steps:
- Remove the header line, which contains the field names, from each file.
- Update the flight month from the current "MM" to "YYYYMM" format.
- Create an intermediate table with ...
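The first two steps can be sketched in Python as a rough illustration. It assumes each file is CSV text whose first line is the header and that the year and month fields are named `Year` and `Month`; these column names are hypothetical and may differ from the real OTP files. `csv.DictReader` consumes each file's header line as field names, so header rows never appear in the output.

```python
import csv
import io


def clean_file(text):
    """Parse one CSV file, dropping its header line and rewriting Month as YYYYMM.

    Assumes hypothetical column names "Year" and "Month".
    """
    reader = csv.DictReader(io.StringIO(text))  # header line becomes the field names
    cleaned = []
    for row in reader:
        year = int(row["Year"])
        month = int(row["Month"])  # "MM" value such as "1" or "01"
        row["Month"] = f"{year:04d}{month:02d}"  # e.g. 2013 and 1 -> "201301"
        cleaned.append(row)
    return cleaned


def clean_files(texts):
    """Clean several files; each file carries its own header line."""
    rows = []
    for text in texts:
        rows.extend(clean_file(text))
    return rows


if __name__ == "__main__":
    file_a = "Year,Month,Carrier\n2013,1,AA\n2013,12,DL\n"
    file_b = "Year,Month,Carrier\n2014,02,UA\n"
    for row in clean_files([file_a, file_b]):
        print(row["Carrier"], row["Month"])
```

Zero-padding the month with `:02d` keeps the `YYYYMM` values the same width, so they also sort correctly as strings.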