Chapter 4. Data Transformation Strategies
A recent report published by Forbes describes how some stockbrokers and trading firms were able to access and analyze data faster than their competitors. This allowed them to “execute trades at the best price, microseconds ahead of the crowd. The win was ever so slight in terms of time, but massive in terms of the competitive advantage gained by speed to insight.”
When considering an analytics solution, speed to insight is important: the quicker an organization can respond to a shift in its data, the more competitive it will be. In many cases, the data must be transformed before it can yield the insights you need. As briefly discussed in Chapter 3, “Setting Up Your Data Models and Ingesting Data,” you can use an ETL (extract, transform, load) approach, which reads the source data, processes the transformations in an external application, and loads the results. Alternatively, you can use an ELT (extract, load, transform) approach, which loads the data first and then transforms it in place using the power of Amazon Redshift compute.
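To make the difference in ordering concrete, here is a minimal sketch of both approaches applied to the same rows. It is an illustration under stated assumptions, not a Redshift implementation: Python's built-in sqlite3 module stands in for the data warehouse so the example runs anywhere, and the table names, column names, and sample rows are hypothetical. In the ETL path the application transforms the rows before loading them. In the ELT path the raw rows are loaded first and a SQL statement performs the transformation inside the database.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_in_cents, country_code)
SOURCE_ROWS = [(1, 1250, "us"), (2, 980, "de"), (3, 4300, "us")]


def run_etl(conn, rows):
    """ETL: transform in the external application, then load the results."""
    transformed = [
        (order_id, cents / 100.0, country.upper())
        for order_id, cents, country in rows
    ]
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", transformed)


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform them with SQL in the database."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    # The transformation runs on the database engine, not in the application.
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, amount_cents / 100.0 AS amount, UPPER(country) AS country "
        "FROM orders_raw"
    )


def fetch_all(conn, table):
    """Return the transformed rows from one of the two fixed result tables."""
    query = f"SELECT order_id, amount, country FROM {table} ORDER BY order_id"
    return conn.execute(query).fetchall()


# sqlite3 in-memory database used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)
etl_rows = fetch_all(conn, "orders_etl")
elt_rows = fetch_all(conn, "orders_elt")
```

Both paths end with the same transformed rows. What differs is where the work happens: in ETL the application does the computation, while in ELT the database engine does it, which is the part that Amazon Redshift compute would take on.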
In this chapter, we’ll start by “Comparing ELT and ETL Strategies” to help you decide which data loading strategy to use when building your data warehouse. We’ll also dive deep into some of the unique features of Amazon Redshift that were built for analytics use cases and that empower “In-Database Transformation,” as well as how you can leverage built-in “Scheduling and Orchestration” capabilities to run your pipelines. Then we’ll cover how Amazon Redshift takes the ELT ...