Chapter Six
Extract and Load (EL) Data
So far, we've been outlining the practical aspects of what modern data teams often call Extract, Load, Transform (ELT). The three letters refer to the stages of managing data, that is, the steps that take data from messy sources to polished information. In this stage, we focus on the first two letters. The third letter is covered in subsequent stages of the book.
For data to arrive in the data lake, it is (E)xtracted from its source, through SQL or an API, and then (L)oaded into the lake. This process is called extract and load, or “EL” for short. The data will likely also need further transformations to make it easier to query. Many tools can help with this, but first, let's discuss an old debate: Should data be transformed before or after being loaded into a data lake?
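To make the two letters concrete, here is a minimal sketch of an extract and load step in Python. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for the source system, a temporary directory of JSON Lines files stands in for the data lake, and the `orders` table, its columns, and the `extract` and `load` helpers are hypothetical names. The point it shows is that rows are copied as-is, with no cleanup along the way.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract(source: sqlite3.Connection, table: str) -> list[dict]:
    """Extract: read every row of a source table, unchanged, using SQL."""
    source.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    return [dict(row) for row in rows]


def load(rows: list[dict], lake_dir: Path, table: str) -> Path:
    """Load: write the rows to the lake as one JSON object per line."""
    target = lake_dir / f"{table}.jsonl"
    with target.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return target


# Hypothetical source system with deliberately messy data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, status TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 400, " REFUNDED ")],
)

# Stand-in for the data lake: a plain directory of raw files.
lake_dir = Path(tempfile.mkdtemp())
lake_file = load(extract(source, "orders"), lake_dir, "orders")
print(lake_file.read_text(encoding="utf-8"))
```

Note that the messy `status` value lands in the lake exactly as it was in the source. Cleaning it up is the job of the transform step.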
ETL versus ELT
At what point in the extract and load process should transformations be applied to data? Well, there are two conventional paradigms for this (Figures 6.1 and 6.2).
- ETL is the legacy method where transformations of data happen on the way to the lake. It arose in ecosystems with end-to-end data products.
- ELT is the modern approach, where the transformation step happens after the data is loaded into the lake. The transformations occur when modeling the data in the data lake to make it into a data warehouse. It reflects a more modular approach to data analytics; different products representing different ...
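The difference between the two paradigms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular tool: an in-memory SQLite database stands in for the lake, and `RAW_ROWS`, `clean`, `etl`, `elt`, and the table names are all hypothetical. In the first function the cleanup runs before loading; in the second the raw rows are loaded first and the cleanup is expressed as SQL inside the lake.

```python
import sqlite3

# Hypothetical rows as they come out of a source system.
RAW_ROWS = [(1, 1250, "paid"), (2, 400, " REFUNDED ")]


def clean(row):
    """Transform one row: cents to currency units, normalized status."""
    order_id, amount_cents, status = row
    return (order_id, amount_cents / 100, status.strip().lower())


def etl(lake: sqlite3.Connection) -> None:
    """ETL: transform on the way in, so only cleaned rows reach the lake."""
    lake.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
    lake.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [clean(r) for r in RAW_ROWS]
    )


def elt(lake: sqlite3.Connection) -> None:
    """ELT: load raw rows first, then transform inside the lake with SQL."""
    lake.execute(
        "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    lake.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)
    # The modeled layer is defined on top of the raw data, which stays intact.
    lake.execute(
        """
        CREATE VIEW orders AS
        SELECT id,
               amount_cents / 100.0 AS amount,
               lower(trim(status)) AS status
        FROM raw_orders
        """
    )


etl_lake = sqlite3.connect(":memory:")
elt_lake = sqlite3.connect(":memory:")
etl(etl_lake)
elt(elt_lake)

query = "SELECT id, amount, status FROM orders ORDER BY id"
etl_result = etl_lake.execute(query).fetchall()
elt_result = elt_lake.execute(query).fetchall()
print(etl_result == elt_result)
```

Both lakes answer the same query with the same cleaned rows. The practical difference in this sketch is that the ELT lake still holds `raw_orders`, so the transformation can be changed or rerun later without going back to the source.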