Introduction
Whether you're a data engineer or work in another data-oriented role (we see you, analysts and scientists), you’ve likely heard the term ETL. There’s a good chance ETL is a part of your life, even if you don’t know it!
Short for extract, transform, load, ETL is used to describe the foundational workflow most data practitioners are tasked with—taking data from a source system, changing it to suit their needs, and loading it to a target.
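To make the three steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is a hypothetical stand-in chosen for illustration: the inline CSV plays the role of a source system, an in-memory SQLite database plays the role of the target, and the `orders` table, its columns, and the function names are assumptions rather than anything prescribed by this book.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from a source system.
SOURCE_CSV = """\
order_id,customer,amount_usd
1,alice,19.99
2,bob,
3,carol,42.50
"""


def extract(raw_csv):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: drop rows with no amount, tidy names, convert dollars to cents."""
    cleaned = []
    for row in rows:
        if not row["amount_usd"]:
            continue  # skip incomplete records
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].title(),
                round(float(row["amount_usd"]) * 100),
            )
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_csv, conn):
    """Run extract, transform, and load in order, then return the loaded rows."""
    load(transform(extract(raw_csv)), conn)
    query = "SELECT order_id, customer, amount_cents FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()
```

Calling `run_pipeline(SOURCE_CSV, sqlite3.connect(":memory:"))` runs the whole flow end to end. Real pipelines add scheduling, error handling, and monitoring on top, but the three-stage shape is typically the same.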
Want to help product leaders make data-driven decisions? ETL builds the critical tables for your reports. Want to train the next iteration of your team’s machine learning model? ETL creates quality datasets. Are you trying to bring more structure and rigor to your company’s storage policies to meet compliance requirements? ETL will bring process, lineage, and observability to your workflows.
If you want to do anything with data, you need a reliable process or pipeline. This holds true from classic business intelligence (BI) workloads to cutting-edge advancements, like large language models (LLMs) and AI.
The Brave New World of AI
The data world has seen many trends come and go; some have transformed the space, and some have turned out to be short-lived fads. The most recent is, without a doubt, generative AI (GenAI).
At every turn, there’s chatter about AI, LLMs, and agents. This recent fascination with AI, largely brought on by the release of OpenAI’s ChatGPT, extends beyond media interest and research circles; it ...