CHAPTER 6
Processing Data
In this chapter we are going to dig deep into how we ingest, process, and enrich data, preparing it for analysis.
Specifically, we will look at serverless and traditional data engineering technologies and practices for performing extract, transform, and load (ETL): turning unusable raw data into clean data, stored in formats optimized for whatever analytics purpose you have in mind.
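To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not AWS Glue code. The sample data and the function names `extract`, `transform`, and `load` are assumptions made up for illustration, and JSON Lines stands in for an optimized analytics format only so that the example runs with the standard library.

```python
import csv
import io
import json

# Hypothetical raw input: stray whitespace, a missing amount, a malformed amount.
RAW_CSV = """order_id,customer,amount
1001, Alice ,19.99
1002,bob,
1003,Carol,not_a_number
1004,dave,5.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast types, and drop unusable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows whose amount is missing or malformed
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": amount,
        })
    return clean


def load(rows):
    """Load: serialize clean rows as JSON Lines (a stand-in target format)."""
    return "\n".join(json.dumps(row) for row in rows)


clean_rows = transform(extract(RAW_CSV))
output = load(clean_rows)
print(output)
```

Keeping each stage in its own function means the cleaning rules can change without touching how data is read or written, which is the same separation a managed ETL service gives you at larger scale.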
By and large, the main tool for performing all sorts of data processing tasks is AWS Glue, because Glue is a family of tools rather than a single one. AWS Glue enables you to connect to source master systems and use them for extraction or querying. It also enables you to build a data catalog that describes what data you have, ...