CHAPTER 6
Processing Data
In this chapter we are going to dig deep into how we ingest, process, and enrich data, preparing it for analysis.
Specifically, we will look at serverless and traditional data engineering technologies and practices for performing extract, transform, and load (ETL): turning unusable raw data into clean data stored in formats optimized for whatever analytics purpose you have in mind.
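To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not how AWS Glue implements ETL; the sample data, the column names, and the `extract`, `transform`, and `load` helpers are all hypothetical, and JSON Lines stands in for an optimized columnar format such as Parquet so that the example needs only the standard library.

```python
import csv
import io
import json

# Hypothetical raw input: stray whitespace, inconsistent case, a missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize names, cast types, and drop rows with no amount."""
    clean = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # unusable record: nothing to analyze
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": float(amount),
        })
    return clean


def load(rows):
    """Load: serialize to JSON Lines (a stand-in for a format such as Parquet)."""
    return "\n".join(json.dumps(r) for r in rows)


clean_rows = transform(extract(RAW_CSV))
output = load(clean_rows)
print(output)
```

In a real pipeline each stage would read from and write to external storage, and the load step would typically target a columnar format; the shape of the flow, however, stays the same.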
By and large, the main tool for data processing tasks of all sorts is AWS Glue, because Glue is a family of tools rather than a single one. AWS Glue enables you to connect to source master systems and use them for extraction or querying. It also enables you to build a data catalog that describes what data you have, ...