Chapter 5: Architecting Data Engineering Pipelines

Having gained an understanding of data engineering principles, core concepts, and the available AWS tools, we can now put these together in the form of a data pipeline. A data pipeline is the process that ingests data from multiple sources, optimizes and transforms it, and makes it available to data consumers. An important part of the data engineer's role is the ability to design, or architect, these pipelines.
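
Before mapping AWS services onto each stage, it can help to picture the overall shape of a pipeline in plain code. The following is a minimal, illustrative sketch of the ingest, transform, and load flow described above; the source files, field names, and output format are hypothetical placeholders and are not tied to any specific AWS API.

```python
# Minimal sketch of a data pipeline's three broad stages:
# ingest from multiple sources, transform/optimize, and load for consumers.
# File names and fields below are hypothetical examples.

import csv
import json
from pathlib import Path


def ingest(sources: list[Path]) -> list[dict]:
    """Ingest raw records from multiple source files (JSON Lines here)."""
    records = []
    for source in sources:
        with source.open() as f:
            records.extend(json.loads(line) for line in f)
    return records


def transform(records: list[dict]) -> list[dict]:
    """Apply a simple transformation: drop incomplete records, normalize keys."""
    return [
        {key.lower(): value for key, value in record.items()}
        for record in records
        if record.get("customer_id") is not None  # hypothetical required field
    ]


def load(records: list[dict], target: Path) -> None:
    """Write curated records to a consumer-friendly format (CSV here)."""
    if not records:
        return
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(records[0]))
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    raw = ingest([Path("sales.jsonl"), Path("web_events.jsonl")])  # hypothetical sources
    curated = transform(raw)
    load(curated, Path("curated_sales.csv"))
```

In a real AWS pipeline, each of these stages would typically be handled by managed services rather than hand-written scripts, which is the focus of the sections that follow.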

In this chapter, we will cover the following topics:

  • Approaching the task of architecting a data pipeline
  • Identifying data consumers and understanding their requirements
  • Identifying data sources and ingesting data
  • Identifying data transformations and optimizations
  • Loading ...
