Chapter 9. Serving Data for Analytics, Machine Learning, and Reverse ETL

Congratulations! You’ve reached the final stage of the data engineering lifecycle—serving data for downstream use cases (see Figure 9-1). In this chapter, you’ll learn about various ways to serve data for three major use cases you’ll encounter as a data engineer. First, you’ll serve data for analytics and BI. You’ll prepare data for use in statistical analysis, reporting, and dashboards. This is the most traditional area of data serving. Arguably, it predates IT and databases, but it is as important as ever for stakeholders to have visibility into the business, organizational, and financial processes.

Figure 9-1. Serving delivers data for use cases

Second, you’ll serve data for ML applications. ML is not possible without high-quality data, appropriately prepared. Data engineers work with data scientists and ML engineers to acquire, transform, and deliver the data necessary for model training.

Third, you’ll serve data through reverse ETL. Reverse ETL is the process of sending processed data back to the source systems it came from. For example, we might acquire data from an ad tech platform, run a statistical process on this data to determine cost-per-click bids, and then feed the resulting bids back into the ad tech platform. Reverse ETL is highly entangled with BI and ML.
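To make the ad tech example concrete, here is a minimal sketch of a reverse ETL loop in Python. Everything in it is an assumption for illustration: `InMemoryAdPlatform` is a hypothetical stand-in for a real ad platform API, `CampaignStats` is an invented record type, and the bid rule (target cost per conversion multiplied by the observed conversion rate) is one simple heuristic, not the statistical process a production system would necessarily use.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Performance data pulled from the (hypothetical) ad platform."""
    campaign_id: str
    clicks: int
    conversions: int


class InMemoryAdPlatform:
    """Hypothetical stand-in for an ad platform; a real one would be a network API."""

    def __init__(self, stats):
        self._stats = list(stats)
        self.bids = {}  # campaign_id -> current cost-per-click bid

    def fetch_stats(self):
        # Extract: read performance data from the source system.
        return list(self._stats)

    def update_bid(self, campaign_id, bid):
        # Load: write the computed bid back into the same source system.
        self.bids[campaign_id] = bid


def compute_bids(stats, target_cost_per_conversion):
    """Assumed heuristic: bid = target cost per conversion * conversion rate."""
    bids = {}
    for row in stats:
        if row.clicks == 0:
            continue  # No clicks means no conversion rate to estimate; skip.
        conversion_rate = row.conversions / row.clicks
        bids[row.campaign_id] = round(target_cost_per_conversion * conversion_rate, 2)
    return bids


def run_reverse_etl(platform, target_cost_per_conversion):
    """Pull stats, compute bids, and push the bids back to the platform."""
    stats = platform.fetch_stats()
    bids = compute_bids(stats, target_cost_per_conversion)
    for campaign_id, bid in bids.items():
        platform.update_bid(campaign_id, bid)
    return bids
```

The point of the sketch is the shape of the flow rather than the bid formula: data leaves the source system, is transformed elsewhere, and the result is written back to that same system, where it changes operational behavior.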

Before we get into these three major ways of serving data, let’s look at ...

