Chapter 9. Serving Data for Analytics, Machine Learning, and Reverse ETL
Congratulations! You’ve reached the final stage of the data engineering lifecycle: serving data for downstream use cases (see Figure 9-1). In this chapter, you’ll learn about various ways to serve data for three major use cases you’ll encounter as a data engineer. First, you’ll serve data for analytics and BI. You’ll prepare data for use in statistical analysis, reporting, and dashboards. This is the most traditional area of data serving. Arguably, it predates IT and databases, but it is as important as ever for stakeholders to have visibility into business, organizational, and financial processes.
Figure 9-1. Serving delivers data for use cases
Second, you’ll serve data for ML applications. ML is not possible without high-quality data, appropriately prepared. Data engineers work with data scientists and ML engineers to acquire, transform, and deliver the data necessary for model training.
Third, you’ll serve data through reverse ETL. Reverse ETL is the process of sending data back to data sources. For example, we might acquire data from an ad tech platform, run a statistical process on this data to determine cost-per-click bids, and then feed this data back into the ad tech platform. Reverse ETL is highly entangled with BI and ML.
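To make the ad tech example concrete, here is a minimal sketch of the round trip: export data from the source system, compute bids, and write them back. Everything in it is an assumption made for illustration. `AdPlatform` is a hypothetical in-memory stand-in rather than any real vendor API, and the bidding rule in `compute_bids` (bid a fixed share of observed revenue per click, capped at a maximum) is a toy placeholder for whatever statistical process you would actually run.

```python
from dataclasses import dataclass, field


@dataclass
class AdPlatform:
    """Hypothetical in-memory stand-in for an ad tech platform (not a real API)."""
    stats: dict  # keyword -> {"clicks": int, "revenue": float}
    bids: dict = field(default_factory=dict)

    def export_stats(self):
        # "Extract": pull performance data out of the source system.
        return [{"keyword": k, **v} for k, v in self.stats.items()]

    def update_bids(self, new_bids):
        # "Load back": write derived values into the same source system.
        self.bids.update(new_bids)


def compute_bids(rows, target_margin=0.3, max_bid=5.0):
    """Toy rule (an assumption): bid a share of revenue per click, capped at max_bid."""
    bids = {}
    for row in rows:
        if row["clicks"] == 0:
            continue  # No click data, so leave any existing bid untouched.
        revenue_per_click = row["revenue"] / row["clicks"]
        bid = min(revenue_per_click * (1 - target_margin), max_bid)
        bids[row["keyword"]] = round(bid, 2)
    return bids


def reverse_etl(platform):
    """Export stats, derive bids, and feed them back into the platform."""
    rows = platform.export_stats()
    bids = compute_bids(rows)
    platform.update_bids(bids)
    return bids
```

In a real pipeline, the export and write-back steps would go through the platform's own API or a reverse ETL tool, and the computation would typically run in a data warehouse or an ML system, which is one reason reverse ETL is so entangled with BI and ML.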
Before we get into these three major ways of serving data, let’s look at ...