Chapter 10: Data Pipeline Management

Your data comes in many types, such as IoT device logs, user logs, web server logs, and business reports. It is generally stored across multiple data sources, such as relational databases, NoSQL databases, data warehouses, and data lakes, depending on your applications, business needs, and rules. In this situation, you will often need aggregated results for user analysis, cost reports, or building machine learning models. To obtain those results, you may need to implement data processing flows that read from multiple data sources by using a programming language, SQL, and so on. We usually call these flows data pipelines.
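To make the idea concrete, here is a minimal sketch of such a flow in plain Python. It is only an illustration under stated assumptions, not the approach this chapter goes on to describe: an in-memory SQLite database stands in for a relational source, a list of JSON strings stands in for web server logs, and the names `build_source_db`, `LOG_LINES`, and `run_pipeline` are hypothetical.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical source 2: web server logs, one JSON document per line.
LOG_LINES = [
    '{"user_id": 1, "bytes": 120}',
    '{"user_id": 2, "bytes": 300}',
    '{"user_id": 2, "bytes": 50}',
    '{"user_id": 3, "bytes": 30}',
]


def build_source_db():
    """Hypothetical source 1: a relational table of users and their plans."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, plan TEXT)")
    conn.executemany(
        "INSERT INTO users (user_id, plan) VALUES (?, ?)",
        [(1, "free"), (2, "pro"), (3, "pro")],
    )
    conn.commit()
    return conn


def run_pipeline(conn, log_lines):
    """Read from both sources and return total bytes served per plan."""
    # Extract: pull rows with SQL and parse the raw log lines.
    plans = dict(conn.execute("SELECT user_id, plan FROM users"))
    events = [json.loads(line) for line in log_lines]

    # Transform: join each log event to its user's plan and aggregate.
    totals = defaultdict(int)
    for event in events:
        plan = plans.get(event["user_id"], "unknown")
        totals[plan] += event["bytes"]

    # Load: a real pipeline would write this to a warehouse or data lake;
    # this sketch simply returns the aggregated result.
    return dict(totals)


if __name__ == "__main__":
    print(run_pipeline(build_source_db(), LOG_LINES))
```

Even at this size, the flow has distinct read, combine, and aggregate stages that depend on each other, which is what makes managing pipelines a topic in its own right once the sources and steps multiply.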

Recent pipeline flows consist of extracting ...


Serverless ETL and Analytics with AWS Glue by Vishal Pathak, Subramanya Vajiraya, Noritaka Sekiyama, Tomohiro Tanaka, Albert Quiroga, Ishan Gaur
