14

Best Practices for ETL Pipelines

Up to this point in the book, we’ve covered various tools and methods for creating reliable, scalable, and maintainable ETL pipelines. We’ve also discussed the concept of “garbage in, garbage out”: the quality and integrity of both the source data and the expected output data must be prioritized throughout pipeline design and implementation, or the pipeline fails to serve its purpose. However, we haven’t spent much time on the most common pitfalls to watch for while building these pipelines.

In this chapter, we will discuss the importance of monitoring and logging every process within each pipeline you build, and how error handling and ...

