5

Data Cleansing and Transformation

The success of a data pipeline is measured by how reliably it transforms input data into the attributes the output requires. The finesse of the transformation stage is what separates a toy pipeline from a powerful, impactful enterprise pipeline. Accurate, efficient transformations come from building each task methodically.

In this chapter, we will explore various data transformation techniques in Python and how they can be used to massage data into the desired format. You will walk away from this chapter with a firm grounding in the following areas of data manipulation:

  • Data cleansing and transformation
  • The importance ...
