7

Cleaning and Processing Data

Some automated tasks require dealing with large amounts of data. As the data grows, two new and distinct problems appear: processing the task takes too long, and issues with the quality of the input data cause more problems.

Both problems are well known in data science, where large quantities of data are routine, but they can appear even at a smaller scale.

The quality of input data is closely related to the number of sources it comes from. In general, data from a single source is more consistent, but relying on a single source is limiting. And even data from the same source can still contain inconsistencies or errors.

Some examples of differences could be regional, such as date formats or currencies, ...
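A minimal sketch of handling regional differences follows. It assumes that the conventions of each source are known in advance and normalizes dates into `datetime.date` objects and amounts into `Decimal` values. The source names and the `SOURCE_FORMATS` table are hypothetical, for illustration only.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical per-source conventions. Each source declares its own date format
# and separators, because a string like "03/04/2021" is ambiguous on its own.
SOURCE_FORMATS = {
    "us_store": {"date": "%m/%d/%Y", "thousands": ",", "decimal": "."},
    "eu_store": {"date": "%d/%m/%Y", "thousands": ".", "decimal": ","},
}


def parse_date(raw: str, source: str) -> date:
    """Parse a date string using the format declared for its source."""
    fmt = SOURCE_FORMATS[source]["date"]
    return datetime.strptime(raw.strip(), fmt).date()


def parse_amount(raw: str, source: str) -> Decimal:
    """Parse a money amount, dropping currency symbols and regional separators.

    Only the number is normalized; no conversion between currencies is done.
    """
    conv = SOURCE_FORMATS[source]
    # Keep only digits, separators and a minus sign (drops "$", "€", spaces)
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in "-.,")
    # Remove thousands separators first, then turn the decimal mark into "."
    cleaned = cleaned.replace(conv["thousands"], "").replace(conv["decimal"], ".")
    return Decimal(cleaned)


records = [
    ("us_store", "03/04/2021", "$1,234.50"),
    ("eu_store", "03/04/2021", "1.234,50 €"),
]

for source, raw_date, raw_amount in records:
    print(source, parse_date(raw_date, source), parse_amount(raw_amount, source))
```

The same date string yields March 4 for one source and April 3 for the other, which is why the format is looked up per source instead of being guessed from the value. Converting amounts between currencies would additionally require exchange rates, which this sketch does not attempt.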
