5

Pragmatic Data Processing and Analysis

Data must be analyzed, transformed, and processed before it can be used to train machine learning (ML) models. In the past, data scientists and ML practitioners had to write custom code from scratch, using a variety of libraries, frameworks, and tools (such as pandas and PySpark), to perform the needed analysis and processing work. This custom code often needed tweaking, since different variations of the steps in the data processing scripts had to be tested on the data before it was used for model training. This work took up a significant portion of an ML practitioner’s time, and because the process was manual, it was usually error-prone as well.

One of the more ...
