15 Process Data and Engineer Features in the Cloud

The world is one big data problem.

—Andrew McAfee

In AI, the real value lies not only in sophisticated algorithms but also in the quality and structure of the data. As most AI practitioners would attest, raw data is rarely ready to use as is. This is where cloud-based data processing comes in: collecting, ingesting, storing, preprocessing, and engineering data for use in machine learning models, using cloud-based data technologies from AWS, Azure, and Google.

So why is data so vital? Simply put, even the most sophisticated model is only as good as the data on which it is trained. Incorrectly processed or poorly engineered data can lead to inaccurate predictions and to decisions that are costly to your business.

As companies embrace the scalability and power of the cloud, learning how to process data there becomes paramount. The cloud provides several benefits, including scalability, performance, security, flexibility, and cost-effectiveness.

In this chapter, you explore your data needs, learn about the benefits and challenges of cloud-based data processing, and dive into hands-on exercises that include feature engineering and transformation techniques.

This chapter underscores the importance of data augmentation, showcases methods to handle missing data and inconsistencies, and presents the art and science of feature engineering, all happening in the cloud. See Figure ...
