9

Using Arrow with Machine Learning Workflows

We just covered how to use Arrow Database Connectivity (ADBC), which provides a highly efficient way to interact with a multitude of data sources. In this chapter, we'll dip into one way to use that data: machine learning (ML). ML is more than a buzzword; it is frequently used for pattern recognition, data-driven decision-making, and generative artificial intelligence (GenAI) systems. It might be a controversial opinion, but at its core, an ML workflow is just a specialized form of a standard data pipeline. As a result, wherever there's data processing, there's an opportunity for Arrow to be extremely useful!

Whether you’re doing feature engineering, model training, preprocessing, or otherwise, many ...

Get In-Memory Analytics with Apache Arrow - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.