Chapter 11. The Future of Data Processing for Artificial Intelligence
As discussed in prior chapters, one major thread in designing machine learning (ML) pipelines, and data processing systems more generally, is pushing computation “down” to where the data lives whenever possible.
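To make “pushdown” concrete, here is a minimal sketch contrasting the two approaches, using Python's built-in sqlite3 module as a stand-in for a data warehouse; the table and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for a warehouse; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, latency_ms REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 200.0)])

# Naive approach: pull every row into the application, then aggregate there.
rows = conn.execute("SELECT latency_ms FROM events").fetchall()
avg_in_app = sum(r[0] for r in rows) / len(rows)

# Pushed-down approach: the database computes the aggregate and ships back
# a single value instead of the whole table.
avg_in_db = conn.execute("SELECT AVG(latency_ms) FROM events").fetchone()[0]

assert abs(avg_in_app - avg_in_db) < 1e-9
```

The difference matters little for three rows, but for billions of rows the pushed-down query moves one number across the wire instead of the entire column.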
Data Warehouses Support More and More ML Primitives
Increasingly, databases provide the foundations for implementing ML algorithms that run efficiently. As databases add support for new data models and offer the built-in operators and functions that training and scoring require, more machine learning computation can execute directly in the database.
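As a small illustration of scoring inside the database, the sketch below evaluates a linear model as a SQL expression, so the engine computes the scores over the feature table and only the results leave the database. The schema and coefficients are hypothetical, and sqlite3 again stands in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, x1 REAL, x2 REAL)")
conn.executemany("INSERT INTO features VALUES (?, ?, ?)",
                 [(1, 0.5, 1.0), (2, -1.0, 2.0)])

# Hypothetical learned coefficients; the scoring arithmetic runs in SQL.
w1, w2, b = 0.8, -0.3, 0.1
scores = conn.execute(
    "SELECT id, ? * x1 + ? * x2 + ? AS score FROM features ORDER BY id",
    (w1, w2, b),
).fetchall()
# scores holds (id, w1*x1 + w2*x2 + b) per row, computed by the engine
```

A real warehouse would extend the same idea to richer models, for example with built-in math functions or vendor-specific scoring operators.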
Expressing More of ML Models in SQL, Pushing More Computation to the Database
As database management software grows more versatile and sophisticated, it has become feasible to push more and more ML computation into the database. As discussed in Chapter 7, modern databases already offer tools for efficiently storing and manipulating vectors. When the data is stored in a database, there is simply no faster way to manipulate ML data than with single instruction, multiple data (SIMD) vector operations applied directly where the data resides. This eliminates both the data transfer and the type-conversion work that moving the data out would require, and it executes extremely efficiently using low-level vector operations.
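A minimal sketch of operating on vectors where they reside: the dot product of two embeddings is computed entirely inside the engine, so only the final similarity score crosses into the application. The component-wise schema is an assumption for illustration; a production vector store would use native vector types and SIMD kernels rather than a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emb (item_id INTEGER, dim INTEGER, val REAL)")
# Two 3-dimensional embeddings stored component-wise (hypothetical layout).
conn.executemany("INSERT INTO emb VALUES (?, ?, ?)", [
    (1, 0, 1.0), (1, 1, 2.0), (1, 2, 3.0),
    (2, 0, 4.0), (2, 1, 5.0), (2, 2, 6.0),
])

# Dot product computed in the database: the vectors never leave it,
# only the similarity score does.
(dot,) = conn.execute("""
    SELECT SUM(a.val * b.val)
    FROM emb AS a JOIN emb AS b ON a.dim = b.dim
    WHERE a.item_id = 1 AND b.item_id = 2
""").fetchone()
# dot == 1*4 + 2*5 + 3*6 == 32.0
```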
When designing an analytics application, do not assume that all computation needs to happen directly in your application. Rather, think of your application as the top-level actor, delegating as much computation as possible ...