Data Warehousing in the Age of Artificial Intelligence
by Gary Orenstein, Conor Doherty, Mike Boyarski, Eric Boutin
Chapter 11. The Future of Data Processing for Artificial Intelligence
As discussed in prior chapters, one major thread in designing machine learning (ML) pipelines and data processing systems more generally is pushing computation “down” whenever possible.
Data Warehouses Support More and More ML Primitives
Increasingly, databases provide the foundations to implement ML algorithms that run efficiently. As databases begin to support different data models and offer built-in operators and functions that are required for training and scoring, more machine learning computation can execute directly in the database.
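As a small illustration of training inside the database, the sketch below fits a one-variable linear regression using only SQL aggregates, so that just five numbers leave the database rather than every row. It uses Python's built-in `sqlite3` module as a stand-in for a data warehouse; the `points` table, its columns, and the `fit_line_in_sql` helper are hypothetical names chosen for this example, not anything defined in the book.

```python
import sqlite3


def fit_line_in_sql(conn):
    """Fit y = slope * x + intercept by ordinary least squares.

    The database computes the sufficient statistics; the application
    only combines five aggregate values into the closed-form solution.
    """
    n, sx, sy, sxy, sxx = conn.execute(
        """
        SELECT COUNT(*), SUM(x), SUM(y), SUM(x * y), SUM(x * x)
        FROM points
        """
    ).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(0, 1), (1, 3), (2, 5), (3, 7)],
)

slope, intercept = fit_line_in_sql(conn)
print(slope, intercept)
```

The same pattern extends to any model whose training reduces to sums over rows, since the aggregation runs where the data lives regardless of table size.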
Expressing More of ML Models in SQL, Pushing More Computation to the Database
As database management software grows more versatile and sophisticated, it becomes feasible to push more and more ML computation to the database. As discussed in Chapter 7, modern databases already offer tools for efficiently storing and manipulating vectors. When the data already lives in a database, it is hard to beat manipulating it with single instruction, multiple data (SIMD) vector operations directly where it resides. Doing so avoids both the data transfer and the computation otherwise spent converting data types, and the low-level vector operations themselves execute extremely efficiently.
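To make the idea of pushing scoring into the database concrete, here is a minimal sketch that scores a linear model with a single SQL query. It is an approximation under stated assumptions: SQLite (via Python's `sqlite3`) has no native vector type or SIMD vector functions, so the dot product is expressed as a join plus `SUM` over a row-per-component layout. The `features` and `weights` tables and the `score_in_sql` helper are hypothetical names introduced for this example.

```python
import sqlite3


def score_in_sql(conn):
    """Return {row_id: score}, where score is the dot product of each
    feature vector with the weight vector, computed by the database."""
    rows = conn.execute(
        """
        SELECT f.row_id, SUM(f.value * w.weight) AS score
        FROM features AS f
        JOIN weights AS w ON w.idx = f.idx
        GROUP BY f.row_id
        ORDER BY f.row_id
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights (idx INTEGER PRIMARY KEY, weight REAL)")
conn.execute("CREATE TABLE features (row_id INTEGER, idx INTEGER, value REAL)")

# Hypothetical model weights and two feature vectors, one row per component.
conn.executemany(
    "INSERT INTO weights VALUES (?, ?)",
    [(0, 0.5), (1, -1.0), (2, 2.0)],
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [
        (1, 0, 1.0), (1, 1, 2.0), (1, 2, 3.0),
        (2, 0, 4.0), (2, 1, 0.0), (2, 2, 1.0),
    ],
)

scores = score_in_sql(conn)
print(scores)
```

Only the final scores cross the database boundary; the feature vectors never have to be fetched and converted into application-side arrays. A database with a built-in vector dot-product operator would replace the join and `SUM` with a single function call.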
When designing an analytics application, do not assume that all computation needs to happen directly in your application. Rather, think of your application as the top-level actor, delegating as much computation as possible ...