The data warehouse in the cloud does not constitute a radical break with its on-premises predecessor. In the cloud, the role of the data warehouse is the same as it ever was: it remains the authoritative system of record, the ground and guarantor of the veracity of all the data that is potential grist for business decision-making. Even before data lakes and data science emerged, one critical role of the data warehouse was that of a research lab in which the business performed experiments on itself.

The data-warehouse-in-the-cloud, by contrast, is practically unlimited in terms of its size and prolificacy. It is multiparous (that is, capable of being recreated or replicated as often as needed) in a way that the on-premises data warehouse is not. Subscribers can draw upon the reserve capacity of the hyperscale cloud to create very large single-instance data warehouse configurations of dozens or even hundreds of terabytes. They can create, pause, resume, or destroy virtual data warehouse instances as needed; better still, instances can be created (or destroyed) in response to programmatic events, such as API calls, or triggered by rules engines.

The data warehouse in the cloud is not perfect. As a general rule, on-premises data warehouse systems will require more compute, more memory, and more storage resources if they are to be successfully transplanted into the cloud context. How much more is a function of trial and error.
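To make the create, pause, resume, and destroy lifecycle concrete, here is a minimal sketch of how a rules engine might drive virtual data warehouse instances in response to events. Everything in it is an assumption made for illustration: `WarehouseManager`, `apply_rules`, the event names, and the 15-minute idle threshold are hypothetical, and the manager is an in-memory stand-in rather than any cloud provider's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DESTROYED = "destroyed"


@dataclass
class WarehouseInstance:
    """Hypothetical stand-in for one virtual data warehouse instance."""
    name: str
    state: State = State.RUNNING


@dataclass
class WarehouseManager:
    """In-memory model of the lifecycle; a real provider API would replace it."""
    instances: dict = field(default_factory=dict)

    def create(self, name):
        instance = WarehouseInstance(name)
        self.instances[name] = instance
        return instance

    def pause(self, name):
        instance = self.instances[name]
        if instance.state is State.RUNNING:
            instance.state = State.PAUSED

    def resume(self, name):
        instance = self.instances[name]
        if instance.state is State.PAUSED:
            instance.state = State.RUNNING

    def destroy(self, name):
        instance = self.instances.pop(name)
        instance.state = State.DESTROYED


def apply_rules(manager, name, event):
    """Toy rules engine: map an event (a plain dict) to a lifecycle action."""
    kind = event.get("type")
    exists = name in manager.instances
    if kind == "job_submitted":
        # Create the instance on first use; otherwise wake it if it is paused.
        if exists:
            manager.resume(name)
        else:
            manager.create(name)
    elif kind == "idle" and exists and event.get("minutes", 0) >= 15:
        # Assumed threshold: pause after 15 idle minutes.
        manager.pause(name)
    elif kind == "project_closed" and exists:
        manager.destroy(name)


if __name__ == "__main__":
    manager = WarehouseManager()
    events = [
        {"type": "job_submitted"},
        {"type": "idle", "minutes": 20},
        {"type": "job_submitted"},
        {"type": "project_closed"},
    ]
    for event in events:
        apply_rules(manager, "experiment_dw", event)
        instance = manager.instances.get("experiment_dw")
        print(event["type"], "->", instance.state.value if instance else "destroyed")
```

In a real deployment the manager's methods would wrap the provider's own instance-management calls, and the events would arrive from a scheduler, a message queue, or a monitoring system rather than from a hard-coded list.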

The cloud data ...

Get Automating the Modern Data Warehouse now with O’Reilly online learning.
