The data warehouse in the cloud does not constitute a radical break with its on-premises predecessor. In the cloud, the role of the data warehouse is the same as it ever was: it remains the authoritative system of record, the ground and guarantor of the veracity of the data that feeds business decision making. Even prior to the emergence of data lakes and data science, one critical role of the data warehouse was that of a research lab in which the business performed experiments on itself.
The data warehouse in the cloud, by contrast, is practically unlimited in its size and prolificacy. It is multiparous (that is, capable of being recreated or replicated as often as needed) in a way that the on-premises data warehouse is not. Subscribers can draw on the reserve capacity of the hyperscale cloud to create very large single-instance data warehouse configurations of dozens or even hundreds of terabytes. They can create, pause, resume, or destroy virtual data warehouse instances as needed; better still, instances can be created or destroyed in response to programmatic events, such as API calls, or triggered by rules engines. The data warehouse in the cloud is not perfect. As a general rule, on-premises data warehouse systems will require more compute, more memory, and more storage if they are to be successfully transplanted into the cloud. How much more is a matter of trial and error.
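The create/pause/resume/destroy lifecycle described above can be sketched as a small state machine. The class, method, and parameter names below are illustrative assumptions, not any vendor's actual API; real platforms expose comparable operations through their SDKs or REST interfaces.

```python
# Hypothetical sketch of a virtual warehouse lifecycle. The names here are
# assumptions for illustration; they do not reflect any vendor's real API.
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DESTROYED = "destroyed"


class VirtualWarehouse:
    """Illustrative virtual data warehouse instance."""

    def __init__(self, name: str, size_tb: int):
        self.name = name
        self.size_tb = size_tb
        self.state = State.RUNNING  # creation brings the instance up

    def pause(self):
        # Pausing stops compute billing while preserving the instance.
        if self.state is State.RUNNING:
            self.state = State.PAUSED

    def resume(self):
        if self.state is State.PAUSED:
            self.state = State.RUNNING

    def destroy(self):
        self.state = State.DESTROYED


def enforce_idle_policy(wh: VirtualWarehouse, idle_minutes: int,
                        threshold: int = 30):
    """A rules-engine-style trigger: pause a warehouse idle past a threshold."""
    if wh.state is State.RUNNING and idle_minutes >= threshold:
        wh.pause()


wh = VirtualWarehouse("reporting", size_tb=50)
enforce_idle_policy(wh, idle_minutes=45)  # trips the idle rule
print(wh.state.value)  # paused
```

In practice the trigger would be an event from a scheduler, a monitoring rule, or an API call rather than an in-process function, but the state transitions are the same.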
The cloud data ...