Chapter 61. Six Dimensions for Picking an Analytical Data Warehouse
Gleb Mezhanskiy
The data warehouse (DWH) plays a central role in the data ecosystem. It is also often the most expensive piece of data infrastructure to replace, so it’s important to choose the right solution, one that can work well for at least seven years. Since analytics powers important business decisions, picking the wrong DWH is a sure way to create a costly bottleneck for your business.
In this chapter, I propose six dimensions for evaluating a data-warehousing solution for the following use cases:
Ingesting and storing all analytical data
Executing data transformations (the T of ELT)
Serving data to consumers (powering dashboards and ad hoc analysis)
Scalability
Businesses that choose poorly scalable data warehouses pay an enormous tax on their productivity when their DWHs can no longer grow: queries get backlogged, users are blocked, and the company is forced to migrate to a better-scaling DWH. However, by the time you feel the pain, it’s already too late: migrations are slow (taking years), painful, and almost never complete.
Scalability for data warehouses means three things:
You can increase storage easily, whenever needed, and at a constant (if not diminishing) unit price.
You can scale computing resources to have as many data-processing jobs as you need running concurrently without ...