Chapter NineThe Power of Layers and Views

Your data in the data lake is stored as tables; we will transform them using views to first create staging schemas and then the warehouse, and we can then materialize those views to increase performance. First, let's clarify the difference between views tables and materialized views.

A table is physically stored and only updated when new data is loaded into the lake or if the schema is changed.

A view is a saved query that returns a table that can then be queried (Figure 9.1). The table it returns is not physically stored. The view will reflect any changes in the underlying data it is querying.

image

Figure 9.1 A view references a table’s data without changing how the table’s data is stored.

A materialized view is physically stored, and is updated whenever an upstream table or view is updated.

To build the staging schema and the data warehouse, default to using SQL views. Views work essentially the same as tables. Using views helps prevent storing additional data and incurring extra cost overhead, with few exceptions. There are usually no significant performance gains from creating new tables or materializing every view as part of your transformations to build a data warehouse; there are a few exceptions, which we will touch on later in this section. Lastly, views are non‐destructive to the original data, so there's no need to worry ...

Get The Informed Company now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.