The Enterprise Big Data Lake

by Alex Gorelik
March 2019
Beginner to intermediate
221 pages
6h 35m
English
O'Reilly Media, Inc.

Chapter 5. From Data Ponds/Big Data Warehouses to Data Lakes

When data warehouses were introduced over three decades ago, they were envisioned as a means of providing historical storage for enterprise data that would make it available for all types of new analytics. In practice, most data warehouses ended up being repositories of production-quality data used for only the most critical analytics. The majority could not process the vast amount and wide variety of data they contained. Some particularly high-end systems like Teradata could provide admirable scalability, but at very high cost. A lot of time and effort was spent tuning the performance of data warehousing systems. As a result, any change, whether a new query or a schema change, had to go through an elaborate architectural review and a lengthy approval and testing process. The ETL jobs that loaded the data warehouse were just as carefully constructed and tuned, and any new data required changes to those jobs and a similarly elaborate review and testing procedure. This prevented ad hoc querying and discouraged schema changes, which meant that data warehouses lacked agility.

Data lakes attempt to fulfill the original promise of an enterprise data repository by introducing extreme scalability, agility, future-proofing, and end-user self-service. In this chapter we will take a closer look at data ponds (data warehouses implemented using big data technology) and explain how these ponds, or the data lakes that encompass them, can provide ...


ISBN: 9781491931547