Chapter 2. How Data Lakes Work

Many IT organizations are simply overwhelmed by the sheer volume of data sets—small, medium, and large—that are stored in Hadoop, which although related, are not integrated. However, when done right, with an integrated data management framework, data lakes allow organizations to gain insights and discover relationships between data sets.

Data lakes created with an integrated data management framework eliminate the costly and cumbersome data preparation process of ETL that traditional EDW requires. Data is smoothly ingested into the data lake, where it is managed using metadata tags that help locate and connect the information when business users need it. This approach frees analysts for the important task of finding value in the data without involving IT in every step of the process, thus conserving IT resources. Today, all IT departments are being mandated to do more with less. In such environments, well-governed and managed data lakes help organizations more effectively leverage all their data to derive business insight and make good decisions.

Zaloni has created a data lake reference architecture that incorporates best practices for data lake building and operation under a data governance framework, as shown in Figure 2-1.

Zaloni’s data lake architecture
Figure 2-1. Zaloni’s data lake architecture

The main advantage of this architecture is that data can come into the data lake ...

Get Architecting Data Lakes now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.