3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
The main objective of this work is to investigate an approach that assists the user in designing a data lake architecture. It can thus be seen as one step in a broader strategy for developing information system architectures for a given domain.
Software product line engineering is an approach that formalizes a series of similar software products or systems that differ only in some of their optional components. When applied to data lakes, this approach is independent of any particular software, but takes into account the main components or features identified in Chapter 1. Consequently, the resulting formalization allows for significant gains in terms of cost, processing time and quality.
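To make the idea of products that "only differ in some of their optional components" concrete, here is a minimal sketch in Python. It assumes a deliberately simplified, flat feature model (mandatory features plus freely combinable optional ones, with no cross-feature constraints). The class `FeatureModel` and the feature names are hypothetical placeholders chosen for illustration; they are not the component list from Chapter 1, nor the formalization used later in this chapter.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class FeatureModel:
    """Simplified feature model: every product contains all mandatory
    features and any subset of the optional ones (assumption: no
    cross-feature constraints such as 'requires' or 'excludes')."""
    mandatory: frozenset
    optional: frozenset

    def is_valid(self, configuration):
        """Return True if the configuration contains every mandatory
        feature and no feature unknown to the model."""
        selected = frozenset(configuration)
        return self.mandatory <= selected <= (self.mandatory | self.optional)

    def products(self):
        """Yield every valid product, one per subset of optional features."""
        ordered = sorted(self.optional)
        for size in range(len(ordered) + 1):
            for extra in combinations(ordered, size):
                yield self.mandatory | frozenset(extra)


# Hypothetical feature names, used only to exercise the sketch.
model = FeatureModel(
    mandatory=frozenset({"ingestion", "storage"}),
    optional=frozenset({"metadata_catalog", "data_security"}),
)

all_products = list(model.products())
```

With two optional features, the model describes four products that share the same mandatory core. A real product line would usually add constraints between features, which this sketch leaves out.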
3.1. Our expectations
It is important to recall that the data lake concept originates from, on the one hand, the need to deal with massive volumes of data and, on the other hand, the use of Apache Hadoop technology. As seen in the previous two chapters, tying data lakes to Apache Hadoop technology is restrictive and has not really met users' expectations. This explains why data lake architecture has evolved towards hybrid architectures.
Considering that, in many cases, applications are not isolated systems independent from each other, but rather share needs, functionalities and properties, the main idea ...