Data Lakes

Book description


The concept of a data lake is less than 10 years old, yet data lakes are already widely adopted by large companies. Their goal is to handle ever-growing volumes of heterogeneous data efficiently while also meeting a variety of sophisticated user needs. However, defining and building a data lake remains a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata (which supplies key information about the stored data), master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, providing the reader with practical examples of data lake management.

Table of contents

  1. Cover
  2. Preface
  3. 1 Introduction to Data Lakes: Definitions and Discussions
    1. 1.1. Introduction to data lakes
    2. 1.2. Literature review and discussion
    3. 1.3. The data lake challenges
    4. 1.4. Data lakes versus decision-making systems
    5. 1.5. Urbanization for data lakes
    6. 1.6. Data lake functionalities
    7. 1.7. Summary and concluding remarks
  4. 2 Architecture of Data Lakes
    1. 2.1. Introduction
    2. 2.2. State of the art and practice
    3. 2.3. System architecture
    4. 2.4. Use case: the Constance system
    5. 2.5. Concluding remarks
  5. 3 Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
    1. 3.1. Our expectations
    2. 3.2. Modeling data lake functionalities
    3. 3.3. Building the knowledge base of industrial data lakes
    4. 3.4. Our formalization approach
    5. 3.5. Applying our approach
    6. 3.6. Analysis of our first results
    7. 3.7. Concluding remarks
  6. 4 Metadata in Data Lake Ecosystems
    1. 4.1. Definitions and concepts
    2. 4.2. Classification of metadata by NISO
    3. 4.3. Other categories of metadata
    4. 4.4. Sources of metadata
    5. 4.5. Metadata classification
    6. 4.6. Why metadata are needed
    7. 4.7. Business value of metadata
    8. 4.8. Metadata architecture
    9. 4.9. Metadata management
    10. 4.10. Metadata and data lakes
    11. 4.11. Metadata management in data lakes
    12. 4.12. Metadata and master data management
    13. 4.13. Conclusion
  7. 5 A Use Case of Data Lake Metadata Management
    1. 5.1. Context
    2. 5.2. Related work
    3. 5.3. Metadata model
    4. 5.4. Metadata implementation
    5. 5.5. Concluding remarks
  8. 6 Master Data and Reference Data in Data Lake Ecosystems
    1. 6.1. Introduction to master data management
    2. 6.2. Deciding what to manage
    3. 6.3. Why should I manage master data?
    4. 6.4. What is master data management?
    5. 6.5. Master data and the data lake
    6. 6.6. Conclusion
  9. 7 Linked Data Principles for Data Lakes
    1. 7.1. Basic principles
    2. 7.2. Using Linked Data in data lakes
    3. 7.3. Limitations and issues
    4. 7.4. The smart cities use case
    5. 7.5. Take-home message
  10. 8 Fog Computing
    1. 8.1. Introduction
    2. 8.2. A little bit of context
    3. 8.3. Every machine talks
    4. 8.4. The volume paradox
    5. 8.5. The fog, a shift in paradigm
    6. 8.6. Constraint environment challenges
    7. 8.7. Calculations and local drift
    8. 8.8. Quality is everything
    9. 8.9. Fog computing versus cloud computing and edge computing
    10. 8.10. Concluding remarks: fog computing and data lake
  11. 9 The Gravity Principle in Data Lakes
    1. 9.1. Applying the notion of gravitation to information systems
    2. 9.2. Impact of gravitation on the architecture of data lakes
  12. Glossary
  13. References
  14. List of Authors
  15. Index
  16. End User License Agreement

Product information

  • Title: Data Lakes
  • Author(s): Anne Laurent, Dominique Laurent, Cédrine Madera
  • Release date: June 2020
  • Publisher(s): Wiley-ISTE
  • ISBN: 9781786305855