Data Lakes

by Anne Laurent, Dominique Laurent, Cédrine Madera
June 2020
Beginner to intermediate
244 pages
5h 40m
English
Wiley-ISTE

1 Introduction to Data Lakes: Definitions and Discussions

As stated by Power [POW 08, POW 14], a new component of information systems is emerging in the context of data-driven decision support systems. Enhancing the value of data requires that information systems contain a data-driven component instead of an information-driven one. This new component is precisely what is called a data lake.

In this chapter, we first briefly review existing work on data lakes and then introduce a global architecture for information systems in which data lakes appear as an additional component compared to existing systems.

1.1. Introduction to data lakes

Interest in the emerging concept of the data lake is increasing, as shown in Figure 1.1, which depicts the number of times the expression “data lake” has been searched for on Google during the last five years. One of the earliest research works on the topic of data lakes was published in 2015 by Fang [FAN 15].

The term data lake was first introduced in 2010 by James Dixon, the Pentaho CTO, in a blog post [DIX 10]. In this seminal work, Dixon expected that data lakes would be huge sets of raw data, structured or not, which users could access for sampling, mining or analytical purposes.

Figure 1.1. Queries about “data lake” on Google

In 2014, Gartner [GAR 14] considered that the concept of data lake was nothing ...

ISBN: 9781786305855