October 2023
636 pages
The original definition of a data lake, which first appeared in a blog post by James Dixon in 2010 (see https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/), was as follows:
If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.
Dixon envisioned a data lake fed by a single source of data, containing the raw data from a system (so not pre-aggregated like you ...