Databases Illuminated, 4th Edition

by Catherine M. Ricardo, Susan D. Urban, Karen C. Davis
March 2022
Content level: Intermediate to advanced
682 pages
22h 58m
English
Jones & Bartlett Learning

15.5 Data Lakes

A data lake is a repository of raw data that is taken directly from data sources in real time and stored in its original form for possible later use. The term was coined in 2010 by James Dixon, founder of the business analytics company Pentaho. He described his use of a lake as an analogy in his blog (jamesdixon.wordpress.com), saying, “If you think of a datamart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
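The idea of keeping data "in its original form for possible later use" can be illustrated with a minimal Python sketch. It is not from the book: the folder-per-source layout, the function names ingest_raw and sample_json, and the sample sensor record are all hypothetical, and a local temporary directory stands in for whatever storage a real data lake would use.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def ingest_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte under a per-source folder (assumed layout)."""
    target_dir = lake_root / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no cleansing, parsing, or restructuring on the way in
    return target


def sample_json(path: Path) -> dict:
    """Interpret a stored file only when a user comes to examine it."""
    return json.loads(path.read_bytes())


def demo() -> Tuple[bytes, dict]:
    """Ingest one hypothetical record, then read it back raw and parsed."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        raw = b'{"sensor": "t1", "reading": "21.5"}'
        stored = ingest_raw(lake, "sensors", "event-0001.json", raw)
        return stored.read_bytes(), sample_json(stored)


if __name__ == "__main__":
    raw_back, parsed = demo()
    print(raw_back)
    print(parsed)
```

In this sketch the stored bytes are identical to what the source produced, and any interpretation (here, JSON parsing) is deferred until someone takes a sample from the lake.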

FIGURE 15.3 shows the components of a data lake. Unlike a data warehouse, ...

Publisher Resources

ISBN: 9781284231595