Data Lakes

by Anne Laurent, Dominique Laurent, Cédrine Madera
June 2020
Beginner to intermediate
244 pages
5h 40m
English
Wiley-ISTE

9. The Gravity Principle in Data Lakes

The previous chapter showed that, from an architecture point of view, the data lake concept can be complex and is more than a simple storage management system. Apache Hadoop, the technology most widely used to store data for the data lake, is no longer the only solution on offer: several hybrid solutions, for example involving NoSQL and RDBMS technologies, are now implemented. As a result, data lake solutions are more complex to design and require exploring several technologies and approaches. In this chapter, we explore some factors that, from an architecture angle, can force alternatives to the “physical” movement of data from data sources to data lakes. Based on the work in [ALR 15, MCC 14, MCC 10], an interesting perspective for the data lake is the concept of data gravity. We investigate what influence data gravity could have on data lake design architecture and which parameters the data gravity concept could influence.

9.1. Applying the notion of gravitation to information systems

9.1.1. Universal gravitation

In physics, universal gravitation refers to the mutual attraction between any two bodies with nonzero mass. According to Newton, the force F between two point bodies of respective masses m1 and m2 located at distance d from each other is as follows:

\[ F = G \frac{m_1 m_2}{d^2} \]

where G is the universal gravitational constant. Gravitation is the cause of orbital motions ...
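
As a small numerical illustration, Newton's law (force equal to G times the product of the two masses, divided by the square of the distance) can be sketched in Python. The function name `gravitational_force`, the rounded value of G, and the approximate Earth and Moon figures are assumptions chosen for this sketch; they do not come from the chapter.

```python
# Universal gravitational constant in N·m²/kg² (rounded; an assumption of this sketch).
G = 6.674e-11


def gravitational_force(m1: float, m2: float, d: float) -> float:
    """Return the attractive force in newtons between two point masses.

    m1, m2: masses in kilograms; d: distance between them in metres.
    """
    if d <= 0:
        raise ValueError("distance must be strictly positive")
    return G * m1 * m2 / d**2


# Illustrative, approximate values for the Earth and the Moon.
earth_mass = 5.972e24      # kg
moon_mass = 7.348e22       # kg
earth_moon_dist = 3.844e8  # m

force = gravitational_force(earth_mass, moon_mass, earth_moon_dist)
print(f"Earth-Moon attraction: {force:.3e} N")
```

With these inputs the result is on the order of 10^20 newtons. The two properties that matter for the analogy developed in this chapter are visible in the formula: the force grows in proportion to each mass, and it falls off with the square of the distance, so doubling d divides the force by four.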

ISBN: 9781786305855