Foreword
Today, we are rapidly moving from the information age to the age of intelligence. Artificial intelligence (AI) is quickly transforming our day-to-day lives. This age is powered by data. Any business that wants to thrive in this age has no choice but to embrace data. It has no choice but to develop the ability and agility to harness data for a wide variety of uses. This need has led to the emergence of data lakes.
A data lake is generally created without a specific purpose in mind. It includes all source data, including unstructured and semi-structured data, from a wide variety of sources, which makes it much more flexible in its potential use cases. Data lakes are usually built on low-cost commodity hardware, which makes it economically viable to store terabytes or even petabytes of data.
In my opinion, the true potential of data lakes can be harnessed only through the cloud, which is why we founded Qubole in 2011. This view is now widely shared around the globe. Today, we are seeing businesses choose the cloud as the preferred home for their data lakes.
Although most initial data lakes were created on-premises, movement to the cloud is accelerating. In fact, the cloud market for data lakes is growing two to three times faster than the on-premises data lake market. According to a 2018 survey by Qubole and Dimensional Research, 73% of businesses are now performing their big data processing in the cloud, up from 58% in 2017. The shift toward the cloud is needed in part ...