Chapter 1. Introduction to Data Lakes
Data-driven decision making is changing how we work and live. From data science, machine learning, and advanced analytics to real-time dashboards, people are demanding data to help make decisions. Companies like Google, Amazon, and Facebook are data-driven juggernauts that are taking over traditional businesses by leveraging data. Financial services organizations and insurance companies have always been data driven, with quants and automated trading leading the way. The Internet of Things (IoT) is changing manufacturing, transportation, agriculture, and healthcare.
From governments and corporations in every vertical to nonprofits and educational institutions, data is being seen as a game changer. Artificial intelligence (AI) and machine learning (ML) are permeating all aspects of our lives. According to Forbes, as of 2018, 90% of the world’s data had been generated in the preceding two years, and according to the World Economic Forum, we are expected to generate more than 463 exabytes (that’s 463,000,000,000,000,000,000 bytes) per day by 2025. The world is bingeing on data because of the potential it represents.
We even have a term for this binge: big data, defined by Doug Laney of Gartner in terms of the three quantitative Vs (volume, variety, and velocity), to which he later added two qualitative Vs (veracity and value). Volume refers to the increased amount of data, typically measured in petabytes and often generated by IoT devices. Variety refers to ...