The Modern Cloud Data Platform: Rise of the Lakehouse
Leading organizations understand the importance of making high-quality data accessible, usable, and trusted. A 2019 McKinsey survey found that the companies with the greatest growth in earnings over the previous three years attributed at least 20% of that growth directly to their data initiatives.
How did they achieve this? These high-performing companies deploy a three-pronged strategy, according to McKinsey. First, they articulate clear, long-term data strategies. Second, they nurture a data-driven culture by making data an integral part of employees’ jobs and educating them on proper data governance. And third, they deploy modern data platforms to support all their data activities at scale.
But what is a “modern data platform”? Is it a data warehouse? A data lake? Can all or part of it be on premises, or must it involve the cloud (or even multiple clouds)? What are the benefits and challenges of these various approaches? And if there were an ideal data platform architecture, what would it look like?
In 2020, O’Reilly Media, in collaboration with Databricks, performed a global survey of more than three thousand data professionals to determine the state of modern cloud data platform architectures. Respondents were asked to assess their current data platform architectures, especially the challenges they had with them, and to describe how those challenges affected business and team success. They were also asked to recommend criteria that would ...