The Modern Cloud Data Platform: Rise of the Lakehouse

Leading organizations understand the importance of making high-quality data accessible, usable, and trusted. A 2019 McKinsey survey found that the companies with the greatest growth in earnings over the previous three years attributed at least 20% of that growth directly to their data initiatives.

How did they achieve this? These high-performing companies deploy a three-pronged strategy, according to McKinsey. First, they articulate clear, long-term data strategies. Second, they nurture a data-driven culture by making data an integral part of employees’ jobs and educating them on proper data governance. And third, they deploy modern data platforms to support all their data activities at scale.

But what is a “modern data platform”? Is it a data warehouse? A data lake? Can all or part of it be on premises, or must it involve the cloud (or even multiple clouds)? What are the benefits and challenges of these various approaches? And if there were an ideal data platform architecture, what would it look like?

In 2020, O’Reilly Media, in collaboration with Databricks, performed a global survey of more than three thousand data professionals to determine the state of modern cloud data platform architectures. Respondents were asked to assess their current data platform architectures, especially the challenges they had with them, and how those challenges affected business and team success. They were also asked to recommend criteria that would ...
