Learning and Operating Presto

by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, Ying Su
September 2023
Intermediate to advanced
191 pages
English
O'Reilly Media, Inc.

Chapter 5. Open Data Lakehouse Analytics

So far, you have learned how to connect Presto to external data sources using standard connectors such as MySQL and Pinot. You have also learned how to write a custom connector using Presto's Java classes and methods. Finally, you have connected a client to Presto to run generic or custom queries. Now it's time to use Presto in a more advanced, realistic scenario that addresses the main challenges of big data management: table lookup, concurrent access to data, and access control.

In this chapter, we give an overview of the data lakehouse and implement a practical scenario. The chapter is divided into two parts. The first part introduces the architecture of a data lakehouse, focusing on its main components. In the second part, you will implement a practical data lakehouse scenario using Presto and fully open components.

The Emergence of the Lakehouse

The first generation of data lakes, based primarily on the Hadoop Distributed File System (HDFS), demonstrated the promise of analytics at scale. As a result, many organizations built data platform architectures that combined data lakes and data warehouses, stitching them together with pipelines and workflows. However, the resulting platform was very complex, with issues around reliability, data freshness, and cost.[1]

To overcome these issues, organizations tried to stretch both the data lake and the data warehouse in terms of the workloads they could support, but with ...
