Learning and Operating Presto

by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, Ying Su
September 2023
Intermediate to advanced
191 pages
4h 32m
English
O'Reilly Media, Inc.

Preface

Data warehousing began by pulling data from operational databases into systems that were better optimized for analytics. These systems were expensive appliances to operate, so people were highly selective about which data they ingested into them for analysis.

Over the years, demand for more data has exploded, far outpacing Moore's law and straining legacy data warehousing appliances. Although this trend holds for the industry at large, some companies ran into the resulting scaling challenges earlier than others.

In 2012, Facebook was among the first companies to attempt to solve this problem. At the time, Facebook was using Apache Hive to perform interactive analysis. As Facebook's datasets grew, Hive proved less interactive (read: too slow) than desired. This was largely because Hive was built on MapReduce, which at the time persisted intermediate datasets to disk, incurring heavy disk I/O for transient, intermediate result sets. So Facebook developed Presto, a new distributed SQL query engine designed to run in memory, without persisting the intermediate result sets of a single query to disk. The result was a query engine that processed the same queries orders of magnitude faster, with many completing in under a second. End users such as engineers, product managers, and data analysts found they could interactively query fractions of large datasets ...

ISBN: 9781098141844