Learning and Operating Presto

by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, Ying Su
September 2023
Intermediate to advanced
191 pages
4h 32m
English
O'Reilly Media, Inc.
Preface

Data warehousing began by pulling data from operational databases into systems that were more optimized for analytics. These systems were expensive appliances to operate, which meant people were highly judicious about what data was ingested into their data warehousing appliance for analytics.

Over the years, demand for more data has exploded, far outpacing Moore’s law and challenging legacy data warehousing appliances. While this trend is true for the industry at large, certain companies were earlier than others to encounter the scaling challenges this posed.

In 2012, Facebook was among the earliest companies to attempt to solve this problem. At the time, Facebook was using Apache Hive to perform interactive analysis. As Facebook’s datasets grew, Hive proved less interactive (read: too slow) than desired. This was largely because Hive is built on MapReduce, which, at the time, required intermediate datasets to be persisted to disk, generating heavy disk I/O for transient, intermediate result sets. So Facebook developed Presto, a new distributed SQL query engine designed to run in memory, without persisting intermediate result sets for a single query. This approach led to a query engine that processed the same query orders of magnitude faster, with many queries completing with subsecond latency. End users such as engineers, product managers, and data analysts found they could interactively query fractions of large datasets ...

Publisher Resources

ISBN: 9781098141844