Learning and Operating Presto

Book description

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this distributed SQL query engine can be challenging even for the most experienced engineers. This practical book shows you how to begin Presto operations at your organization to derive insights on datasets wherever they reside.

Authors Vivek Bharathan, David Simmen, and George Wang explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Twitter, and cloud providers including AWS, Google Cloud, and Alibaba use Presto and how you can quickly deploy Presto in production.

You'll learn about:

  • Presto security and administration
  • Syntax and connectors
  • Top 15 key configuration parameters
  • Clusters and tuning
  • Troubleshooting: logs, error messages, and more
  • Extending Presto for real-time business insight
  • Extending PrestoDB

Table of contents

  1. Foreword
    1. Why This Book Is Important
    2. Comment on the Authors
    3. Prospective Table of Contents
  2. 1. Introducing Presto
    1. Presto Origins
    2. What Is Presto?
      1. Decoupled Storage and Compute for the Data Analytics Stack
      2. Federation of Multiple Backend Data Sources
      3. How Presto works
      4. Presto query processing explained
    3. Presto Operations
    4. Presto at Scale
    5. Presto in the Cloud
    6. Presto as an Analytics Platform
      1. Ad hoc querying
      2. Reporting and dashboarding
      3. ETL using SQL
      4. Data lake analytics
      5. Real-time analytics with real-time databases
    7. Open Source Community
      1. Presto Foundation
    8. Conclusion
  3. 2. Operating Presto at Scale
    1. Common issues when running Presto at scale
      1. Presto Coordinator - Single Point of Failure (SPoF)
      2. Bad worker state
      3. Bad queries
    2. Large scale Presto infrastructure in production
      1. Using Presto Gateway to improve infrastructure availability
      2. Workload Manager - Presto Resource Groups for better resource utilization
      3. Query Event Listener framework for better observability
      4. Query protection layer to protect the infrastructure from bad actors
      5. Choosing node type and JVM settings in production
    3. Conclusion
    4. Acknowledgement
  4. 3. Real-time Analytics for Real-time Business Insights: Presto & Apache Pinot
    1. Introducing Apache Pinot
      1. A closer look at Pinot
      2. Why Use Pinot via Presto?
    2. Setting up Presto+Pinot
      1. Connecting a Pinot cluster to Presto
      2. Exposing Pinot tables as Presto tables
      3. How Presto queries Pinot
      4. Presto-Pinot querying in action
      5. Date/Time handling in Pinot vs. Presto
      6. Troubleshooting Common Issues
    3. Summary
  5. 4. Extending Presto: Building a Presto Connector
    1. Plugin and Module
      1. ExamplePlugin.java
      2. ExampleConnectorFactory.java
      3. ExampleModule.java
      4. ExampleConnector.java
      5. ExampleHandleResolver.java
    2. Configuration
      1. Connector Properties
      2. Session Properties
      3. Table Properties
    3. Metadata
      1. ExampleMetadata.java
      2. ExampleClient.java
    4. Input/Output
      1. Split
      2. Record Set
    5. Deploying your Connector

Product information

  • Title: Learning and Operating Presto
  • Author(s): Vivek Bharathan, David E. Simmen, George Wang
  • Release date: January 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492095118