book

Learning and Operating Presto

by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, Ying Su

September 2023

Intermediate to advanced

191 pages

4h 32m

English

O'Reilly Media, Inc.

Read now

Unlock full access

Why We Wrote This BookWho This Book Is ForConventions Used in This BookUsing Code ExamplesO’Reilly Online LearningHow to Contact UsAcknowledgmentsAngelica Lo DucaTim MeehanVivek BharathanYing Su
Data Warehouses and Data LakesThe Role of Presto in a Data LakePresto Origins and Design ConsiderationsHigh PerformanceHigh ScalabilityCompliance with the ANSI SQL StandardFederation of Data SourcesRunning in the CloudPresto Architecture and Core ComponentsAlternatives to PrestoApache ImpalaApache HiveSpark SQLTrinoPresto Use CasesReporting and DashboardingAd Hoc QueryingETL Using SQLData LakehouseReal-Time Analytics with Real-Time DatabasesIntroducing Our Case StudyConclusion
Presto Manual InstallationRunning Presto on DockerInstalling DockerPresto Docker ImageBuilding and Running Presto on DockerThe Presto SandboxDeploying Presto on KubernetesIntroducing KubernetesConfiguring Presto on KubernetesAdding a New CatalogRunning the Deployment on KubernetesQuerying Your Presto InstanceListing CatalogsListing SchemasListing TablesQuerying a TableConclusion
Service Provider InterfaceConnector ArchitecturePopular ConnectorsThriftWriting a Custom ConnectorPrerequisitesPlugin and ModuleConfigurationMetadataInput/OutputDeploying Your ConnectorApache PinotSetting Up and Configuring PrestoPresto-Pinot Querying in ActionConclusion
Setting Up the EnvironmentPresto ClientDocker ImageKubernetes NodeConnectivity to PrestoREST APIPythonRJDBCNode.jsODBCOther Presto Client LibrariesBuilding a Client Dashboard in PythonSetting Up the ClientBuilding the DashboardConclusion
The Emergence of the LakehouseData Lakehouse ArchitectureData LakeFile StoreFile FormatTable FormatQuery EngineMetadata ManagementData GovernanceData Access ControlBuilding a Data LakehouseConfiguring MinIOConfiguring HMSConfiguring SparkRegistering Hudi Tables with HMSConnecting and Querying PrestoConclusion
Introducing Presto AdministrationConfigurationPropertiesSessionsJVMMonitoringConsoleREST APIMetricsManagementResource GroupsVerifiersSession Properties ManagersNamespace FunctionsConclusion
Introducing Presto SecurityBuilding Secure Communication in PrestoEncryptionKeystore ManagementConfiguring HTTPS/TLSAuthenticationFile-Based AuthenticationLDAPKerberosCreating a Custom AuthenticatorAuthorizationAuthorizing Access to the Presto REST APIConfiguring System Access ControlAuthorization Through Apache RangerConclusion
Introducing Performance TuningReasons for Performance TuningThe Performance Tuning Life CycleQuery Execution ModelApproaches for Performance Tuning in PrestoResource AllocationStorageQuery OptimizationAria ScanTable ScanningRepartitioningImplementing Performance TuningBuilding and Importing the Sample CSV Table in MinIOConverting the CSV Table in ORCDefining the Tuning ParametersRunning TestsConclusion
Introducing ScalabilityReasons to Scale PrestoCommon IssuesDesign ConsiderationsAvailabilityManageabilityPerformanceProtectionConfigurationHow to Scale PrestoMultiple CoordinatorsPresto on SparkSpillingUsing a Cloud ServiceConclusion

Content preview from Learning and Operating Presto

Chapter 1. Introduction to Presto

Over the last few years, the increasing availability of different data produced by users and machines has raised new challenges for organizations wanting to make sense of their data to make better decisions. Becoming a data-driven organization is crucial in finding insights, driving change, and paving the way to new opportunities. While it requires significant data, the benefits are worth the effort.

This large amount of data is available in different formats, provided by different data sources, and searchable with different query languages. In addition, when searching for valuable insights, users need results very quickly, thus requiring high-performance query engine systems. These challenges caused companies such as Facebook (now Meta), Airbnb, Uber, and Netflix to rethink how they manage data. They have progressively moved from the old paradigm based on data warehouses to data lakehouses. While a data warehouse manages structured and historical data, a data lakehouse can also manage and get insights from unstructured and real-time data.

Presto is a possible solution to the previous challenges. Presto is a distributed SQL query engine, created and used by Facebook at scale. You can easily integrate Presto in your data lake to build fast-running SQL queries that interact with data wherever your data is physically located, regardless of its original format.

This chapter will introduce you to the concept of the data lake and how it differs from ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Start your free trial

Publisher Resources

ISBN: 9781098141844Errata Page Supplemental Content

Learning and Operating Presto

by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, Ying Su

Chapter 1. Introduction to Presto

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

You might also like

Ten Things to Know About ModelOps

What Employees Want Most in Uncertain Times

Data Superstream: Data Lakes and Warehouses

Reinventing the Organization for GenAI and LLMs

Publisher Resources

Chapter 1. Introduction to Presto

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,and much more.

You might also like

Ten Things to Know About ModelOps

What Employees Want Most in Uncertain Times

Data Superstream: Data Lakes and Warehouses

Reinventing the Organization for GenAI and LLMs

Publisher Resources

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.