Chapter 1. Introduction to Presto
Over the last few years, the growing volume and variety of data produced by users and machines has raised new challenges for organizations that want to make sense of their data and make better decisions. Becoming a data-driven organization is crucial to finding insights, driving change, and paving the way to new opportunities. Doing so requires large amounts of data, but the benefits are worth the effort.
This large amount of data is available in different formats, provided by different data sources, and searchable with different query languages. In addition, users searching for valuable insights need results very quickly, which requires high-performance query engines. These challenges caused companies such as Facebook (now Meta), Airbnb, Uber, and Netflix to rethink how they manage data. They have progressively moved from the old paradigm based on data warehouses to data lakehouses. While a data warehouse manages structured and historical data, a data lakehouse can also manage and extract insights from unstructured and real-time data.
Presto is one possible solution to these challenges. It is a distributed SQL query engine, created at Facebook and used there at scale. You can integrate Presto with your data lake to run fast SQL queries against your data wherever it is physically located, regardless of its original format.
This chapter will introduce you to the concept of the data lake and how it differs from ...