Chapter 9. Optimizing Query Response

When Greenplum forked from PostgreSQL, it inherited PostgreSQL's query planner. However, because that planner had no understanding of the Greenplum architecture, it became unwieldy to use in a massively parallel processing (MPP) environment. Pivotal therefore made a strategic decision to build a query optimizer tuned for Greenplum.

Fast Query Response Explained

When processing large amounts of data in a distributed environment, a naive query plan might take orders of magnitude more time than the optimal plan. In some cases, the query execution will not complete, even after several hours, as shown in our experimental study.1 The Pivotal Query Optimizer (PQO) is designed to find the optimal way to execute user queries in distributed environments such as Pivotal’s Greenplum Database and HAWQ. The open source version of PQO is called GPORCA. To generate the fastest plan, GPORCA considers thousands of alternative query execution plans and makes a cost-based decision.
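The idea of a cost-based decision can be sketched in a few lines of Python. This is a toy model only, not GPORCA's algorithm or cost formulas: the table names, row counts, selectivities, and the functions `plan_cost` and `choose_plan` are all hypothetical, and the "cost" is simply the sum of intermediate result sizes for each left-deep join order.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical row counts; not taken from any real workload.
TABLE_ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities: the fraction of the cross product
# that survives the join predicate between two tables.
SELECTIVITY = {
    frozenset({"orders", "customers"}): Fraction(1, 50_000),
    frozenset({"customers", "regions"}): Fraction(1, 20),
    # "orders" and "regions" share no predicate, so joining them
    # directly yields a full cross product (selectivity 1).
}


def plan_cost(order):
    """Toy cost of a left-deep join order: sum of intermediate row counts."""
    joined = [order[0]]
    rows = Fraction(TABLE_ROWS[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        # Combine every predicate linking the new table to tables already joined.
        selectivity = Fraction(1)
        for earlier in joined:
            selectivity *= SELECTIVITY.get(frozenset({earlier, table}), Fraction(1))
        rows = rows * TABLE_ROWS[table] * selectivity
        cost += rows
        joined.append(table)
    return cost


def choose_plan(tables):
    """Enumerate every join order and return the cheapest one plus all candidates."""
    candidates = list(permutations(tables))
    return min(candidates, key=plan_cost), candidates


best, candidates = choose_plan(list(TABLE_ROWS))
for candidate in candidates:
    print(" -> ".join(candidate), int(plan_cost(candidate)))
print("chosen:", " -> ".join(best))
```

Even in this three-table example, the worst join order (joining `orders` to `regions` first, which forms a cross product) costs 20 times more than the best one under the toy model. A real optimizer explores a far larger space, including join algorithms and data movement between segments, which this sketch ignores.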

As with most commercial and scientific database systems, user queries are submitted to the database engine via SQL. SQL is a declarative language used to define, manage, and query the data stored in relational and stream data management systems.
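The following sketch contrasts a declarative SQL query, which states only the result wanted, with hand-written procedural code that spells out how to compute it. It uses Python's built-in `sqlite3` module purely as a convenient stand-in; SQLite is not Greenplum, and the `sales` table and its contents are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 25), ("east", 5), ("north", 40)],
)

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

# Declarative: describe the result; the engine decides how to produce it.
declarative = conn.execute(QUERY).fetchall()

# Procedural: scan the rows and accumulate the totals step by step.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0) + amount
procedural = sorted(totals.items())

# Ask the engine which execution strategy it chose for the declarative query.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

print(declarative)
print(procedural)
for step in plan:
    print(step)
```

Both approaches return the same totals, but only the procedural version fixes the execution strategy in the code. With the SQL version, the engine is free to choose a different strategy as the data or the available indexes change.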

Declarative languages describe the desired result, not the logic required to produce it. The responsibility for generating an optimal execution plan lies solely with the query optimizer employed in the database management system. To understand how query processing works ...
