Chapter 8. Optimizing Query Response

Fast Query Response Explained

When processing large amounts of data in a distributed environment, a naive query plan might take orders of magnitude more time than the optimal plan. In some cases, the query execution will not complete, even after several hours, as shown in our experimental study.1 Pivotal’s Query Optimizer (PQO) is designed to find the optimal way to execute user queries in distributed environments such as Pivotal’s Greenplum Database and HAWQ. The open source version of PQO is called GPORCA. To generate the fastest plan, GPORCA considers thousands of alternative query execution plans and makes a cost-based decision.
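To make the idea of a cost-based decision concrete, the following is a minimal sketch, not GPORCA's actual algorithm or cost model. It enumerates every left-deep join order for three tables and picks the cheapest one. The table names, row counts, selectivities, and the cost formula (the sum of intermediate result sizes) are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical row counts; not taken from any real system.
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities, keyed by the unordered pair of tables.
SELECTIVITY = {
    frozenset(("orders", "customers")): 1 / 50_000,
    frozenset(("customers", "regions")): 1 / 20,
    frozenset(("orders", "regions")): 1.0,  # no join predicate: cross product
}


def plan_cost(order):
    """Toy cost of a left-deep join order: the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine the selectivities of all predicates that link the new
        # table to the tables already joined.
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY[frozenset((other, table))]
        rows = rows * ROWS[table] * sel
        cost += rows
        joined.append(table)
    return cost


def best_plan(tables):
    """Enumerate every join order and return the one with the lowest toy cost."""
    return min(permutations(tables), key=plan_cost)


best = best_plan(list(ROWS))
worst = max(permutations(ROWS), key=plan_cost)
print("cheapest order:", best, "cost:", plan_cost(best))
print("most expensive order:", worst, "cost:", plan_cost(worst))
```

Even in this toy setting, the most expensive order costs roughly twenty times more than the cheapest because it begins with a cross product. A real optimizer searches a far larger space that also includes join algorithms, data movement between nodes, and other physical choices.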

As with most commercial and scientific database systems, user queries are submitted to the database engine via SQL. SQL is a declarative language used to define, manage, and query data stored in relational and stream data management systems.

Declarative languages describe the desired result, not the logic required to produce it. The responsibility for generating an optimal execution plan lies solely with the query optimizer employed in the database management system. For an excellent description of how query processing works in Greenplum, see the documentation.
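The difference between stating a result and spelling out the logic can be sketched in a few lines. The example below uses Python's built-in sqlite3 module purely as a stand-in engine, not Greenplum or GPORCA, and the sales table and its contents are hypothetical. The same answer is obtained once declaratively and once with hand-written logic, and the engine is then asked which plan it chose for the declarative query.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine (not Greenplum).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7)],
)

# Declarative: state the desired result and let the optimizer decide how.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
declarative = conn.execute(query).fetchall()

# Procedural equivalent: the scan and the aggregation logic are written by hand.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0) + amount
imperative = sorted(totals.items())

# Ask the engine which plan it picked for the declarative query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(declarative)
print(imperative)
for row in plan:
    print(row)
```

Both approaches return the same totals, but only the declarative form leaves the engine free to choose a different execution strategy as the data or the schema changes.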

GPORCA is a top-down query optimizer based on the Cascades optimization framework,2 which is not tightly coupled with the host system. This unique feature enables GPORCA to run as a standalone service outside ...
