
OpenGL Insights by Christophe Riccio, Patrick Cozzi


The OpenGL Timer Query
Christopher Lux
This chapter presents the OpenGL functionality to measure execution times of
sequences of OpenGL commands using methods provided through the OpenGL timer
query. The special requirement of dedicated OpenGL timing methods for profiling
and runtime purposes is highlighted, followed by an introduction of the basic
functions and concepts regarding synchronous and asynchronous approaches to OpenGL
timing. Different types of applications are demonstrated, while indicating special
limitations to this functionality.
34.1 Introduction
How long does it take the graphics hardware (GPU) to execute a certain sequence of
OpenGL rendering commands? The answer to this question is essential during the
development as well as the runtime of real-time computer graphics applications such
as games, simulations, and scientific visualizations.
Profiling a program means measuring and recording, for instance, execution
times and memory usage of individual parts of the program. Profiling allows a
software engineer to analyze how many resources and how much time are spent in
various parts of the program, and thereby identify critical sections in the program
source code. These critical sections present the best opportunities for optimizations
from which the program performance can benefit the most.
Execution time measurements at program runtime can also be utilized to
dynamically adjust the workload of rendering algorithms to achieve or maintain interactive
frame times. For instance, a program can use the measured rendering times
of geometric models to adapt their levels of detail, decreasing or increasing the
geometry workload of the graphics system. Another application area of runtime
timings is resource streaming, where, for instance, information about texture-upload
speeds is used to adjust the amount of texture resources transferred to GPU memory.
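As a minimal sketch of this kind of runtime adaptation, the following C++ function picks the next level of detail from the GPU time measured for the previous frame. The function name `adjust_lod`, its parameters, and the 20% dead band are assumptions made for illustration and do not come from the chapter; how the GPU time is obtained is left to the caller.

```cpp
#include <algorithm>

// Hypothetical helper (not from the chapter): choose the next level-of-detail
// index from the GPU time measured for the previous frame. Level 0 is the most
// detailed model, coarsest_lod the cheapest one.
int adjust_lod(int current_lod, double gpu_time_ms, double budget_ms, int coarsest_lod)
{
    // Assumed dead band of 20% below the budget so the level does not
    // flip back and forth between two neighbors every frame.
    const double lower_bound_ms = 0.8 * budget_ms;

    int next_lod = current_lod;
    if (gpu_time_ms > budget_ms) {
        ++next_lod;  // over budget: switch to coarser geometry
    } else if (gpu_time_ms < lower_bound_ms) {
        --next_lod;  // clear headroom: switch to finer geometry
    }

    // Keep the result inside the range of available levels.
    return std::max(0, std::min(next_lod, coarsest_lod));
}
```

Called once per frame, for example as `lod = adjust_lod(lod, last_gpu_time_ms, 16.6, 3)`, this moves at most one level per frame. The same pattern could be applied to a streaming budget, with the measured texture-upload time in place of the rendering time.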
The timer query functionality, introduced with the EXT_timer_query
extension [ARB 06] and promoted to core specification with OpenGL Version 3.3 [Segal
and Akeley 11], allows us to measure the time it takes to execute a sequence of
OpenGL commands and to retrieve the current time stamp of the OpenGL server.
This timer query mechanism is required because current GPUs run asynchronously
to the CPU. Issuing an OpenGL command ultimately places it in the
OpenGL command queue, which the GPU processes at a later point in time. This
means that, upon calling an OpenGL function, the corresponding command is not
necessarily executed directly, nor is it guaranteed to be finished when the call returns
control to the CPU. Furthermore, the execution of OpenGL commands stored in
the command queue is usually delayed by at least one frame in relation to the current
rendering frame on the CPU. This practice minimizes GPU idle times and hides
latencies in CPU-GPU interoperation.
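The following is a minimal sketch of how an elapsed-time query might be wrapped so that the result is polled rather than waited for, which suits the delayed execution described above. The `Gl` template parameter is a hypothetical function table, assumed to expose members that mirror the OpenGL 3.3 entry points `glGenQueries`, `glDeleteQueries`, `glBeginQuery`, `glEndQuery`, `glGetQueryObjectiv`, and `glGetQueryObjectui64v`. The class name and the unprefixed constant names are also assumptions, chosen to avoid clashing with the macros of a real OpenGL header.

```cpp
#include <cstdint>

// Enum values as defined for OpenGL 3.3; named without the GL_ prefix here
// so they do not collide with the macros of a real OpenGL header.
constexpr unsigned int TIME_ELAPSED           = 0x88BF; // GL_TIME_ELAPSED
constexpr unsigned int QUERY_RESULT           = 0x8866; // GL_QUERY_RESULT
constexpr unsigned int QUERY_RESULT_AVAILABLE = 0x8867; // GL_QUERY_RESULT_AVAILABLE

// Gl is a hypothetical function table whose members forward to the
// corresponding gl* entry points of a current OpenGL 3.3 context.
template <typename Gl>
class ElapsedTimeQuery {
public:
    explicit ElapsedTimeQuery(Gl& gl) : gl_(gl) { gl_.GenQueries(1, &id_); }
    ~ElapsedTimeQuery() { gl_.DeleteQueries(1, &id_); }

    ElapsedTimeQuery(const ElapsedTimeQuery&) = delete;
    ElapsedTimeQuery& operator=(const ElapsedTimeQuery&) = delete;

    // Brackets the commands issued by the callable with a timer query.
    // This only enqueues the query; the GPU executes it later.
    template <typename Commands>
    void record(Commands&& issue_commands) {
        gl_.BeginQuery(TIME_ELAPSED, id_);
        issue_commands();
        gl_.EndQuery(TIME_ELAPSED);
    }

    // Non-blocking readback: returns false while the result is not yet
    // available, otherwise stores the elapsed time in nanoseconds.
    bool try_get_nanoseconds(std::uint64_t& out_ns) {
        int available = 0;
        gl_.GetQueryObjectiv(id_, QUERY_RESULT_AVAILABLE, &available);
        if (!available) {
            return false;
        }
        gl_.GetQueryObjectui64v(id_, QUERY_RESULT, &out_ns);
        return true;
    }

private:
    Gl& gl_;
    unsigned int id_ = 0;
};
```

A caller would typically record the query in one frame and call `try_get_nanoseconds` in a later frame, skipping the measurement whenever it returns false. Requesting `GL_QUERY_RESULT` without first checking availability would instead make the CPU wait for the GPU to finish the bracketed commands.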
Modern immediate-mode GPUs employ heavily pipelined architectures,
which process different primitives (e.g., vertices, fragments) in different
pipeline stages simultaneously [Ragan-Kelley 10]. This results in the overlapped
execution of individually issued draw commands on the GPU. Figure 34.1 illustrates the
asynchronous and pipelined execution model of immediate-mode GPUs. This chapter
specifically discusses performance measurements for immediate-mode GPUs. The
architectural differences of tile-based GPUs and the associated differences in
performance profiling are described in Chapter 23.
The basic asynchronous character of the OpenGL server allows the CPU to
attend to different tasks while the GPU is executing the issued commands. However,
measuring GPU execution times using general CPU timing methods results in cap-
[Figure 34.1: timeline (axis t) with CPU and GPU rows showing draw calls DC_0 to DC_3 and SwapBuffers for rendering frame N.]
Figure 34.1. Asynchronous and pipelined execution of draw calls (DC_n) on the GPU: four draw calls
are issued by the CPU during the rendering frame N. Through the heavily pipelined architecture of modern
immediate-mode GPUs, the draw commands are ultimately executed overlapped in parallel.
