frame times. For instance, a program can use information about rendering times
of geometrical models to adapt the levels of detail in use and thereby decrease or
increase the geometry workload of the graphics system. Another application area of
runtime timings is resource streaming, where, for instance, information about
texture-upload speeds is used to adjust the amount of texture resources transferred
to GPU memory.
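As a minimal sketch of the level-of-detail adaptation described above, the following function picks the next level from a measured rendering time. The function name, the 0.7 hysteresis factor, and the convention that level 0 is the most detailed are assumptions made for illustration only; they do not come from the chapter.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: choose the next level of detail from a measured
// rendering time. Assumed convention: level 0 is the most detailed,
// maxLod is the coarsest.
int adaptLod(int currentLod, int maxLod, double measuredMs, double budgetMs)
{
    assert(maxLod >= 0 && budgetMs > 0.0);

    // Assumed hysteresis band, so the level does not flip back and forth
    // when the measured time hovers around the budget.
    const double upperMs = budgetMs;        // above this: reduce detail
    const double lowerMs = 0.7 * budgetMs;  // below this: add detail

    int next = currentLod;
    if (measuredMs > upperMs) {
        next = currentLod + 1;              // coarser geometry, less work
    } else if (measuredMs < lowerMs) {
        next = currentLod - 1;              // finer geometry, more work
    }

    // Keep the result inside the valid range [0, maxLod].
    return std::max(0, std::min(next, maxLod));
}
```

Called once per frame with the most recently available timing, for example `adaptLod(lod, 3, lastGpuMs, 16.0)`, the function coarsens the geometry when the measurement exceeds the budget and refines it again once there is clear headroom.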
The timer query functionality, introduced with the EXT_timer_query extension
[ARB 06] and promoted to the core specification with OpenGL Version 3.3 [Segal
and Akeley 11], allows us to measure the time it takes to execute a sequence of
OpenGL commands and to retrieve the current time stamp of the OpenGL server.
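Timer query results are reported in nanoseconds as 64-bit unsigned integers. The sketch below shows how such a result might feed the texture-streaming use case mentioned earlier. It deliberately contains no OpenGL calls, so that it compiles without a context; the elapsed time is assumed to have been read back from a query that bracketed the upload. All function names and the idea of a fixed per-frame time slice are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Timer query results are given in nanoseconds; convert to milliseconds.
double nanosecondsToMilliseconds(std::uint64_t ns)
{
    return static_cast<double>(ns) / 1.0e6;
}

// Hypothetical helper: observed upload throughput in bytes per millisecond,
// given the size of an upload and its measured GPU time.
double uploadBytesPerMs(std::uint64_t bytesUploaded, std::uint64_t elapsedNs)
{
    assert(elapsedNs > 0);
    return static_cast<double>(bytesUploaded)
         / nanosecondsToMilliseconds(elapsedNs);
}

// Hypothetical helper: number of bytes that fit into the time slice
// reserved for streaming in each frame (sliceMs is an assumed parameter).
std::uint64_t uploadBudgetBytes(double bytesPerMs, double sliceMs)
{
    assert(bytesPerMs >= 0.0 && sliceMs >= 0.0);
    return static_cast<std::uint64_t>(bytesPerMs * sliceMs);
}
```

For instance, a 4 MiB upload measured at 2,000,000 ns corresponds to 2,097,152 bytes per millisecond, so a streaming slice of 0.5 ms per frame would allow roughly 1 MiB of texture data per frame under these assumptions.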
This time query mechanism is required because current GPUs run asynchronously
to the CPU. Issuing an OpenGL command ultimately places it in the OpenGL
command queue, which the GPU processes at a later point in time. This means
that, upon calling an OpenGL function, the corresponding command is not
necessarily executed immediately, nor is it guaranteed to be finished when the call
returns control to the CPU. Furthermore, the execution of OpenGL commands stored
in the command queue is usually delayed by at least one frame in relation to the
current rendering frame on the CPU. This practice minimizes GPU idle times and
hides latencies in CPU-GPU interoperation.
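The deferred execution described above can be illustrated with a toy model. The class below is not an OpenGL API and does not reflect how any driver is implemented; it is a sketch under the assumption of a fixed latency in frames, and every name in it is hypothetical. It shows only that issuing a command returns immediately and that the command is executed in a later frame.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model: a command recorded by the CPU.
struct Command {
    int id;           // hypothetical command identifier
    int issuedFrame;  // CPU frame in which the command was issued
};

// Toy model: a command after the "GPU" has executed it.
struct Executed {
    int id;
    int issuedFrame;
    int executedFrame;
};

class CommandQueueModel {
public:
    // latencyFrames: assumed fixed delay between issue and execution.
    explicit CommandQueueModel(int latencyFrames) : latency_(latencyFrames)
    {
        assert(latencyFrames >= 0);
    }

    // CPU side: only records the command and returns immediately.
    void issue(int id, int cpuFrame) { pending_.push_back({id, cpuFrame}); }

    // GPU side: while the CPU works on frame cpuFrame, execute every
    // queued command that is at least latency_ frames old, in issue order.
    std::vector<Executed> process(int cpuFrame)
    {
        std::vector<Executed> done;
        while (!pending_.empty() &&
               pending_.front().issuedFrame + latency_ <= cpuFrame) {
            const Command c = pending_.front();
            pending_.pop_front();
            done.push_back({c.id, c.issuedFrame, cpuFrame});
        }
        return done;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    int latency_;
    std::deque<Command> pending_;
};
```

With a latency of one frame, two commands issued in frame 5 are still pending after `process(5)` and are reported as executed by `process(6)`. This is why a CPU timer wrapped around the issuing calls would not measure the time the GPU spends executing them.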
Modern immediate-mode GPUs employ heavily pipelined architectures that process
different primitives (e.g., vertices, fragments) in different pipeline stages
simultaneously [Ragan-Kelley 10]. As a result, the execution of individually issued
draw commands overlaps on the GPU. Figure 34.1 illustrates the asynchronous and
pipelined execution model of immediate-mode GPUs. This chapter specifically
discusses performance measurements for immediate-mode GPUs. The architectural
differences of tile-based GPUs and the associated differences in performance
profiling are described in Chapter 23.
The basic asynchronous character of the OpenGL server allows the CPU to attend
to different tasks while the GPU is executing the issued commands. However,
measuring GPU execution times using general CPU timing methods results in cap-
[Figure 34.1 labels: Rendering Frame N; DC_0 (N − x), DC_1 (N − x), DC_2 (N − x), DC_3 (N − x)]
Figure 34.1. Asynchronous and pipelined execution of draw calls (DC_n) on the GPU: four draw calls
are issued by the CPU during the rendering frame N. Through the heavily pipelined architecture of modern
immediate-mode GPUs, the draw commands are ultimately executed overlapped in parallel.