Chapter 15
Pipeline Optimization
“We should forget about small efficiencies, say about 97% of the
time: Premature optimization is the root of all evil.”
—Donald Knuth
As we saw in Chapter 2, the process of rendering an image is based on a pipelined architecture with three conceptual stages: application, geometry, and rasterizer. At any given moment, one of these stages, or the communication path between them, will always be the bottleneck: the slowest stage in the pipeline. This implies that the bottleneck stage sets the limit for the throughput, i.e., the total rendering performance, and so is a prime candidate for optimization.
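To make the throughput argument concrete, here is a minimal Python sketch of an idealized pipeline. It assumes the three stages work on different frames at the same time and never stall one another, so one finished frame emerges per slowest-stage time. The stage timings and the helper names `bottleneck` and `throughput_fps` are hypothetical, chosen only for illustration.

```python
# Idealized pipeline model (an assumption for illustration): stages overlap,
# each working on a different frame, so throughput is set by the slowest one.

def bottleneck(stage_ms: dict[str, float]) -> str:
    """Return the name of the slowest stage, i.e., the bottleneck."""
    return max(stage_ms, key=stage_ms.get)


def throughput_fps(stage_ms: dict[str, float]) -> float:
    """Frames per second, limited by the slowest stage's time per frame."""
    return 1000.0 / max(stage_ms.values())


# Hypothetical per-frame stage times in milliseconds.
stages = {"application": 20.0, "geometry": 12.0, "rasterizer": 16.0}
print(bottleneck(stages))      # application
print(throughput_fps(stages))  # 50.0
```

In this model, speeding up the geometry or rasterizer stage leaves the frame rate unchanged; only the application stage matters until it is no longer the slowest.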
Optimizing the performance of the rendering pipeline resembles the process of optimizing a pipelined processor (CPU) [541] in that it consists mainly of two steps. First, the bottleneck of the pipeline is located. Second, that stage is optimized in some way, after which step one is repeated if the performance goals have not been met. Note that the bottleneck may or may not remain in the same place after the optimization step. It is a good idea to put only enough effort into optimizing the bottleneck stage so that the bottleneck moves to another stage. Several other stages may have to be optimized before this stage becomes the bottleneck again. For this reason, effort should not be wasted on over-optimizing a stage.
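The locate-then-optimize loop can be sketched as a toy model in Python. Here each "optimization" simply scales the bottleneck stage's time by a fixed factor, a hypothetical stand-in for real tuning work, and the loop stops as soon as the slowest stage meets the frame-time target. The function name `optimize_pipeline`, the `speedup` factor, and the timings are all assumptions for illustration.

```python
def optimize_pipeline(stage_ms, target_ms, speedup=0.75, max_steps=100):
    """Toy model of iterative pipeline optimization.

    Repeatedly finds the bottleneck (slowest stage) and makes it faster by the
    hypothetical factor `speedup`, until every stage fits within `target_ms`.
    Returns the final stage times and the stage optimized at each step.
    """
    times = dict(stage_ms)  # work on a copy; leave the caller's dict intact
    history = []
    for _ in range(max_steps):
        slowest = max(times, key=times.get)  # step 1: locate the bottleneck
        if times[slowest] <= target_ms:      # performance goal met: stop
            break
        times[slowest] *= speedup            # step 2: optimize that stage
        history.append(slowest)
    return times, history


times, history = optimize_pipeline(
    {"application": 20.0, "geometry": 12.0, "rasterizer": 16.0}, target_ms=12.0
)
print(history)  # ['application', 'rasterizer', 'application']
print(times)    # {'application': 11.25, 'geometry': 12.0, 'rasterizer': 12.0}
```

The application stage is optimized, then the rasterizer, then the application stage again: once one stage catches up, a previously optimized stage can become the bottleneck once more, which is why over-optimizing any single stage wastes effort.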
The location of the bottleneck may change within a frame. At one moment the geometry stage may be the bottleneck because many tiny triangles are rendered. Later in the frame the rasterizer could be the bottleneck because triangles covering large parts of the screen are rendered. So, when we talk about, say, the rasterizer stage being the bottleneck, we mean it is the bottleneck most of the time during that frame.
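One way to make "the bottleneck most of the time" precise is to split a frame into segments, find the slowest stage in each, and total how long each stage spends as the limiting one. The Python sketch below does this under the simplifying assumption that each segment's duration is set by its slowest stage; the segment timings and the function name `frame_bottleneck` are hypothetical.

```python
from collections import Counter


def frame_bottleneck(segments):
    """Given per-segment stage times (ms), return the stage that limits the
    frame for the most time, plus the time each stage spent as bottleneck.

    Simplifying assumption: a segment takes as long as its slowest stage.
    """
    limiting_ms = Counter()
    for seg in segments:
        slowest = max(seg, key=seg.get)
        limiting_ms[slowest] += seg[slowest]
    stage, _ = limiting_ms.most_common(1)[0]
    return stage, dict(limiting_ms)


# Hypothetical frame: tiny triangles, then large screen-covering ones.
frame = [
    {"geometry": 4.0, "rasterizer": 1.0},  # tiny triangles: geometry-bound
    {"geometry": 1.0, "rasterizer": 9.0},  # large triangles: rasterizer-bound
    {"geometry": 3.0, "rasterizer": 2.0},  # more tiny triangles
]
print(frame_bottleneck(frame))
# ('rasterizer', {'geometry': 7.0, 'rasterizer': 9.0})
```

Counting segments instead of milliseconds would have picked the geometry stage (two of three segments), so weighting by time matters when deciding which stage to optimize.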
Another way to capitalize on the pipelined construction is to recognize that when the slowest stage cannot be optimized further, the other stages can be given just as much time as the slowest stage. This will not change performance, since the speed of the slowest stage will not be altered, but the extra processing can be used to improve image quality. For example, say that the bottleneck is in the application stage, which takes 50 milliseconds (ms) to produce a frame, while the others each take 25 ms. This means that without changing the speed of the rendering pipeline (50 ms equals 20 frames per second), the geometry and the rasterizer stages could each take up to 50 ms as well. We could, for instance, use a more sophisticated lighting model or increase the level of realism with shadows and reflections, assuming that this does not increase the workload on the application stage.
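The arithmetic in this example can be checked with a short Python sketch that computes the frame rate and each stage's slack, meaning how much extra time a stage could spend per frame before it, rather than the application stage, would limit the frame rate. The stage times are the ones from the text; the function name `slack_ms` is hypothetical, and the model assumes fully overlapped stages with fixed, separate resources.

```python
def slack_ms(stage_ms):
    """Extra milliseconds each stage could use per frame without lowering the
    frame rate, assuming fully overlapped stages limited by the slowest one."""
    limit = max(stage_ms.values())
    return {name: limit - t for name, t in stage_ms.items()}


stages = {"application": 50.0, "geometry": 25.0, "rasterizer": 25.0}
print(1000.0 / max(stages.values()))  # 20.0 frames per second
print(slack_ms(stages))
# {'application': 0.0, 'geometry': 25.0, 'rasterizer': 25.0}
```

In this model, a richer lighting or shadowing pass that raises the geometry or rasterizer time to anything up to 50 ms leaves the 20 frames per second unchanged.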
Pipeline optimization is a process in which we first maximize the rendering speed, then allow the stages that are not bottlenecks to consume as much time as the bottleneck. That said, this idea does not apply to newer architectures such as the Xbox 360, which automatically load-balance computational resources (more on this in a moment).
This exception is an excellent example of a key principle. When reading
this chapter, the dictum
KNOW YOUR ARCHITECTURE
should always be in the back of your mind, since optimization techniques
vary greatly for different architectures. A related dictum is, simply, “Measure.”
15.1 Profiling Tools
There are a number of worthwhile tools available for profiling use of the graphics accelerator and CPU. Such tools are useful both for locating bottlenecks and for optimizing. Examples include PIX for Windows (for DirectX), gDEBugger (for OpenGL), NVIDIA’s NVPerfKit suite of tools, ATI’s GPU PerfStudio [1401], and Apple’s OpenGL Profiler.
As an example, PIX for Windows offers real-time performance evaluation through counters for a wide variety of data, such as the number of draw calls, state changes, texture and shader calls, CPU and GPU idle time, locks on various resources, read and write I/O, and the amount of memory used. This data can be displayed overlaid on the application itself. Figure 15.1 was rendered with this technique.
PIX can capture all the DirectX calls made within a given frame for
later analysis or playback. Examining this stream can show whether and
where unnecessary API calls are being made. PIX can also be used for
pixel debugging, showing the frame buffer history for a single pixel.
While these tools provide developers with most of the information they need, sometimes other data is required that does not fit the mold of these tools. Pelzer [1001] presents a number of useful techniques to display debugging information.