Real-Time Rendering, Third Edition

Chapter 15

Pipeline Optimization

“We should forget about small efficiencies, say about 97% of the time: Premature optimization is the root of all evil.”

—Donald Knuth

As we saw in Chapter 2, the process of rendering an image is based on a pipelined architecture with three conceptual stages: application, geometry, and rasterizer. At any given moment, one of these stages, or the communication path between them, will always be the bottleneck, that is, the slowest stage in the pipeline. This implies that the bottleneck stage sets the limit for the throughput, i.e., the total rendering performance, and so is a prime candidate for optimization.

Optimizing the performance of the rendering pipeline resembles the process of optimizing a pipelined processor (CPU) [541] in that it consists mainly of two steps. First, the bottleneck of the pipeline is located. Second, that stage is optimized in some way; and after that, step one is repeated if the performance goals have not been met. Note that the bottleneck may or may not be located at the same place after the optimization step. It is a good idea to put only enough effort into optimizing the bottleneck stage so that the bottleneck moves to another stage. Several other stages may have to be optimized before this stage becomes the bottleneck again. For this reason, effort should not be wasted on over-optimizing a stage.
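The two-step loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the stage times are made-up numbers, and the `speedup` factor is a hypothetical stand-in for whatever real optimization would be applied to the bottleneck stage.

```python
def optimize_pipeline(stage_ms, target_ms, speedup=0.8, max_rounds=20):
    """Repeat: locate the bottleneck, optimize it a little, look again.

    stage_ms  -- measured time per stage in milliseconds (illustrative values)
    target_ms -- performance goal for the slowest stage
    speedup   -- hypothetical factor standing in for one optimization pass
    """
    stage_ms = dict(stage_ms)  # work on a copy
    history = []               # which stage was optimized in each round
    for _ in range(max_rounds):
        # Step 1: locate the bottleneck, i.e., the slowest stage.
        bottleneck = max(stage_ms, key=stage_ms.get)
        if stage_ms[bottleneck] <= target_ms:
            break  # performance goal met
        # Step 2: optimize only that stage, then repeat from step 1.
        stage_ms[bottleneck] *= speedup
        history.append(bottleneck)
    return stage_ms, history


times, history = optimize_pipeline(
    {"application": 20.0, "geometry": 30.0, "rasterizer": 28.0},
    target_ms=22.0,
)
print(history)  # ['geometry', 'rasterizer', 'geometry', 'rasterizer']
```

The history shows the bottleneck moving back and forth between stages, which is why only a modest amount of effort is spent on a stage in each round.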

The location of the bottleneck may change within a frame. At one moment the geometry stage may be the bottleneck because many tiny triangles are rendered. Later in the frame the rasterizer could be the bottleneck because triangles covering large parts of the screen are rendered. So, when we talk about, say, the rasterizer stage being the bottleneck, we mean it is the bottleneck most of the time during that frame.

Another way to capitalize on the pipelined construction is to recognize that when the slowest stage cannot be optimized further, the other stages can be made to work just as much as the slowest stage. This will not change performance, since the speed of the slowest stage will not be altered, but the extra processing can be used to improve image quality. For example, say that the bottleneck is in the application stage, which takes 50 milliseconds (ms) to produce a frame, while the others each take 25 ms. This means that without changing the speed of the rendering pipeline (50 ms equals 20 frames per second), the geometry and the rasterizer stages could also do their work in 50 ms. For example, we could use a more sophisticated lighting model or increase the level of realism with shadows and reflections, assuming that this does not increase the workload on the application stage.
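The arithmetic in this example can be checked with a short sketch. It assumes an idealized pipeline in which the stages overlap perfectly, so that throughput depends only on the slowest stage; the function names are illustrative.

```python
def frames_per_second(stage_ms):
    """Throughput of an idealized pipeline is set by its slowest stage."""
    return 1000.0 / max(stage_ms.values())


def slack_ms(stage_ms):
    """Extra time per frame each stage could use without lowering throughput."""
    bottleneck = max(stage_ms.values())
    return {name: bottleneck - ms for name, ms in stage_ms.items()}


# Stage times from the example in the text.
stages = {"application": 50.0, "geometry": 25.0, "rasterizer": 25.0}
fps = frames_per_second(stages)
slack = slack_ms(stages)
print(fps)    # 20.0
print(slack)  # {'application': 0.0, 'geometry': 25.0, 'rasterizer': 25.0}
```

The geometry and rasterizer stages each have 25 ms of headroom per frame that could be spent on image quality at no cost in frame rate.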

Pipeline optimization is a process in which we first maximize the rendering speed, then allow the stages that are not bottlenecks to consume as much time as the bottleneck. That said, this idea does not apply to newer architectures such as the Xbox 360, which automatically load-balance computational resources (more on this in a moment).

This exception is an excellent example of a key principle. When reading this chapter, the dictum

KNOW YOUR ARCHITECTURE

should always be in the back of your mind, since optimization techniques vary greatly for different architectures. A related dictum is, simply, “Measure.”
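As a minimal sketch of what measuring can look like on the CPU side, the following accumulates wall-clock time per named stage. The `StageTimer` class and the stage name are hypothetical, and the loop body is a stand-in for real per-frame work. Note that this only times CPU code; GPU work generally runs asynchronously and has to be measured with the profiling tools or timing facilities of the graphics API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (CPU side only)."""

    def __init__(self):
        self.total_ms = defaultdict(float)
        self.samples = defaultdict(int)

    @contextmanager
    def measure(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the timed block raises.
            self.total_ms[name] += (time.perf_counter() - start) * 1000.0
            self.samples[name] += 1

    def average_ms(self, name):
        return self.total_ms[name] / self.samples[name]


timer = StageTimer()
for _ in range(3):  # three simulated frames
    with timer.measure("application"):
        sum(i * i for i in range(10_000))  # stand-in for per-frame CPU work

print(f"application: {timer.average_ms('application'):.3f} ms per frame")
```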

15.1 Profiling Tools

There are a number of worthwhile tools available for profiling use of the graphics accelerator and CPU. Such tools are useful both for locating bottlenecks and for optimizing. Examples include PIX for Windows (for DirectX), gDEBugger (for OpenGL), NVIDIA’s NVPerfKit suite of tools, ATI’s GPU PerfStudio [1401], and Apple’s OpenGL Profiler.

As an example, PIX for Windows supports real-time performance evaluation by providing counters for a wide variety of data, such as the number of draw calls, state changes, texture and shader calls, CPU and GPU idle time, locks on various resources, read and write I/O, the amount of memory used, etc. This data can be displayed overlaid on the application itself. Figure 15.1 was rendered with this technique.
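The idea of per-frame counters can be illustrated with a small sketch. This is not PIX's interface; the `FrameCounters` class and the event names are hypothetical, and the text produced by `overlay_text` is merely what an application might draw on top of its own output.

```python
from collections import Counter


class FrameCounters:
    """Hypothetical per-frame event counters for an on-screen overlay."""

    def __init__(self):
        self.current = Counter()     # events counted during the frame in progress
        self.last_frame = Counter()  # totals of the most recently finished frame

    def count(self, event, n=1):
        self.current[event] += n

    def end_frame(self):
        # Publish this frame's totals and start counting the next frame.
        self.last_frame = self.current
        self.current = Counter()

    def overlay_text(self):
        return "\n".join(
            f"{name}: {value}" for name, value in sorted(self.last_frame.items())
        )


counters = FrameCounters()
counters.count("draw_calls", 3)
counters.count("state_changes")
counters.count("state_changes")
counters.end_frame()
overlay = counters.overlay_text()
print(overlay)
# draw_calls: 3
# state_changes: 2
```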

PIX can capture all the DirectX calls made within a given frame for later analysis or playback. Examining this stream can show whether and where unnecessary API calls are being made. PIX can also be used for pixel debugging, showing the frame buffer history for a single pixel.
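One kind of unnecessary call that such a stream can reveal is a state change that sets a value that is already in effect. The sketch below scans a captured frame for these. The capture format (a list of name and value pairs) and the call names are invented for illustration and do not reflect PIX's actual capture format.

```python
def find_redundant_state_calls(calls):
    """Return indices of state-setting calls that re-set the current value.

    calls -- list of (name, value) pairs; names starting with "Set" are
             treated as state changes (an assumption of this sketch).
    """
    current = {}    # last value seen for each kind of state
    redundant = []
    for index, (name, value) in enumerate(calls):
        if not name.startswith("Set"):
            continue  # draw calls do not change the tracked state
        if current.get(name) == value:
            redundant.append(index)
        current[name] = value
    return redundant


# A made-up captured frame.
captured = [
    ("SetTexture", "brick"),
    ("Draw", 120),
    ("SetTexture", "brick"),  # same texture is already bound
    ("Draw", 64),
    ("SetShader", "lit"),
    ("SetTexture", "wood"),
    ("SetShader", "lit"),     # same shader is already active
    ("Draw", 300),
]
redundant = find_redundant_state_calls(captured)
print(redundant)  # [2, 6]
```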

While these tools can provide developers with most of the information they need, sometimes other data is needed that does not fit the mold. Pelzer [1001] presents a number of useful techniques to display debugging information.


Real-Time Rendering, Third Edition, by Tomas Akenine-Möller, Eric Haines, and Naty Hoffman
