OpenGL Insights by Christophe Riccio, Patrick Cozzi

Hierarchical Depth Culling and Bounding-Box Management on the GPU
Dzmitry Malyshau
18.1 Introduction
Optimizing the data passed to the GPU is one of the keys to achieving high and stable
frame rates. The less data that goes into the GPU, the better the performance. This
is what geometry-culling techniques are for: they reduce the number of fragments,
polygons, or even whole objects processed by the GPU.
There are several common culling methods today [Fernando 04]:
Frustum culling. On a high level, the graphics engine determines which objects
lie outside the view frustum and leaves them out of drawing. It generally uses a
bounding-volume approximation (e.g., a box or a sphere) to compute the
intersection with the frustum. On a low level, the OpenGL rasterizer discards
polygons and polygon parts outside of the clip space. This process is performed
after the vertex processing stage. Thus, some GPU time may be wasted on shading
vertices that don't belong to visible triangles.
Backface culling. GPU-accelerated and exposed by OpenGL, this method discards
polygons facing away from the viewer. It may be implemented via one scalar
product per face, but it is fully optimized by the hardware.
Depth buffer. Exposed by OpenGL, this method stores the closest depth value per
fragment in order to discard fragments lying beyond that depth. An immediate
rendering implementation requires one read-modify-write operation on the GPU per
fragment. Efficiency may be improved by drawing opaque objects, polygons, and
fragments in order from the nearest to the farthest.
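The first two methods reduce to a few scalar products. The following is a minimal CPU-side sketch, not code from the chapter: it tests a bounding sphere against six frustum planes and classifies a face with one scalar product. The names `Vec3`, `Plane`, `sphereOutsideFrustum`, `isBackFacing`, and `unitCubeFrustum` are hypothetical, and the sketch assumes plane normals that point into the frustum.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0. Assumption of this sketch:
// the normal n points toward the inside of the frustum.
struct Plane { Vec3 n; float d; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// High-level frustum culling with a bounding sphere: the sphere is rejected
// as soon as it lies entirely on the outer side of any plane. The test is
// conservative: a sphere near a frustum corner may be kept although it is
// actually outside.
inline bool sphereOutsideFrustum(const std::array<Plane, 6>& frustum,
                                 const Vec3& center, float radius) {
    for (const Plane& p : frustum) {
        if (dot(p.n, center) + p.d < -radius) return true;
    }
    return false;
}

// Backface test with one scalar product per face. toViewer is any vector
// from a point on the face toward the eye; the face is back-facing when
// its normal does not point toward the viewer.
inline bool isBackFacing(const Vec3& faceNormal, const Vec3& toViewer) {
    return dot(faceNormal, toViewer) <= 0.0f;
}

// Hypothetical helper for experiments: the frustum expressed as the
// [-1, 1]^3 cube, i.e., the view volume in normalized device coordinates.
inline std::array<Plane, 6> unitCubeFrustum() {
    std::array<Plane, 6> f = {{
        Plane{Vec3{ 1.0f,  0.0f,  0.0f}, 1.0f},  // x >= -1
        Plane{Vec3{-1.0f,  0.0f,  0.0f}, 1.0f},  // x <=  1
        Plane{Vec3{ 0.0f,  1.0f,  0.0f}, 1.0f},  // y >= -1
        Plane{Vec3{ 0.0f, -1.0f,  0.0f}, 1.0f},  // y <=  1
        Plane{Vec3{ 0.0f,  0.0f,  1.0f}, 1.0f},  // z >= -1
        Plane{Vec3{ 0.0f,  0.0f, -1.0f}, 1.0f},  // z <=  1
    }};
    return f;
}
```

A sphere that merely touches the frustum is kept, which is the safe choice: culling must never remove an object that could contribute visible fragments.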
[Figure 18.1. Culling stages combined. Stages shown: scene (initial data set) ->
frustum culling on CPU -> in-frustum objects -> draw calls sent to GPU -> vertex
and geometry shader processing -> GL backface culling and primitive clipping ->
in-frustum fragments -> depth-buffer culling (depth test) -> visible fragments ->
fragment shader processing.]
Most of the time, a developer will apply all three categories simultaneously (see
Figure 18.1).
Moving from stage to stage introduces additional computational cost. Culling
unnecessary input as early as possible yields the highest efficiency. This chapter
introduces one of the ways to use the depth buffer for culling whole objects
while drawing. The method is known as hierarchical depth culling (alternatively,
occlusion culling). It combines different levels of the rendering pipeline for
the common goal of discarding invisible primitives. These levels include the
framebuffer for the depth buffer, the spatial level for bounding volumes, and the
rendering sequence for the early depth pass. The chapter presents a core OpenGL 3.0
implementation of the hierarchical depth culling pipeline performed with minimal
CPU-GPU synchronization.
18.2 Pipeline
The pipeline (see Figure 18.2) can be expressed in the following short steps:
Obtain the depth buffer of occluders (may be the whole scene).
Construct depth mipmaps.
Update the objects' bounding boxes.
Perform depth culling of the bounding boxes.
Draw the scene using culling results.
This sequence does not mention DMA memory transfers, such as retaining
culling results in the system memory or debugging during the stage of drawing
bounding boxes; nor does it specify the exact order of commands, e.g., we may
use the depth buffer of the previous frame for culling. In the latter case, the culling
results would lag by one frame and therefore would not be exact.
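To make steps 2 and 4 concrete, here is a CPU-side sketch of a depth pyramid and a box test against it. It is an illustration under stated assumptions, not the chapter's GLSL: it assumes power-of-two buffer dimensions, depth values in [0, 1] with 1 at the far plane, and the common convention that each coarser texel keeps the farthest depth of the 2x2 block beneath it. The names `DepthLevel`, `buildDepthMips`, and `isBoxOccluded` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One level of the depth pyramid, stored row-major.
struct DepthLevel {
    int width, height;
    std::vector<float> texels;
    // Clamped fetch, so that reads at the border stay inside the level.
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Step 2: build the mip chain down to 1x1. Keeping the FARTHEST depth of
// each 2x2 block means a coarse texel never claims more occlusion than the
// fine level provides. Assumes power-of-two dimensions.
std::vector<DepthLevel> buildDepthMips(const DepthLevel& base) {
    std::vector<DepthLevel> mips;
    mips.push_back(base);
    while (mips.back().width > 1 || mips.back().height > 1) {
        const DepthLevel& src = mips.back();
        DepthLevel dst{std::max(src.width / 2, 1), std::max(src.height / 2, 1), {}};
        dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);
        for (int y = 0; y < dst.height; ++y)
            for (int x = 0; x < dst.width; ++x)
                dst.texels[static_cast<std::size_t>(y) * dst.width + x] = std::max(
                    std::max(src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y)),
                    std::max(src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1)));
        mips.push_back(dst);  // src is not used after this point
    }
    return mips;
}

// Step 4: a box is culled when its nearest depth lies behind the farthest
// occluder depth over its whole screen rectangle. The rectangle
// [x0, x1] x [y0, y1] is inclusive, in base-level pixels, inside the buffer.
bool isBoxOccluded(const std::vector<DepthLevel>& mips,
                   int x0, int y0, int x1, int y1, float boxNearestDepth) {
    // Choose the finest level at which the rectangle spans at most 2x2 texels.
    std::size_t level = 0;
    while (level + 1 < mips.size() &&
           ((x1 >> level) - (x0 >> level) > 1 ||
            (y1 >> level) - (y0 >> level) > 1))
        ++level;
    const DepthLevel& mip = mips[level];
    float farthest = 0.0f;
    for (int y = y0 >> level; y <= (y1 >> level); ++y)
        for (int x = x0 >> level; x <= (x1 >> level); ++x)
            farthest = std::max(farthest, mip.at(x, y));
    return boxNearestDepth > farthest;
}
```

Because at most four texels are fetched per box regardless of its screen size, the cost of the test is constant, which is what makes it practical to run for every bounding box each frame.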
In the following sections, I will describe each stage in detail. The source GLSL
code of a working implementation can be found on the OpenGL Insights website,
www.openglinsights.com.
[Figure 18.2. Hierarchical depth culling pipeline data flow. Functions shown:
early depth pass, depth mipmap constructor, bounding-box update, hierarchical
depth culling (transform feedback), and actual rendering. Data shown: scene,
depth buffer, depth mipmap, bounding-box data, visibility data, and framebuffer
color output. Legend: input/result data, commonly used function, new function.]
18.2.1 Early Depth Pass
The early depth pass is a special rendering stage that draws an opaque part of a scene
into the depth buffer without any color output. The pixel processing cost for this
operation is minimal. It guarantees that only visible fragments will be processed by
heavy pixel shaders when the actual drawing of a scene is performed, with the depth
buffer attached in read-only mode. This pass benefits from the double-speed,
depth-only rendering mode that many cards implement in hardware [Cozzi 09].
The implementation assumes that we have a user-controlled FBO, where the
color attachment is supposed to store the rendered frame. The early depth pass is
computed this way:
Make sure the FBO has a texture as the depth attachment. We will need to
sample from it later. Thus, the depth-stencil format is not allowed.
Bind the FBO and set the draw buffer to GL_NONE, meaning that no color
layers are affected.
Enable depth test and write. Clear the depth to 1.0. Set the depth test
function to GL_LEQUAL.
Render opaque objects of the scene. The vertex shader does plain model-view-
projection transformation. No fragment shader is attached.
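The effect of these settings can be checked with a small software stand-in for the depth test. This sketch makes no OpenGL calls and is not the chapter's implementation: `Fragment`, `DepthBuffer`, `earlyDepthPass`, and `mainPassShadedCount` are hypothetical names, and the rasterizer is replaced by a ready-made fragment list. It shows why a clear value of 1.0 and a less-or-equal comparison let exactly the visible fragments pass when the same geometry is drawn again with the depth buffer read-only.

```cpp
#include <cstddef>
#include <vector>

// A fragment as seen by the depth test: target pixel index and depth in [0, 1].
struct Fragment {
    std::size_t pixel;
    float depth;
};

struct DepthBuffer {
    std::vector<float> depth;

    // The buffer starts cleared to 1.0, the far plane.
    explicit DepthBuffer(std::size_t pixels) : depth(pixels, 1.0f) {}

    // Software stand-in for a less-or-equal depth test. With writeEnabled
    // set to false the buffer is read-only, as in the main rendering pass.
    bool test(const Fragment& f, bool writeEnabled) {
        if (!(f.depth <= depth[f.pixel])) return false;
        if (writeEnabled) depth[f.pixel] = f.depth;
        return true;
    }
};

// Early depth pass: depth test and depth write enabled, no color output.
void earlyDepthPass(DepthBuffer& db, const std::vector<Fragment>& frags) {
    for (const Fragment& f : frags) db.test(f, true);
}

// Main pass over the same fragments with read-only depth. Returns how many
// fragments would reach the expensive fragment shader.
std::size_t mainPassShadedCount(DepthBuffer& db,
                                const std::vector<Fragment>& frags) {
    std::size_t shaded = 0;
    for (const Fragment& f : frags)
        if (db.test(f, false)) ++shaded;
    return shaded;
}
```

With three fragments stacked back to front on one pixel, the main pass shades only the nearest one. A strict less-than comparison would reject it as well, because its depth equals the value stored by the early pass.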
Note that no polygon offset is needed since we assume that the same geometry
and transformations in the same order will take place later in the frame. OpenGL
