
150 III Rendering
memory, indicating that the shading term needs to be evaluated at the other pix-
els in that coarse pixel. All threads within the group wait at the synchronization
point. By structuring the code this way, we have moved the shading term eval-
uation (often expensive) out of the control flow and reduced the control flow
divergence.
For the regions where we could not lower the shading rate, we change the
axis of parallelism from one thread per coarse pixel to one thread per pixel and
evaluate the shading term. The threads that evaluated the shading term at the
coarse rate wait at the group synchronization point. If the DPI is high enough, ...