Chapter 5

Performance considerations

Abstract

In this chapter, we review the major aspects of application performance on a CUDA device: global memory access coalescing, memory parallelism, control flow divergence, dynamic resource partitioning, and instruction mixes. Each of these aspects is rooted in the hardware limitations of the devices. Based on these concepts, we introduce techniques for analyzing code for memory coalescing, channel/bank utilization, and control divergence. More importantly, we introduce techniques for converting poorly performing code into well-performing code: corner-turning, active thread index consolidation, and thread granularity coarsening.
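As a rough illustration of what analyzing code for memory coalescing can look like, the following host-side C++ sketch counts how many DRAM bursts one warp touches for a given access stride. It is a simplified model, not code from the chapter: the function name `bursts_touched`, the 128-byte burst size, and the 4-byte `float` element are assumptions chosen for illustration, and real hardware behavior depends on the device.

```cpp
#include <cstddef>
#include <set>

// Illustrative parameters (assumptions, not taken from the chapter).
constexpr std::size_t kWarpSize = 32;     // threads per warp
constexpr std::size_t kBurstBytes = 128;  // assumed DRAM burst size in bytes

// Counts the distinct bursts touched when thread t of a single warp
// reads the float at element index (base + t * stride).
// Fewer bursts means the accesses coalesce better.
std::size_t bursts_touched(std::size_t base, std::size_t stride) {
    std::set<std::size_t> bursts;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        std::size_t byte_addr = (base + t * stride) * sizeof(float);
        bursts.insert(byte_addr / kBurstBytes);
    }
    return bursts.size();
}
```

Under these assumptions, `bursts_touched(0, 1)` models consecutive threads reading consecutive elements and touches a single burst, while `bursts_touched(0, 1024)` models a warp walking down a column of a 1024-wide row-major matrix and touches 32 separate bursts. Techniques such as corner-turning aim to turn the second pattern into the first.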

Keywords

Compute-bound; memory-bound; bottleneck; memory bandwidth; DRAM burst; ...

From Programming Massively Parallel Processors, 3rd Edition.