March 2026
Intermediate
534 pages
12h 51m
English
Developing efficient CUDA applications involves much more than simply running code on a GPU. It requires identifying performance bottlenecks, analyzing kernel execution behavior, and optimizing data access patterns. This chapter focuses on the tools and techniques for profiling and debugging CUDA programs so that such bottlenecks can be detected. Once they are identified, we can apply targeted optimizations to improve performance.
We'll begin by discussing why profiling and debugging are important and why both are challenging on the GPU. Then, we'll demonstrate basic profiling tools available in Python and Linux environments. Next, we'll explore NVIDIA's dedicated tools, Nsight Systems and Nsight Compute, which ...