Using Nsight to understand the warp lockstep property in CUDA

We will now use Nsight to step through some code to help us better understand the CUDA GPU architecture, and in particular how branching within a kernel is handled. This will give us some insight into how to write more efficient CUDA kernels. By branching, we mean how the GPU handles control flow statements such as if, else, or switch within a CUDA kernel. We are especially interested in how branch divergence is handled within a kernel: this is what happens when one thread in a warp satisfies the condition of an if statement and executes its body, while another thread fails the condition and executes the else block. The threads are divergent because they are executing different pieces of code.
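As a minimal sketch of the situation described above (the kernel name and printed messages are my own, not from the book), consider a kernel in which even-numbered threads take the if branch and odd-numbered threads take the else branch. Since the GPU executes the threads of a warp in lockstep, the warp must pass over both branches, masking off the threads that did not take the branch currently being executed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel illustrating branch divergence: within a single
// warp, even-indexed threads satisfy the condition and odd-indexed
// threads do not, so the two halves execute different code paths.
__global__ void divergent_kernel()
{
    if (threadIdx.x % 2 == 0)
        printf("thread %d: took the if branch\n", threadIdx.x);
    else
        printf("thread %d: took the else branch\n", threadIdx.x);
}

int main()
{
    // Launch a single block of 32 threads -- exactly one warp --
    // so all of the divergence happens within that one warp.
    divergent_kernel<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Stepping through a kernel like this in Nsight makes the lockstep behavior visible: the debugger shows the whole warp advancing through one branch while the threads on the other path sit inactive, rather than both branches running concurrently.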

Let's write a small CUDA-C ...
