November 2018
Intermediate to advanced
310 pages
7h 54m
English
We will now use Nsight to step through some code to better understand the CUDA GPU architecture and how branching within a kernel is handled. This will give us some insight into how to write more efficient CUDA kernels. By branching, we mean how the GPU handles control flow statements such as if, else, or switch within a CUDA kernel. In particular, we are interested in how branch divergence is handled. Branch divergence occurs when one thread in a kernel satisfies the condition of an if statement and executes its body, while another thread does not and executes the else block instead: the threads are divergent because they are executing different pieces of code.
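To make the idea concrete before stepping through anything in Nsight, here is a minimal sketch of a kernel that diverges. It is not the program developed in this chapter: the names `divergence_demo` and `run_divergence_demo`, and the choice to branch on whether the thread index is even or odd, are assumptions made purely for illustration. Threads are scheduled in groups called warps (32 threads on current NVIDIA hardware), so neighboring threads that take different sides of the `if` land in the same warp and cannot all follow the same instruction stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): even-numbered threads take the
// if branch and odd-numbered threads take the else branch, so threads that
// sit next to each other in the same warp diverge.
__global__ void divergence_demo(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n)
        return;

    if (tid % 2 == 0) {
        out[tid] = 1;    // path taken by even-numbered threads
    } else {
        out[tid] = -1;   // path taken by odd-numbered threads
    }
}

// Hypothetical host helper: launches the kernel over n threads and copies
// the result into host_out. Returns 0 on success, 1 on any CUDA error.
int run_divergence_demo(int *host_out, int n)
{
    int *dev_out = NULL;
    if (cudaMalloc(&dev_out, n * sizeof(int)) != cudaSuccess)
        return 1;

    int threads = 32;                          // one warp per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    divergence_demo<<<blocks, threads>>>(dev_out, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host_out, dev_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    int out[32];
    if (run_divergence_demo(out, 32) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 32; i++)
        printf("thread %d wrote %d\n", i, out[i]);
    return 0;
}
```

Assuming a file name such as `divergence_demo.cu`, this should build with `nvcc divergence_demo.cu -o divergence_demo`. Setting a breakpoint inside each branch is one way to observe in a debugger which threads of the warp are active on each path.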
Let's write a small CUDA-C ...