Dynamic parallelism
First, we will take a look at dynamic parallelism, a feature in CUDA that allows a kernel to launch and manage other kernels without any interaction or input on the part of the host. It also makes many CUDA-C features that are normally available only on the host usable on the GPU as well, such as device memory allocation and deallocation, device-to-device memory copies, context-wide synchronization, and streams.
Let's start with a very simple example. We will write a small kernel, launched over N threads, in which each thread prints a short message to the terminal; the kernel then recursively launches another kernel over N - 1 threads. This process continues until N reaches 1. (Of course, beyond illustrating how dynamic parallelism ...
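The recursive example just described might be sketched in CUDA-C as follows. This is a minimal illustration under stated assumptions, not the book's own listing: it assumes a GPU that supports dynamic parallelism (compute capability 3.5 or higher) and compilation with relocatable device code, linked against the device runtime. The kernel name `dynamic_hello_ker`, the counter `g_thread_count`, and the helper `total_launched_threads` are hypothetical names chosen for this sketch. Only thread 0 of each grid launches the child, because a launch statement executed by every thread would start N child grids instead of one.

```cuda
// Hypothetical sketch of the recursive "hello" example (names are illustrative).
// Assumed build command (adjust -arch for your GPU):
//   nvcc -arch=sm_70 -rdc=true hello_dp.cu -o hello_dp -lcudadevrt
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Device-global counter: how many threads ran across all nested launches.
__device__ int g_thread_count = 0;

// Launched over n threads: every thread prints a message, then thread 0
// launches a child grid over n - 1 threads, stopping once n reaches 1.
__global__ void dynamic_hello_ker(int n, int depth)
{
    printf("Hello from thread %d of %d (depth %d)\n", threadIdx.x, n, depth);
    atomicAdd(&g_thread_count, 1);

    // A single thread performs the device-side launch; if every thread did,
    // each level would spawn n child grids.
    if (threadIdx.x == 0 && n > 1) {
        dynamic_hello_ker<<<1, n - 1>>>(n - 1, depth + 1);
    }
}

// Host helper (hypothetical): starts the recursion over n threads and returns
// the total number of threads that executed, i.e. n + (n - 1) + ... + 1.
int total_launched_threads(int n)
{
    int zero = 0;
    cudaMemcpyToSymbol(g_thread_count, &zero, sizeof(int));

    dynamic_hello_ker<<<1, n>>>(n, 0);

    // A parent grid does not complete until all of its child grids have
    // completed, so this host-side sync waits for every nested level.
    cudaDeviceSynchronize();

    int count = 0;
    cudaMemcpyFromSymbol(&count, g_thread_count, sizeof(int));
    return count;
}

int main()
{
    const int n = 4;
    int count = total_launched_threads(n);
    printf("Total threads launched: %d\n", count);

    // Sanity check against the closed form n(n + 1) / 2.
    assert(count == n * (n + 1) / 2);
    return 0;
}
```

For `n = 4` the counter should reach 4 + 3 + 2 + 1 = 10. The order of the printed lines is not guaranteed, since device-side `printf` output from the different grids can interleave. The sketch deliberately avoids calling `cudaDeviceSynchronize()` from device code, which recent CUDA releases no longer support; the parent grid instead waits implicitly for its children.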