Chapter 6. Streams and Events

CUDA is best known for fine-grained concurrency: hardware facilities let threads collaborate closely within blocks through a combination of shared memory and thread synchronization. But CUDA also provides hardware and software facilities for more coarse-grained concurrency:

CPU/GPU concurrency: Since they are separate devices, the CPU and GPU can operate independently of each other.

Memcpy/kernel processing concurrency: For GPUs that have one or more copy engines, host↔device memcpy can be performed while the SMs are processing kernels.

Kernel concurrency: SM 2.x-class and later hardware can run multiple kernels in parallel; the exact limit on concurrent kernels depends on the hardware generation.

Multi-GPU concurrency: For problems with enough computational ...
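The first two forms of concurrency can be sketched with a CUDA stream: asynchronous calls return control to the CPU immediately, and on GPUs with a copy engine the host↔device transfers can overlap kernel execution in other streams. This is a minimal illustrative sketch, not code from the chapter; the kernel name and sizes are invented for the example.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Trivial kernel used only to occupy the SMs (illustrative).
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int N = 1 << 20;
    float *hA, *dA;
    cudaStream_t stream;

    // Pinned host memory is required for truly asynchronous memcpy.
    cudaMallocHost(&hA, N * sizeof(float));
    cudaMalloc(&dA, N * sizeof(float));
    cudaStreamCreate(&stream);

    for (int i = 0; i < N; i++)
        hA[i] = 1.0f;

    // All three calls below are asynchronous with respect to the CPU:
    // control returns immediately (CPU/GPU concurrency), and on GPUs
    // with one or more copy engines the transfers can be performed
    // while the SMs are processing kernels in other streams.
    cudaMemcpyAsync(dA, hA, N * sizeof(float), cudaMemcpyHostToDevice, stream);
    scaleKernel<<<(N + 255) / 256, 256, 0, stream>>>(dA, N, 2.0f);
    cudaMemcpyAsync(hA, dA, N * sizeof(float), cudaMemcpyDeviceToHost, stream);

    // The CPU is free to do independent work here while the GPU executes.

    cudaStreamSynchronize(stream);  // wait for the stream to drain
    printf("hA[0] = %f\n", hA[0]);

    cudaStreamDestroy(stream);
    cudaFreeHost(hA);
    cudaFree(dA);
    return 0;
}
```

The key design point is that all three operations are enqueued into the same stream, so they execute in order on the GPU while the CPU continues past the calls without blocking.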
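Multi-GPU concurrency follows the same pattern at the device level: each GPU is selected with `cudaSetDevice()` and given independent work, and because kernel launches are asynchronous, the devices run concurrently. A hedged sketch (the kernel and buffer names are invented for illustration):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder kernel to give each GPU something to do (illustrative).
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = sqrtf((float) i);
}

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    const int N = 1 << 20;
    float *dBuf[16];  // assumes at most 16 GPUs (illustrative)

    // Launch independent work on every GPU; the launches return
    // immediately, so the devices execute concurrently.
    for (int dev = 0; dev < deviceCount; dev++) {
        cudaSetDevice(dev);
        cudaMalloc(&dBuf[dev], N * sizeof(float));
        busyKernel<<<(N + 255) / 256, 256>>>(dBuf[dev], N);
    }

    // Wait for each device to finish, then clean up.
    for (int dev = 0; dev < deviceCount; dev++) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(dBuf[dev]);
    }
    printf("%d device(s) ran concurrently\n", deviceCount);
    return 0;
}
```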
