Chapter 6. Streams and Events
CUDA is best known for fine-grained concurrency: hardware facilities let the threads within a block collaborate closely through shared memory and thread synchronization. It also provides hardware and software facilities for more coarse-grained concurrency:
• CPU/GPU concurrency: Since they are separate devices, the CPU and GPU can operate independently of each other.
• Memcpy/kernel processing concurrency: For GPUs that have one or more copy engines, host↔device memcpy can be performed while the SMs are processing kernels.
• Kernel concurrency: SM 2.x-class and later hardware can run multiple kernels in parallel, up to a hardware limit that depends on the compute capability.
• Multi-GPU concurrency: For problems with enough computational ...
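The first two forms of concurrency listed above can be sketched with the CUDA runtime's stream and event APIs. The example below is a minimal illustration, not the chapter's own code: the kernel `scaleKernel`, the helper `runOverlapDemo`, and the `CHECK` macro are hypothetical names introduced here, and the sketch assumes device 0 is a CUDA-capable GPU. It queries how many copy engines the device reports and whether it supports concurrent kernels, then queues an asynchronous copy, a kernel, and a copy back into one stream while the CPU keeps working until an event signals completion.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel used only for illustration: doubles each element.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Queues copy -> kernel -> copy in one stream, lets the CPU spin on other
// work until the GPU is done, and returns the first element of the result.
float runOverlapDemo(int n, unsigned long long *cpuIterations)
{
    const size_t bytes = (size_t)n * sizeof(float);
    float *hostBuf = NULL, *devBuf = NULL;
    // Pinned host memory lets cudaMemcpyAsync run asynchronously to the CPU.
    CHECK(cudaMallocHost((void **)&hostBuf, bytes));
    CHECK(cudaMalloc((void **)&devBuf, bytes));
    for (int i = 0; i < n; ++i) hostBuf[i] = 1.0f;

    cudaStream_t stream;
    cudaEvent_t done;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&done));

    // Each call below only enqueues work and returns to the CPU.
    CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, stream));
    scaleKernel<<<(n + 255) / 256, 256, 0, stream>>>(devBuf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(hostBuf, devBuf, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaEventRecord(done, stream));

    // CPU/GPU concurrency: count loop iterations while the GPU is busy.
    unsigned long long iterations = 0;
    while (cudaEventQuery(done) == cudaErrorNotReady) ++iterations;
    CHECK(cudaEventSynchronize(done));  // also surfaces any asynchronous error
    if (cpuIterations) *cpuIterations = iterations;

    float first = hostBuf[0];
    CHECK(cudaEventDestroy(done));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    return first;
}

int main()
{
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("copy engines: %d, concurrent kernels supported: %d\n",
           prop.asyncEngineCount, prop.concurrentKernels);

    unsigned long long iterations = 0;
    float first = runOverlapDemo(1 << 20, &iterations);
    // Every input element was 1.0f, so a successful run leaves 2.0f here.
    printf("first element: %.1f, CPU iterations while waiting: %llu\n",
           first, iterations);
    return 0;
}
```

Within a single stream the three operations still run in order. Overlapping a copy with a kernel requires placing independent work in separate streams, and whether the hardware actually overlaps them depends on the copy-engine count reported above.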