The CUDA Handbook: A Comprehensive Guide to GPU Programming by Nicholas Wilt

Chapter 6. Streams and Events

CUDA is best known for enabling fine-grained concurrency, with hardware facilities that enable threads to collaborate closely within blocks using a combination of shared memory and thread synchronization. But it also has hardware and software facilities that enable more coarse-grained concurrency:

CPU/GPU concurrency: Since they are separate devices, the CPU and GPU can operate independently of each other. Kernel launches and asynchronous memcpy calls return control to the CPU immediately, so the CPU can do other work while the GPU executes.

Memcpy/kernel processing concurrency: For GPUs that have one or more copy engines, host↔device memcpy can be performed while the SMs are processing kernels.

Kernel concurrency: SM 2.x-class and later hardware can run up to 4 kernels in parallel.

Multi-GPU concurrency: For problems with enough computational ...
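The first two forms of concurrency above can be sketched with CUDA streams. The following is a minimal illustration, not code from the book: it splits a buffer across two streams so that, on GPUs with a copy engine, one stream's memcpy can overlap the other stream's kernel, while the CPU remains free between the launches and the final synchronize. The kernel, sizes, and error-check macro are illustrative assumptions, and a CUDA-capable GPU is required to run it.

```cuda
// Hedged sketch (assumed names, not from the book): overlap host<->device
// copies and kernel execution using two CUDA streams.
#include <cuda_runtime.h>
#include <stdio.h>

#define CHECK(call) do { cudaError_t e_ = (call); \
    if (e_ != cudaSuccess) { \
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); \
        return 1; } } while (0)

__global__ void scaleKernel(float *d, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main(void) {
    const int N = 1 << 20;
    const int half = N / 2;
    float *h, *d;
    // Pinned host memory is required for memcpy to be truly asynchronous.
    CHECK(cudaHostAlloc(&h, N * sizeof(float), cudaHostAllocDefault));
    CHECK(cudaMalloc(&d, N * sizeof(float)));
    for (int i = 0; i < N; i++) h[i] = 1.0f;

    cudaStream_t s[2];
    for (int j = 0; j < 2; j++) CHECK(cudaStreamCreate(&s[j]));

    // Each stream copies, processes, and copies back its half of the data;
    // work in different streams may execute concurrently on the GPU.
    for (int j = 0; j < 2; j++) {
        float *hp = h + j * half, *dp = d + j * half;
        CHECK(cudaMemcpyAsync(dp, hp, half * sizeof(float),
                              cudaMemcpyHostToDevice, s[j]));
        scaleKernel<<<(half + 255) / 256, 256, 0, s[j]>>>(dp, half, 2.0f);
        CHECK(cudaMemcpyAsync(hp, dp, half * sizeof(float),
                              cudaMemcpyDeviceToHost, s[j]));
    }
    // All three calls above are asynchronous: the CPU reaches this point
    // immediately and could do independent work before synchronizing.
    CHECK(cudaDeviceSynchronize());
    printf("h[0]=%f h[N-1]=%f\n", h[0], h[N - 1]);

    for (int j = 0; j < 2; j++) CHECK(cudaStreamDestroy(s[j]));
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

Note the design choice: issuing copy-kernel-copy per stream, rather than all copies followed by all kernels, lets the hardware pipeline the two streams' work against each other.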
