What's in this chapter?
- Understanding the nature of streams and events
- Exploiting grid level concurrency
- Overlapping kernel execution and data transfer
- Overlapping CPU and GPU execution
- Understanding synchronization mechanisms
- Avoiding unwanted synchronization
- Adjusting stream priorities
- Registering device callback functions
- Displaying application execution timelines with the NVIDIA Visual Profiler
Generally speaking, there are two levels of concurrency in CUDA C programming:
- Kernel level concurrency
- Grid level concurrency
Up to this point, your focus has been solely on kernel level concurrency, in which a single task, or kernel, is executed in parallel by many threads on the GPU. Several ways to improve kernel performance have been covered from the perspectives of the programming model, execution model, and memory model. You have also developed your ability to dissect and analyze kernel behavior using the command-line profiler.
This chapter will examine grid level concurrency. In grid level concurrency, multiple kernel launches are executed simultaneously on a single device, often leading to better device utilization. In this chapter, you will learn how to use CUDA streams to implement grid level concurrency. ...
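To make the idea concrete before diving in, the following sketch (not taken from this chapter; the kernel, sizes, and stream count are illustrative assumptions) shows the basic pattern behind grid level concurrency: issuing kernel launches into several non-default CUDA streams so that, on devices that support it, the grids may execute concurrently.

```cuda
// Illustrative sketch: one kernel launched into four non-default streams.
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int nStreams = 4;          // assumed stream count for illustration
    const int n = 1 << 20;
    cudaStream_t streams[nStreams];
    float *d_data[nStreams];

    for (int i = 0; i < nStreams; i++) {
        cudaStreamCreate(&streams[i]);
        cudaMalloc(&d_data[i], n * sizeof(float));
    }

    // Launches in different non-default streams have no implicit ordering
    // between them, so the device is free to overlap their execution.
    for (int i = 0; i < nStreams; i++) {
        scale<<<(n + 255) / 256, 256, 0, streams[i]>>>(d_data[i], n);
    }

    cudaDeviceSynchronize();         // wait for all streams to finish

    for (int i = 0; i < nStreams; i++) {
        cudaStreamDestroy(streams[i]);
        cudaFree(d_data[i]);
    }
    return 0;
}
```

Each loop iteration enqueues work into its own stream; the chapter develops this pattern in detail, including how to overlap these launches with data transfers and host computation.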