Chapter 3: CUDA Execution Model

What's in this chapter?

  • Developing kernels with a profile-driven approach
  • Understanding the nature of warp execution
  • Exposing more parallelism to the GPU
  • Mastering grid and block configuration heuristics
  • Learning various CUDA performance metrics and events
  • Probing dynamic parallelism and nested execution

Through the exercises in the last chapter, you learned how to organize threads into grids and blocks to deliver the best performance. While you can find the best execution configuration through trial and error, you might be left wondering why the selected configuration outperforms the others, and whether there are guidelines for choosing grid and block dimensions. This chapter answers those questions and gives you deeper insight into kernel launch configurations and performance profiling information, but from a different angle: the hardware perspective.

Introducing the CUDA Execution Model

In general, an execution model provides an operational view of how instructions are executed on a specific computing architecture. The CUDA execution model exposes an abstract view of the GPU parallel architecture, allowing you to reason about thread concurrency. In Chapter 2, you learned ...
