This chapter introduces key concepts of the data parallel execution model in CUDA. It begins with an overview of the multidimensional organization of CUDA threads, blocks, and grids, and then elaborates on the use of thread and block indices to map threads onto different parts of the data, illustrated with a 2D image blur example. It next introduces barrier synchronization as a mechanism for coordinating the execution of threads within a block, followed by an introduction to resource queries. The chapter ends with an introduction to transparent scaling, thread scheduling, and latency tolerance.
Keywords: Execution configuration parameters; ...
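To make the indexing and execution configuration discussion concrete, the following is a minimal sketch, not code from the chapter itself: a 2D blur kernel in which each thread's block and thread indices are mapped to one output pixel, launched with a 2D grid of 2D blocks, preceded by a simple resource query. The kernel name, blur radius, image size, and 16x16 block shape are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

#define BLUR_SIZE 1  // blur radius (assumption): averages a (2*BLUR_SIZE+1)^2 neighborhood

// Each thread computes one output pixel of a grayscale image.
// blockIdx, blockDim, and threadIdx give the thread's 2D position in the grid,
// which is mapped directly to a pixel coordinate (col, row).
__global__ void blurKernel(const unsigned char *in, unsigned char *out,
                           int w, int h) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (col < w && row < h) {
        int pixVal = 0;
        int pixels = 0;
        // Average the surrounding neighborhood, skipping out-of-range pixels.
        for (int blurRow = -BLUR_SIZE; blurRow <= BLUR_SIZE; ++blurRow) {
            for (int blurCol = -BLUR_SIZE; blurCol <= BLUR_SIZE; ++blurCol) {
                int curRow = row + blurRow;
                int curCol = col + blurCol;
                if (curRow >= 0 && curRow < h && curCol >= 0 && curCol < w) {
                    pixVal += in[curRow * w + curCol];
                    ++pixels;
                }
            }
        }
        out[row * w + col] = (unsigned char)(pixVal / pixels);
    }
}

int main(void) {
    // Resource query: inspect device limits before choosing a configuration.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);

    int w = 1024, h = 768;            // example image size (assumption)
    size_t bytes = (size_t)w * h;
    unsigned char *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 128, bytes);     // placeholder input data

    // Execution configuration parameters: a 2D grid of 2D blocks covering the image,
    // rounded up so every pixel is assigned to some thread.
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    blurKernel<<<grid, block>>>(d_in, d_out, w, h);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

This naive kernel needs no barrier synchronization because each thread reads and writes independent locations; __syncthreads() becomes relevant once threads in a block cooperate through shared data, which is where the chapter's discussion of barriers applies.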