Scalable parallel execution
Mark Ebersole
Abstract
This chapter introduces key concepts of the data-parallel execution model in CUDA. It begins with an overview of the multidimensional organization of CUDA threads, blocks, and grids, then explains how thread indexes and block indexes map threads to different parts of the data, illustrated with a 2D image blur example. Next, it introduces barrier synchronization as a mechanism for coordinating the execution of threads within a block, followed by the concept of resource queries. The chapter ends by introducing transparent scaling, thread scheduling, and latency tolerance.
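As a rough illustration of the thread-to-data mapping mentioned above, the following is a minimal sketch of a 2D blur in CUDA. It is not the chapter's own listing: the kernel name `blurKernel`, the `BLUR_SIZE` radius, the 16x16 block shape, the 62x76 image size, and the use of unified memory are all assumptions chosen for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical blur radius: 1 gives a 3x3 averaging window.
#define BLUR_SIZE 1

// Each thread computes one output pixel of a grayscale image.
__global__ void blurKernel(const unsigned char* in, unsigned char* out,
                           int w, int h) {
    // Combine block and thread indexes into global pixel coordinates.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid may overshoot the image, so threads outside it do nothing.
    if (col < w && row < h) {
        int sum = 0;
        int count = 0;
        for (int dr = -BLUR_SIZE; dr <= BLUR_SIZE; ++dr) {
            for (int dc = -BLUR_SIZE; dc <= BLUR_SIZE; ++dc) {
                int r = row + dr;
                int c = col + dc;
                // Only average neighbors that lie inside the image.
                if (r >= 0 && r < h && c >= 0 && c < w) {
                    sum += in[r * w + c];
                    ++count;
                }
            }
        }
        out[row * w + col] = (unsigned char)(sum / count);
    }
}

int main() {
    const int w = 62, h = 76;           // assumed image size, not a multiple of 16
    const size_t bytes = (size_t)w * h;

    unsigned char *in = nullptr, *out = nullptr;
    // Unified memory keeps the sketch short; explicit copies work equally well.
    if (cudaMallocManaged(&in, bytes) != cudaSuccess ||
        cudaMallocManaged(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < bytes; ++i) in[i] = 100;  // uniform test image

    // Execution configuration: round the grid up so every pixel is covered.
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    blurKernel<<<grid, block>>>(in, out, w, h);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out[0] = %d\n", out[0]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Because the image dimensions are not multiples of the block dimensions, some threads in the last row and column of blocks fall outside the image. The bounds check in the kernel is what makes rounding the grid size up safe.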
Keywords
Execution configuration parameters; ...
Get Programming Massively Parallel Processors, 3rd Edition now with the O’Reilly learning platform.