Chapter 4
Data-Parallel Execution Model
Chapter Outline
4.1 CUDA Thread Organization
4.2 Mapping Threads to Multidimensional Data
4.3 Matrix-Matrix Multiplication—A More Complex Kernel
4.4 Synchronization and Transparent Scalability
4.5 Assigning Resources to Blocks
4.6 Querying Device Properties
4.7 Thread Scheduling and Latency Tolerance
4.8 Summary
4.9 Exercises
Fine-grained, data-parallel threads are the fundamental means of parallel execution in CUDA. As we explained in Chapter 3, launching a CUDA kernel creates a grid of threads that all execute the kernel function. That is, the kernel function specifies the C statements that are executed by each individual thread at runtime. Each thread uses a unique coordinate, or thread index, to identify ...
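The idea that every thread runs the same kernel body and tells itself apart only by its index can be sketched with a small vector-addition kernel. This is a minimal illustration, not code from this chapter. The names `vecAddKernel`, `vecAdd`, and `checkCuda`, and the block size of 256 threads, are hypothetical choices made for the example. Each thread combines `blockIdx.x`, `blockDim.x`, and `threadIdx.x` into a unique global index and uses it to pick the single element it works on.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Every thread in the grid executes this same function body.
// Each thread combines its block index and thread index into a unique
// global index i and uses it to select the one element it processes.
__global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches enough
// 256-thread blocks to cover n elements, and copies the result back.
void vecAdd(const float *h_A, const float *h_B, float *h_C, int n) {
    assert(n > 0);
    size_t size = (size_t)n * sizeof(float);
    float *d_A, *d_B, *d_C;
    checkCuda(cudaMalloc((void **)&d_A, size), "cudaMalloc A");
    checkCuda(cudaMalloc((void **)&d_B, size), "cudaMalloc B");
    checkCuda(cudaMalloc((void **)&d_C, size), "cudaMalloc C");
    checkCuda(cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice), "copy A");
    checkCuda(cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice), "copy B");

    const int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceil(n / 256)
    vecAddKernel<<<blocks, threadsPerBlock>>>(d_A, d_B, d_C, n);
    checkCuda(cudaGetLastError(), "kernel launch");

    // This blocking copy on the default stream waits for the kernel to finish.
    checkCuda(cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost), "copy C");
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
}
```

For n = 1000, the launch creates 4 blocks of 256 threads (1,024 threads in total). The `i < n` test keeps the 24 surplus threads in the last block from touching memory beyond the arrays. Rounding the block count up and guarding inside the kernel is a common way to handle data sizes that are not a multiple of the block size.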