Chapter 4 Data-Parallel Execution Model

Chapter Outline

4.1 Cuda Thread Organization

4.2 Mapping Threads to Multidimensional Data

4.3 Matrix-Matrix Multiplication—A More Complex Kernel

4.4 Synchronization and Transparent Scalability

4.5 Assigning Resources to Blocks

4.6 Querying Device Properties

4.7 Thread Scheduling and Latency Tolerance

4.8 Summary

4.9 Exercises

Fine-grained, data-parallel threads are the fundamental means of parallel execution in CUDA. As we explained in Chapter 3, launching a CUDA kernel creates a grid of threads that all execute the kernel function. That is, the kernel function specifies the C statements that are executed by each individual thread at runtime. Each thread uses a unique coordinate, or thread index, to identify ...

Get Programming Massively Parallel Processors, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Programming Massively Parallel Processors, 2nd Edition by David B. Kirk, Wen-mei W. Hwu

Chapter 4

Data-Parallel Execution Model

Chapter Outline

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly