Chapter 6 Performance Considerations

Chapter Outline

6.1 Warps and Thread Execution

6.2 Global Memory Bandwidth

6.3 Dynamic Partitioning of Execution Resources

6.4 Instruction Mix and Thread Granularity

6.5 Summary

6.6 Exercises

The execution speed of a CUDA kernel can vary greatly depending on the resource constraints of the device being used. In this chapter, we will discuss the major types of resource constraints in a CUDA device and how they can affect kernel execution performance on that device. To achieve their goals, programmers often need to reach a level of performance well above that of the initial version of the application. In different applications, different constraints may dominate ...
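As a rough illustration of how several resource constraints interact, the following Python sketch estimates how many thread blocks can be resident on one streaming multiprocessor (SM) at the same time by taking the tightest of several per-SM limits. The limit values and the names `SMLimits`, `KernelUsage`, and `resident_blocks` are hypothetical and chosen only for illustration. Real devices report their own limits (for example through `cudaGetDeviceProperties`) and allocate registers and shared memory in coarser units than this sketch assumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Hypothetical per-SM resource limits (illustrative values only)."""
    max_threads: int       # resident threads per SM
    max_blocks: int        # resident blocks per SM
    registers: int         # 32-bit registers per SM
    shared_mem_bytes: int  # shared memory per SM


@dataclass(frozen=True)
class KernelUsage:
    """Hypothetical per-kernel resource usage."""
    threads_per_block: int
    regs_per_thread: int
    shared_mem_per_block: int


def resident_blocks(sm: SMLimits, k: KernelUsage) -> int:
    """Blocks that fit on one SM at once: the minimum over all limits.

    Assumption: allocation granularity is ignored, so real hardware
    may fit fewer blocks than this estimate.
    """
    limits = [
        sm.max_blocks,
        sm.max_threads // k.threads_per_block,
        sm.registers // (k.regs_per_thread * k.threads_per_block),
    ]
    if k.shared_mem_per_block > 0:
        limits.append(sm.shared_mem_bytes // k.shared_mem_per_block)
    return min(limits)


sm = SMLimits(max_threads=1536, max_blocks=8, registers=32768,
              shared_mem_bytes=48 * 1024)
light = KernelUsage(threads_per_block=256, regs_per_thread=10,
                    shared_mem_per_block=0)
heavy = KernelUsage(threads_per_block=256, regs_per_thread=22,
                    shared_mem_per_block=0)

for name, k in (("light", light), ("heavy", heavy)):
    b = resident_blocks(sm, k)
    print(f"{name}: {b} blocks, {b * k.threads_per_block} threads per SM")
```

Under these assumed limits, the `light` kernel is capped by the thread limit at 6 blocks (1536 threads), while raising register use to 22 per thread makes registers the binding constraint and drops residency to 5 blocks (1280 threads). Which constraint dominates changes with the kernel's resource usage.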


Programming Massively Parallel Processors, 2nd Edition by David B. Kirk, Wen-mei W. Hwu
