In this chapter we provide a detailed breakdown of the main areas that limit performance in CUDA. Each section contains small examples to illustrate the issues, and the sections should be read in order. We assume you have read the previous chapters and are comfortable with the concepts introduced there, or that you are already familiar with CUDA and are specifically interested in techniques for improving the execution speed of your programs.
This chapter is organized around a number of strategies:
Strategy 1: Understanding the problem and breaking it down correctly into serial and parallel workloads.
Strategy 2: Understanding and optimizing for memory ...