What's in this chapter?
- Writing a CUDA program
- Executing a kernel function
- Organizing threads with grids and blocks
- Measuring GPU performance
CUDA is a parallel computing platform and programming model with a small set of extensions to the C language. With CUDA, you can implement a parallel algorithm as easily as you write C programs. You can build applications with CUDA on NVIDIA GPUs for a wide range of systems, from embedded devices, tablets, laptops, desktops, and workstations to HPC clusters. Familiar C programming tools have been extended to help you edit, debug, and analyze your CUDA program throughout the lifetime of your project. In this chapter, you are going to learn how to write a CUDA program through two simple examples: vector addition and matrix addition.
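To preview what such a program looks like, here is a minimal sketch of a vector addition in CUDA C. The kernel name `vecAdd` and the sizes chosen are illustrative, not the chapter's exact listing; the structure (allocate on the device, copy in, launch a grid of thread blocks, copy out) is the pattern the chapter develops.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of C = A + B.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        C[i] = A[i] + B[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Allocate device memory and copy the inputs to the GPU.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(dA, dB, dC, n);

    // Copy the result back to the host and check one element.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Compile with `nvcc` (for example, `nvcc vecAdd.cu -o vecAdd`); running it requires an NVIDIA GPU. The index expression `blockIdx.x * blockDim.x + threadIdx.x` is the mapping from the grid-and-block thread organization to array elements that this chapter explains in detail.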
Introducing the CUDA Programming Model
Programming models present an abstraction of computer architectures that acts as a bridge between an application and its implementation on available hardware. Figure 2.1 illustrates the important layers of abstraction that lie between the program and the programming model implementation. The communication abstraction is the boundary between the program and the programming model implementation, ...