This chapter introduces data parallelism and the essential CUDA C features needed to write a simple CUDA C program. It starts with the concepts of threads, host, and device, then introduces the CUDA API functions for device memory management and data transfer. It goes on to cover the basic structure of a CUDA C kernel function, built-in variables, function declaration keywords, and kernel launch syntax.
Data parallelism; scalable parallel program; thread; kernel; API; RGB; greyscale; kernel launch; execution configuration parameters; data transfer; error handling; stub function; SPMD