Chapter 39. Parallel Prefix Sum (Scan) with CUDA
Mark Harris NVIDIA Corporation
Shubhabrata Sengupta University of California, Davis
John D. Owens University of California, Davis
Introduction
A simple and common parallel algorithm building block is the all-prefix-sums operation. In this chapter, we define and illustrate the operation, and we discuss in detail its efficient implementation using NVIDIA CUDA. Blelloch (1990) describes all-prefix-sums as a good example of a computation that seems inherently sequential, but for which there is an efficient parallel algorithm. He defines the all-prefix-sums operation as follows:
The all-prefix-sums operation takes a binary associative operator ⊕ with identity I, and an array of n elements
[a0, a1,..., an
Get GPU Gems 3 now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.