August 2007
Intermediate to advanced
1008 pages
21h 13m
English
Mark Harris NVIDIA Corporation
Shubhabrata Sengupta University of California, Davis
John D. Owens University of California, Davis
A simple and common parallel algorithm building block is the all-prefix-sums operation. In this chapter, we define and illustrate the operation, and we discuss in detail its efficient implementation using NVIDIA CUDA. Blelloch (1990) describes all-prefix-sums as a good example of a computation that seems inherently sequential, but for which there is an efficient parallel algorithm. He defines the all-prefix-sums operation as follows:
The all-prefix-sums operation takes a binary associative operator ⊕ with identity I, and an array of n elements
[a0, a1,..., an