Chapter 39. Parallel Prefix Sum (Scan) with CUDA

Mark Harris NVIDIA Corporation

Shubhabrata Sengupta University of California, Davis

John D. Owens University of California, Davis

Introduction

A simple and common parallel algorithm building block is the all-prefix-sums operation. In this chapter, we define and illustrate the operation, and we discuss in detail its efficient implementation using NVIDIA CUDA. Blelloch (1990) describes all-prefix-sums as a good example of a computation that seems inherently sequential, but for which there is an efficient parallel algorithm. He defines the all-prefix-sums operation as follows:

The all-prefix-sums operation takes a binary associative operator ⊕ with identity I, and an array of n elements
[a₀, a₁,..., a_n

Get GPU Gems 3 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

GPU Gems 3 by Hubert Nguyen

Chapter 39. Parallel Prefix Sum (Scan) with CUDA

Introduction

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly