parallel_scan
A parallel_scan computes a parallel prefix, also known as a parallel scan. This computation is an advanced concept in parallel computing that is sometimes useful in scenarios that appear to have inherently serial dependencies.
A mathematical definition of the parallel prefix is as follows. Let ⊕ be an associative operation ⊕ with left-identity element id⊕. The parallel prefix of ⊕ over a sequence x0, x1 , … xn − 1 is a sequence y0, y1, y2 , …yn − 1 where:
y0 = id ⊕ ⊕ x0
yi = yi − 1 ⊕ i
For example, if ⊕ is addition, the parallel prefix corresponds to a running sum and the identity element is 0. A serial implementation of parallel prefix is:
T temp = id⊕;
for( int i=1; i<=n; ++i ) {
temp = temp ⊕ x[i];
y[i] = temp;
}Parallel prefix performs this in parallel by reassociating the application of ⊕ and using two passes. It may invoke ⊕ up to twice as many times as the serial prefix algorithm. But given the right grain size and sufficient hardware threads, it can out-perform the serial prefix because—even though it does more work—it can distribute the work across more than one hardware thread.
Tip
Because parallel_scan needs two passes, systems with only two hardware threads tend to exhibit only a small speedup. parallel_scan is better suited for future systems with more than two cores. It shows how a problem that appears inherently sequential can be parallelized.
Example 3-20 demonstrates how to use parallel_scan in a way similar to the sequential example.
Example 3-20. parallel_scan ...