parallel_scan
A parallel_scan computes a parallel prefix, also known as a parallel scan. This computation is an advanced concept in parallel computing that is sometimes useful in scenarios that appear to have inherently serial dependencies.
A mathematical definition of the parallel prefix is as follows. Let ⊕ be an associative operation ⊕ with left-identity element id⊕. The parallel prefix of ⊕ over a sequence x0, x1 , … xn − 1 is a sequence y0, y1, y2 , …yn − 1 where:
y0 = id ⊕ ⊕ x0
yi = yi − 1 ⊕ i
For example, if ⊕ is addition, the parallel prefix corresponds to a running sum and the identity element is 0. A serial implementation of parallel prefix is:
T temp = id⊕;
for( int i=1; i<=n; ++i ) {
temp = temp ⊕ x[i];
y[i] = temp;
}Parallel prefix performs this in parallel by reassociating the application of ⊕ and using two passes. It may invoke ⊕ up to twice as many times as the serial prefix algorithm. But given the right grain size and sufficient hardware threads, it can out-perform the serial prefix because—even though it does more work—it can distribute the work across more than one hardware thread.
Tip
Because parallel_scan needs two passes, systems with only two hardware threads tend to exhibit only a small speedup. parallel_scan is better suited for future systems with more than two cores. It shows how a problem that appears inherently sequential can be parallelized.
Example 3-20 demonstrates how to use parallel_scan in a way similar to the sequential example.
Example 3-20. parallel_scan ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access