parallel_reduce
Applying a function such as sum, max, min, or logical AND across all the members of a group is called a reduction operation. Doing a reduction in parallel can yield a different answer from a serial reduction because of rounding. For instance, A+B+C+D+E+F may be evaluated in serial as (((((A+B)+C)+D)+E)+F), whereas the parallel version may compute ((A+B)+((C+D)+(E+F))). Ideally, the results would be the same, but if rounding can occur, the answers will differ. Traditional C++ programs perform reductions in loops, as in the summation shown in Example 3-9.
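To see concretely how rounding makes floating-point addition non-associative, consider this small standalone sketch (the values are chosen for illustration and are not from the text; `volatile` keeps the compiler from folding the arithmetic away):

```cpp
#include <cassert>

// Demonstrates that float addition is not associative: the same three
// values summed with different groupings give different answers.
float sumLeftToRight() {
    volatile float a = 1e8f, b = -1e8f, c = 1.0f;
    return (a + b) + c;   // (1e8 - 1e8) + 1  ->  1.0f
}

float sumRegrouped() {
    volatile float a = 1e8f, b = -1e8f, c = 1.0f;
    return a + (b + c);   // b + c rounds back to -1e8f, so the 1 is lost  ->  0.0f
}
```

Here the unit of least precision of a float near 1e8 is 8, so -1e8f + 1.0f rounds back to -1e8f and the contribution of c vanishes, just as regrouping a large parallel sum can shift which small terms are absorbed.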
Example 3-9. Original reduction code
float SerialSumFoo( float a[], size_t n ) {
    float sum = 0;
    for( size_t i=0; i!=n; ++i )
        sum += Foo(a[i]);
    return sum;
}
If the iterations are independent, you can parallelize this loop using the template class parallel_reduce, as shown in Example 3-10.
Example 3-10. A class for use by a parallel_reduce
class SumFoo {
    float* my_a;
public:
    float sum;
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            sum += Foo(a[i]);
    }
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), sum(0) {}
    void join( const SumFoo& y ) { sum += y.sum; }
    SumFoo( float a[] ) : my_a(a), sum(0) {}
};
Threading Building Blocks defines parallel_reduce similarly to parallel_for. The principal difference is that the thread-private copies of the body must be merged at the end, and therefore operator() is not const. Note the differences with class ApplyFoo from Example 3-4. The ...