parallel_reduce
Applying a function such as sum, max, min, or logical AND across all the members of a group is called a reduction operation. A parallel reduction can yield a different answer from a serial reduction because floating-point rounding makes operations such as addition nonassociative. For instance, A+B+C+D+E+F may be evaluated in serial as (((((A+B)+C)+D)+E)+F), whereas the parallel version may compute ((A+B)+((C+D)+(E+F))). Ideally, the results would be the same, but if rounding occurs, the answers may differ. Traditional C++ programs perform reductions in loops, as in the summation shown in Example 3-9.
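To make the rounding point concrete, the following sketch evaluates the same six values with the two groupings described above. It is an illustration only, not part of Threading Building Blocks: the function names SerialOrder and TreeOrder and the sample values are hypothetical, and the stated results assume IEEE 754 single-precision arithmetic with round-to-nearest and no compiler reassociation (for example, no fast-math options).

```cpp
#include <cstddef>

// Serial grouping: (((((A+B)+C)+D)+E)+F)
float SerialOrder( const float a[6] ) {
    float sum = a[0];
    for( std::size_t i=1; i!=6; ++i )
        sum += a[i];
    return sum;
}

// One possible parallel grouping: ((A+B)+((C+D)+(E+F)))
// Each partial result is stored in a float so that every step
// rounds to single precision.
float TreeOrder( const float a[6] ) {
    float ab = a[0] + a[1];
    float cd = a[2] + a[3];
    float ef = a[4] + a[5];
    float cdef = cd + ef;
    return ab + cdef;
}

// Hypothetical sample data. Near 1e8 the spacing between adjacent
// floats is 8, so adding 1.0f to 1e8f (or to -1e8f) is rounded away.
const float kSample[6] = { 1e8f, 1.0f, -1e8f, 1.0f, 1.0f, 1.0f };
```

Under the assumptions above, SerialOrder(kSample) gives 3 because the large values cancel before the last three ones are added, while TreeOrder(kSample) gives 0 because each small value is absorbed by a large one before the cancellation. Neither result is wrong in the floating-point sense; they simply reflect different evaluation orders.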
Example 3-9. Original reduction code
float SerialSumFoo( float a[], size_t n ) {
    float sum = 0;
    for( size_t i=0; i!=n; ++i )
        sum += Foo(a[i]);
    return sum;
}
If the iterations are independent, you can parallelize this loop using the template class parallel_reduce, as shown in Example 3-10.
Example 3-10. A class for use by a parallel_reduce
class SumFoo {
    float* my_a;
public:
    float sum;
    // Accumulate Foo(a[i]) over the subrange r into this body's sum.
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            sum += Foo(a[i]);
    }
    // Splitting constructor: the new body starts with an empty sum.
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), sum(0) {}
    // Merge the partial sum of another body into this one.
    void join( const SumFoo& y ) {sum+=y.sum;}
    SumFoo(float a[] ) :
        my_a(a), sum(0)
    {}
};
Threading Building Blocks defines parallel_reduce similarly to parallel_for. The principal difference is that thread-private copies of the body must be merged at the end, and therefore the operator() is not const. Note the differences from class ApplyFoo in Example 3-4. The ...