ParallelMerge
The example in this section (Example 11-8) is more complex and requires a little familiarity with the Standard Template Library (STL) to fully understand. It shows the power of parallel_for beyond flat iteration spaces. The code performs a parallel merge of two sorted sequences. It works for any sequence with a random-access iterator. The algorithm operates recursively as follows:
If the sequences are too short for effective use of parallelism, it does a sequential merge. Otherwise, it performs steps 2–6.
It swaps the sequences if necessary so that the first sequence,
[begin1, end1), is at least as long as the second sequence,[begin2, end2).It sets
m1to the middle position in[begin1, end1). It calls the item at that locationkey.It sets
m2to wherekeywould fall in[begin2, end2).It merges
[begin1,m1)and[begin2,m2)to create the first part of the merged sequence.It merges
[m1,end1)and[m2,end2)to create the second part of the merged sequence.
The Intel Threading Building Blocks implementation of this algorithm uses the Range object to perform most of the steps. The predicate is_divisible performs the test in step 1, along with step 2. The splitting constructor performs steps 3–6. The body object does the sequential merges.
Example 11-8. Parallel merge
#include "tbb/parallel_for.h" #include <algorithm> using namespace tbb; template<typename Iterator> struct ParallelMergeRange { static size_t grainsize; Iterator begin1, end1; // [begin1,end1) is 1st sequence to be ...