Grain size

The third argument, grainsize, specifies the number of iterations for a reasonable size chunk to deal out to a processor. If the iteration space has more than grainsize iterations, parallel_for splits it into separate subranges that are scheduled separately.

The grainsize amortizes parallel scheduling overhead. Having a grainsize independent of the number of processors tends to keep, in common cases, the parallel scheduling overhead in constant proportion to real work. This is because the packaging-and-handling overhead is relatively constant per grain and therefore independent of the number of processors.

The grainsize enables you to avoid excessive parallel overhead. A parallel loop construct incurs overhead cost for every subrange. If the subranges are too small, the overhead may exceed the useful work. By specifying a grain size, you can limit the overhead. The grainsize effectively sets a minimum threshold for parallelization.

Figure 3-1 illustrates the impact of overhead by showing the useful work as lettered squares surrounded by the overhead of a grain of work (the darker surrounding areas). On the left, the problem is broken into four pieces (4X), and on the right, with a finer grain size, the problem is broken into 36 pieces (36X).

Packaging versus grain size, same workload

Figure 3-1. Packaging versus grain size, same workload

The total work to be done on the system is represented by the light and dark gray ...

Get Intel Threading Building Blocks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.