Consider the dependent loop shown in Listing 7.3, where s(i, j) is some statement or task to be executed.

Listing 7.3 1-D IIR digital filter algorithm

1: for i = 1:I do2: for j = 1:J do3: s(i, j) = f(s(i, j − 1))4: end for5: end for

where function evaluated by statement s(i, j) depends on s(i, j − 1). One way to distribute the tasks among the processors is to implement each iteration of the outer loop among I processors so that processor i implements all the iterations of the inner loop with its dependencies. We could increase the workload for each processor by allocating more than one iteration of the outer loop for each processor using a similar technique to the one explained in Section 7.3.

Get Algorithms and Parallel Computing now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.