Scientists demand
the most powerful computing machinery available, whether they model
global atmospheric conditions, decipher the human genome, study
viscous fluid dynamics, visualize plasmas in 3 dimensions, or vibrate
strings in 14. Besides tons of disk space (perhaps a terabyte for
*/tmp* alone) and tremendous I/O and memory
bandwidth, they want raw CPU power. Since no single CPU will ever
suffice, scientists work with vector and massively parallel
computers.

In parallel programming, we try to
divide the work of a sequential program into portions so that many
processors can work on the problem simultaneously. For example, this
code[2] computes π by integrating the function 4/(1 + x²) over the range 0 ≤ x ≤ 1:

    my $intervals = 1_000;
    my $h   = 1.0 / $intervals;
    my $sum = 0.0;
    for (my $i = 1; $i <= $intervals; $i++) {
        my $x = $h * ($i - 0.5);
        $sum += 4.0 / (1.0 + $x * $x);
    }
    my $pi = $h * $sum;

The variable `$intervals` defines the granularity of the summation and, hence, the accuracy of the computation. To get a good result, the interval must be very finely divided, which, of course, increases the program’s running time. But suppose we parcel out the work to two processors, with each one integrating only half of the curve? If we then add the two partial sums, the value of π is the same, but the computational wall-clock time is halved. In fact, putting 10 processors to work completes the job an order of magnitude faster, because this problem ...
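To make that split concrete, here is a minimal sketch of one way to divide the loop on a single multiprocessor machine: each child created with `fork` sums its own share of the intervals and sends its partial sum back through a pipe, and the parent adds the pieces. The `$workers` count and the fork-and-pipe plumbing are illustrative assumptions for this sketch, not the message-passing machinery a real cluster would use.

    use strict;
    use warnings;

    my $intervals = 1_000;
    my $h         = 1.0 / $intervals;
    my $workers   = 2;           # number of cooperating processes (assumed)

    my @readers;
    for my $w (0 .. $workers - 1) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: sum every $workers-th interval, starting at $w + 1.
            close $reader;
            my $sum = 0.0;
            for (my $i = $w + 1; $i <= $intervals; $i += $workers) {
                my $x = $h * ($i - 0.5);
                $sum += 4.0 / (1.0 + $x * $x);
            }
            print {$writer} "$sum\n";
            close $writer;
            exit 0;
        }
        close $writer;           # Parent keeps only the read end.
        push @readers, $reader;
    }

    # Parent: collect the partial sums and finish the computation.
    my $sum = 0.0;
    $sum += <$_> for @readers;
    wait() for 1 .. $workers;
    my $pi = $h * $sum;
    print "pi is approximately $pi\n";

Striding through the loop (`$i += $workers`) rather than handing each worker a contiguous block keeps the shares balanced however `$intervals` divides among the processes.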
