Chapter 17. Profiling Parallel Programs

Since the raison d'être for a cluster is higher performance, it stands to reason that if you really need a cluster, writing efficient code should be important to you. The key to improving the efficiency of your code is knowing where your code spends its time. Thus, the astute cluster user will want to master code profiling. This chapter provides an introduction to profiling in general, to the problems you’ll face with parallel programs, and to some of the tools you can use.

We’ll begin by looking briefly at issues that impact program efficiency. Next, we’ll look at ways you can time programs (and parts of programs) using readily available tools and the special features of MPI. Finally, we’ll look at the MPE library, a library that extends MPI and is particularly useful for profiling program performance. Where appropriate, we’ll look first at techniques typically used with serial programs to put the techniques in context, and then at extending them to parallel programs.

Why Profile?

You have probably heard it before: the typical program will spend over 90% of its execution time in less than 10% of the actual code. This is just a rule of thumb, or heuristic, and as such will be wildly inaccurate or totally irrelevant for some programs. But for many, if not most, programs, it is a reasonable observation. The actual numbers don’t matter since they will change from program to program. It is the idea that is important: for most programs, most of the ...
